Commit Graph

152407 Commits

Author SHA1 Message Date
Chen Zheng f6db18fd4a [PowerPC][NFC] make option ppc-formprep-max-vars can be set more than one time. 2021-11-04 13:44:58 +00:00
Simon Pilgrim 87d5bb66eb [X86][SSE] Improve PMADDWD SimplifyDemandedVectorElts handling
Check both operands for zero elements to remove unnecessary demanded elts.

Try to help reduce some minor regressions noticed in D110995
2021-11-04 12:56:31 +00:00
Florian Hahn b4992dbb21
[LV] Clarify uniform worklist contains instrs demanding lane 0. 2021-11-04 13:11:50 +01:00
Tim Northover 3d39612b3d Coroutines: don't infer function attrs before lowering
Coroutines have weird semantics that don't quite match normal LLVM functions,
so trying to infer even simple attributes based on thier contents can go wrong.
2021-11-04 10:24:28 +00:00
David Green 1e5f814302 [InstCombine] Fix infinite recursion in ashr/xor vector fold.
The added test has poison lanes due to the vector shuffle. This can
cause an infinite loop of combines in instcombine where it folds
xor(ashr, -1) -> select (icmp slt 0), -1, 0 -> sext (icmp slt 0) -> xor(ashr, -1).
We usually prevent this by checking that the xor constant is not -1,
but with vectors some of the lanes may be -1, some may be poison. So
this changes the way we detect that from "!C1->isAllOnesValue()" to
"!match(C1, m_AllOnes())", which is more able to detect that some of the
lanes are poison.

Fixes PR52397
2021-11-04 09:24:27 +00:00
Qiu Chaofan a84118756c [PowerPC] Enforce side effects to FPSCR read/set intrinsics
Currently, FPSCR is not modeled, so in some early passes (such as
early-cse), the read/set intrinsics to FPSCR may get incorrect
simplification.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D112380
2021-11-04 11:45:32 +08:00
RamNalamothu 539f500e78 [AMDGPU] Do not add debug locations to the code inside prologue
There is no real source location for code inside prologue as it is
generated by compiler but source locations are being added to code
inside prologue as a side effect of https://reviews.llvm.org/D99269
because buildSpillLoadStore() is using source location of the real
instruction in the basic block if any.

Fixes: SWDEV-307590

Reviewed By: scott.linder, sebastian-ne

Differential Revision: https://reviews.llvm.org/D113100
2021-11-04 08:02:41 +05:30
Philip Reames d4708fa480 Backout must-exit based parts of 3fc9882e, and 412eb0
Not sure these are correct.  I think I missed a case when porting this from the original SCEV change to the IndVar changes.  I may end up reapplying this later with a comment about how this is correct, but in case the current bad feeling turns out to be true, I'm removing from tree while investigating further.
2021-11-03 15:19:49 -07:00
Arthur Eubanks 88052fc362 [ArgPromo] Preserve FunctionAnalysisManagerCGSCCProxy
We already make sure to properly clear analyses for deleted functions.

This makes investigating some future potential compile time improvements easier.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D113032
2021-11-03 14:56:58 -07:00
Craig Topper 5022ac0771 [RISCV] Use HasVInstructions and HasVInstructionsAnyF in more place in TableGen. NFC
Change RISCVSubtarget.hasVInstructionAnyF() to call hasVInstructionsF32
so that any changes to hasVInstructionsF32 are reflected.

The files were missed in D112496.
2021-11-03 14:32:45 -07:00
Matthias Braun 847a680733 X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr
This is a re-commit of e2c7ee0743 which
was reverted in a2a58d91e8. This includes
a fix to consistently check for EFLAGS being live-out. See phabricator
review.

Original Summary:

This extends `optimizeCompareInstr` to re-use previous comparison
results if the previous comparison was with an immediate that was 1
bigger or smaller. Example:

    CMP x, 13
    ...
    CMP x, 12   ; can be removed if we change the SETg
    SETg ...    ; x > 12  changed to `SETge` (x >= 13) removing CMP

Motivation: This often happens because SelectionDAG canonicalization
tends to add/subtract 1 often when optimizing for fallthrough blocks.
Example for `x > C` the fallthrough optimization switches true/false
blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into
`x < C + 1`.

Differential Revision: https://reviews.llvm.org/D110867
2021-11-03 14:12:23 -07:00
Philip Reames 64990f1408 Revert "[indvars] Move a check slightlly earlier [NFC]"
This reverts commit 7ff943a9ed.

This wasn't NFC.  isSigned != !isUnsigned as there are also relational operators.
2021-11-03 13:38:52 -07:00
Kirill Stoimenov a55c4ec1ce [ASan] Process functions in Asan module pass
This came up as recommendation while reviewing D112098.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D112732
2021-11-03 20:27:53 +00:00
alex-t 0a3d755ee9 [AMDGPU] Enable divergence-driven BFE selection
Detailed description: This change enables the bit field extract patterns
selection to s_bfe_u32 or v_bfe_u32 dependent on the pattern root node
divergence.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D110950
2021-11-03 23:26:59 +03:00
Martin Storsjö a39eba7207 [Support] [Windows] Use RemoveFileOnSignal if unable to use the delete-on-close flag
This takes care of cleaning up the temp files on crashes. It doesn't
handle cleanup when explicitly killed though.

Differential Revision: https://reviews.llvm.org/D112710
2021-11-03 21:29:37 +02:00
Philip Reames 7ff943a9ed [indvars] Move a check slightlly earlier [NFC] 2021-11-03 12:24:10 -07:00
Philip Reames 3fc9882e88 [indvars] Rotate zext though icmp to reduce loop varying computation
This change looks for cases where we can prove that an exit test of a loop can be performed in a narrower bitwidth, and that by doing so we can replace a loop-varying extend with a loop-invariant truncate.

The motivation here is that doing this unblocks the trip count analysis for narrow IVs involved in extended compare exit tests. It also has the nice side effect of simply making the code faster, even if we gain no other benefit from the improved analysis ability.

I've noted a few places this could be extended, but I think this stands reasonable on it's own as well.

Differential Revision: https://reviews.llvm.org/D112262
2021-11-03 12:09:20 -07:00
Vitaly Buka 32eb697c0a [PassBuilder] Remove unused function after D113072 2021-11-03 12:03:17 -07:00
Vitaly Buka 3131714f8d [NFC][asan] Use AddressSanitizerOptions in ModuleAddressSanitizerPass
Reviewed By: kstoimenov

Differential Revision: https://reviews.llvm.org/D113072
2021-11-03 11:32:14 -07:00
Kirill Stoimenov b3145323b5 Revert "[ASan] Process functions in Asan module pass"
This reverts commit 76ea87b94e.

Reviewed By: kstoimenov

Differential Revision: https://reviews.llvm.org/D113129
2021-11-03 18:01:01 +00:00
Kirill Stoimenov 76ea87b94e [ASan] Process functions in Asan module pass
This came up as recommendation while reviewing D112098.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D112732
2021-11-03 17:51:01 +00:00
Harald van Dijk 889c2b97bd
[X86] Fix X32 indirect call generation
The check for whether a zero extension was needed was subtly wrong and
saw a value that was already 64 bits, so did not extend.

Fixes PR52357.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112860
2021-11-03 16:43:44 +00:00
Sanjay Patel c85df3c7d5 [InstCombine] refactor fold for icmp with trunc op; NFC
There are at least 3 related folds we can add here - see D112634.
2021-11-03 12:43:15 -04:00
Roman Lebedev 9c2469c1dd
[PassManager] `buildModuleOptimizationPipeline()`: schedule `LoopDeletion` pass run before vectorization passes
Test thanks to Michael Kuklinski from `#llvm`: https://godbolt.org/z/bdrah5Goo
originally inspired by Daniel Lemire's https://lemire.me/blog/2021/10/26/in-c-is-empty-faster-than-comparing-the-size-with-zero/

We manage to deduce that the answer does not require looping,
but we do that after the last `LoopDeletion` pass run,
so we end up being stuck with a dead loop.

Now, as with all things SCEV, this has
a very expected ~`+0.12%` compile time performance regression:
https://llvm-compile-time-tracker.com/compare.php?from=0ae7bf124a9bca76dd9a91b2f7379168ff13f562&to=c2ae57c9b961aeb4a28c747266949340613a6d84&stat=instructions
(for comparison, doing that in function simplification pipeline
would have been ~`+0.5` compile time performance regression, D112840)

Looking at the transformation stats over vanilla test-suite, i think it's rather expected:
```
| statistic name                                   |  baseline |  proposed |     Δ |      % |    |%| |
|--------------------------------------------------|----------:|----------:|------:|-------:|-------:|
| scalar-evolution.NumBruteForceTripCountsComputed |       789 |       888 |    99 | 12.55% | 12.55% |
| scalar-evolution.NumTripCountsNotComputed        |    105592 |    117900 | 12308 | 11.66% | 11.66% |
| loop-delete.NumBackedgesBroken                   |       542 |       559 |    17 |  3.14% |  3.14% |
| regalloc.numExtends                              |        81 |        79 |    -2 | -2.47% |  2.47% |
| indvars.NumFoldedUser                            |       408 |       400 |    -8 | -1.96% |  1.96% |
| indvars.NumElimCmp                               |      3831 |      3758 |   -73 | -1.91% |  1.91% |
| scalar-evolution.NumTripCountsComputed           |    299759 |    304278 |  4519 |  1.51% |  1.51% |
| loop-delete.NumDeleted                           |      8055 |      8128 |    73 |  0.91% |  0.91% |
| machine-cse.NumCommutes                          |       111 |       110 |    -1 | -0.90% |  0.90% |
| globaldce.NumFunctions                           |      1187 |      1192 |     5 |  0.42% |  0.42% |
| codegenprepare.NumSelectsExpanded                |       277 |       278 |     1 |  0.36% |  0.36% |
| loop-unroll.NumRuntimeUnrolled                   |     13841 |     13791 |   -50 | -0.36% |  0.36% |
| machinelicm.NumPostRAHoisted                     |      1168 |      1172 |     4 |  0.34% |  0.34% |
| phi-node-elimination.NumCriticalEdgesSplit       |     83054 |     82879 |  -175 | -0.21% |  0.21% |
| machine-cse.NumPREs                              |      3085 |      3079 |    -6 | -0.19% |  0.19% |
| branch-folder.NumBranchOpts                      |    108122 |    107942 |  -180 | -0.17% |  0.17% |
| loop-unroll.NumUnrolled                          |     40136 |     40067 |   -69 | -0.17% |  0.17% |
| branch-folder.NumDeadBlocks                      |    130818 |    130607 |  -211 | -0.16% |  0.16% |
| codegenprepare.NumBlocksElim                     |     92856 |     92714 |  -142 | -0.15% |  0.15% |
| instsimplify.NumSimplified                       |    103263 |    103129 |  -134 | -0.13% |  0.13% |
| instcombine.NumConstProp                         |     26070 |     26102 |    32 |  0.12% |  0.12% |
| instsimplify.NumExpand                           |      1716 |      1718 |     2 |  0.12% |  0.12% |
| loop-unroll.NumCompletelyUnrolled                |      9236 |      9225 |   -11 | -0.12% |  0.12% |
| branch-folder.NumHoist                           |      2773 |      2770 |    -3 | -0.11% |  0.11% |
| regalloc.NumReloadsRemoved                       |     10822 |     10834 |    12 |  0.11% |  0.11% |
| regalloc.NumSnippets                             |     11394 |     11406 |    12 |  0.11% |  0.11% |
| machine-cse.NumCrossBBCSEs                       |      1052 |      1053 |     1 |  0.10% |  0.10% |
| machinelicm.NumCSEed                             |     99887 |     99784 |  -103 | -0.10% |  0.10% |
| branch-folder.NumTailMerge                       |     72501 |     72435 |   -66 | -0.09% |  0.09% |
| codegenprepare.NumExtUses                        |     22007 |     21987 |   -20 | -0.09% |  0.09% |
| local.NumRemoved                                 |     68232 |     68294 |    62 |  0.09% |  0.09% |
| loop-vectorize.LoopsAnalyzed                     |     75483 |     75413 |   -70 | -0.09% |  0.09% |
```

Note that i'm only changing current PM, and not touching obsolete PM.

This is an alternative to the function simplification pipeline variant
of the same change, D112840. It has both less compile time impact
(since the additional number of SCEV trip count calculations
is way lass less than with the D112840), and it is
much more powerful/impactful (almost 2x more loops deleted).

I have checked, and doing this after loop rotation
is favorable (more loops deleted).

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D112851
2021-11-03 19:24:49 +03:00
Kazu Hirata 4bef0304e1 [AArch64, AMDGPU] Use make_early_inc_range (NFC) 2021-11-03 09:22:51 -07:00
Hans Wennborg a2a58d91e8 Revert "X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr"
This casued miscompiles of switches, see comments on the code review.

> This extends `optimizeCompareInstr` to re-use previous comparison
> results if the previous comparison was with an immediate that was 1
> bigger or smaller. Example:
>
>     CMP x, 13
>     ...
>     CMP x, 12   ; can be removed if we change the SETg
>     SETg ...    ; x > 12  changed to `SETge` (x >= 13) removing CMP
>
> Motivation: This often happens because SelectionDAG canonicalization
> tends to add/subtract 1 often when optimizing for fallthrough blocks.
> Example for `x > C` the fallthrough optimization switches true/false
> blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into
> `x < C + 1`.
>
> Differential Revision: https://reviews.llvm.org/D110867

This reverts commit e2c7ee0743.
2021-11-03 17:01:36 +01:00
Roman Lebedev df93c8a919
[X86] `X86TTIImpl::getInterleavedMemoryOpCostAVX512()`: fallback to scalarization cost computation for mask
I don't really buy that masked interleaved memory loads/stores are supported on X86.
There is zero costmodel test coverage, no actual cost modelling for the generation
of the mask repetition, and basically only two LV tests.
Additionally, i'm not very interested in AVX512.

I don't know if this really helps "soft" block over at
https://reviews.llvm.org/D111460#inline-1075467,
but i think it can't make things worse at least.

When we are being told that there is a masking, instead of
completely giving up and falling back to
fully scalarizing `BasicTTIImplBase::getInterleavedMemoryOpCost()`,
let's correctly query the cost of masked memory ops,
keep all the pretty shuffle cost modelling,
but scalarize the cost computation for the mask replication.

I think, not scalarizing the shuffles themselves
may adjust the computed costs a bit,
and maybe hopefully just enough to hide the "regressions"
at https://reviews.llvm.org/D111460#inline-1075467
I do mean hide, because the test coverage is non-existent.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112873
2021-11-03 18:14:35 +03:00
Erich Keane 09233412ed Revert part of D112349 to allow ifunc resolvers be declarations.
The patch in D112349 added a previously nonexistant restriction on ifunc
resolvers that they MUST be defintions.  However, the function
multiversioning depends on being able to resolve these resolvers at
link-time, so this additional restriction was breaking.
2021-11-03 07:15:16 -07:00
David Sherwood c0f2774973 [NFC][LoopVectorize] Simple tidy-up in InnerLoopVectorizer::createVectorIntOrFpInductionPHI
Use getSignedIntOrFpConstant instead of creating int or FP constants
manually.
2021-11-03 14:05:21 +00:00
Peter Waller 7a34145f40 Reland "[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store"
This reverts commit 753eba6421.

Contiguous gather => masked load:

  (sve.ld1.gather.index Mask BasePtr (sve.index IndexBase 1))
  => (masked.load (gep BasePtr IndexBase) Align Mask undef)

Contiguous scatter => masked store:

  (sve.ld1.scatter.index Value Mask BasePtr (sve.index IndexBase 1))
  => (masked.store Value (gep BasePtr IndexBase) Align Mask)

Tests with <vscale x 2 x double>:

[Gather, Scatter] for each [Positive test (index=1), Negative test
(index=2), Alignment propagation].

Differential Revision: https://reviews.llvm.org/D112076
2021-11-03 13:42:14 +00:00
Peter Waller 753eba6421 Revert "[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store"
This reverts commit 1febf42f03, which has
a use-of-uninitialized-memory bug.

See: https://reviews.llvm.org/D112076
2021-11-03 13:39:38 +00:00
Florian Hahn 64bc31ee93
[LV] Drop unneeded use of getVPSingleValue (NFC).
VPReductionPHIRecipe inherits from VPValue, so there's no need to call
getVPSingleValue.
2021-11-03 14:26:15 +01:00
Florian Hahn 8e44bdd12a
[VPlan] Make VPWidenCanonicalIVRecipe a VPValue (NFC).
The recipe produces exactly one VPValue and can inherit directly from
it. This is in line with other recipes and avoids having to use
getVPSingleValue.
2021-11-03 14:11:01 +01:00
Andrew Savonichev 123ad720f1 [NVPTX] Mark special registers as reserved
A reserved register:
 - is not allocatable
 - is considered always live
 - is ignored by liveness tracking

NVPTX special registers match the criteria, and marking them as
reserved helps to avoid machine verifier error:

    *** Bad machine code: Using an undefined physical register ***
    - function:    foo
    - basic block: %bb.0  (0x557bb178b708)
    - instruction: %0:int32regs = MOV_SPECIAL $envreg0
    - operand 1:   $envreg0

Differential Revision: https://reviews.llvm.org/D113008
2021-11-03 15:48:04 +03:00
Cullen Rhodes d968b173d3 [TableGen] Emit a warning for unused template args
Add a warning to TableGen for unused template arguments in classes and
multiclasses, for example:

  multiclass Foo<int x> {
    def bar;
  }

  $ llvm-tblgen foo.td

  foo.td:1:20: warning: unused template argument: Foo::x
  multiclass Foo<int x> {
                     ^
A flag '--no-warn-on-unused-template-args' is added to disable the
warning. The warning is disabled for LLVM and sub-projects if
'LLVM_ENABLE_WARNINGS=OFF'.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D109359
2021-11-03 11:55:07 +00:00
Andrew Savonichev 0e70785538 [NVPTX] Add MoveParam instruction for TargetExternalSymbol operand
TargetExternalSymbol is considered to be an immediate and not a
register, so machine verifier emits an error:

    *** Bad machine code: Expected a register operand. ***
    - function:    static_offset
    - basic block: %bb.0 bb (0x560e9b306028)
    - instruction: %3:int64regs = MoveParamI64 &static_offset_param_1
    - operand 1:   &static_offset_param_1

The patch adds variants of this instruction with an immediate operand
for byval arguments on 64-bit and 32-bit targets.

Differential Revision: https://reviews.llvm.org/D113006
2021-11-03 14:43:41 +03:00
David Green 3bc586b9aa [ARM] Treat MVE gather add-like-or's like adds
LLVM has the habit of turning adds with no common bits set into ors,
which means we need to detect them and treat them like adds again in the
MVE gather/scatter lowering pass.

Differential Revision: https://reviews.llvm.org/D112922
2021-11-03 11:41:06 +00:00
Peter Waller 1febf42f03 [AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store
Contiguous gather => masked load:

  (sve.ld1.gather.index Mask BasePtr (sve.index IndexBase 1))
  => (masked.load (gep BasePtr IndexBase) Align Mask undef)

Contiguous scatter => masked store:

  (sve.ld1.scatter.index Value Mask BasePtr (sve.index IndexBase 1))
  => (masked.store Value (gep BasePtr IndexBase) Align Mask)

Tests with <vscale x 2 x double>:

[Gather, Scatter] for each [Positive test (index=1), Negative test (index=2), Alignment propagation].

Differential Revision: https://reviews.llvm.org/D112076
2021-11-03 11:02:44 +00:00
David Green d36dd1f842 [ARM] Push gather/scatter shl index updates out of loops
This teaches the MVE gather scatter lowering pass that SHL is
essentially the same as Mul, where we are able to optimize the
induction of a gather/scatter address by pushing them out of loops.
https://alive2.llvm.org/ce/z/wG4VyT

Differential Revision: https://reviews.llvm.org/D112920
2021-11-03 11:00:05 +00:00
Qiu Chaofan 741aeda97d [PowerPC] Implement longdouble pack/unpack builtins
Implement two builtins to pack/unpack IBM extended long double float,
according to GCC 'Basic PowerPC Builtin Functions Available ISA 2.05'.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D112055
2021-11-03 17:57:25 +08:00
Andrew Savonichev 30a3a17df8 [NVPTX] Copy machine operand flags in TII::insertBranch
Before this patch, flags such as undef were dropped by TII::insertBranch
(used by BranchFolding pass), resulting in the following error from
machine verifier:

    *** Bad machine code: Reading virtual register without a def ***
    - function:    hoge
    - basic block: %bb.0 bb (0x562e9c240e68)
    - instruction: CBranch %2:int1regs, %bb.3
    - operand 0:   %2:int1regs

Differential Revision: https://reviews.llvm.org/D113001
2021-11-03 12:38:27 +03:00
Yi Kong 803d4f8a35 [ARM][AsmParser] Don't emit "deprecated instruction in IT block" warning if requested
Also fixed formatting in AsmMatcherEmitter because it was confusing.

Differential Revision: https://reviews.llvm.org/D112993
2021-11-03 17:18:04 +08:00
Piotr Sobczak 03961709ed [InstCombine] Extend pattern to replace shuffle's insertelement operand
In D71220 a pattern was added to replace shuffle's insertelement operand
if inserted scalar is not demanded. The pattern was added only for
the case where the shuffle's mask size is equal to element's vector size.
However, that condition is not required because the pattern does not
change the shuffle vector size.

This patch extends the pattern to also include cases where shuffle's mask
size is not equal to element's vector size.

Differential Revision: https://reviews.llvm.org/D112318
2021-11-03 09:43:04 +01:00
Ben Shi 59c3b48d99 Revert "[AArch64] Optimize add/sub with immediate"
This reverts commit 3de3ca3137.
2021-11-03 14:15:21 +08:00
Chen Zheng 5a8b196340 [PowerPC] handle more splat loads without stack operation
This mostly improves splat loads code generation on Power7

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D106555
2021-11-03 05:17:41 +00:00
Johannes Doerfert d61aac76bf [OpenMP][FIX] Do not signal SPMD-mode but then keep generic-mode
If we assume SPMD-mode during the fixpoint iteration we have to execute
the kernel in SPMD-mode. If we change our mind during manifest there is
the chance of a mismatch between the simplification, e.g., of
`__kmpc_is_spmd_exec_mode` calls, and the execution mode. This problem
was introduced in D109438.

This patch is compromise to resolve the problem purely in OpenMP-opt
while trying to keep the benefits of D109438 around. This might not
always work, see `get_hardware_num_threads_in_block_fold` but it often
does. At the same time we do keep value specialization and execution
mode in sync.

Proper solutions to this problem should be considered. I believe a new
execution mode is the easiest way forward (Singleton-SPMD).
Alternatively, SPMD-mode execution can be used with a way to provide a
new thread_limit (here 1) to the runtime. This is more general and could
be useful if we see `num_threads` clauses or workshared loops with small
trip counts in the kernel. In either proposal we need to disable the
guarding for the kernel (which was the motivation for D109438).

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D112894
2021-11-02 23:22:04 -05:00
Johannes Doerfert 73720c8059 [OpenMP][FIX] Introduce and use a simple generic-mode barrier
Before we had aligned barriers the `__kmpc_barrier_simple_spmd` was
OK to be used in the custom state machine. Now that SPMD barriers are
assumed to be aligned we need to use a "generic" barrier in places
that are not aligned.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112893
2021-11-02 23:22:01 -05:00
Johannes Doerfert e6e440ae5f [OpenMP][FIX] Ensure guarding uses proper global name
Global symbols cannot have any name so we need to sanitize the string
first. Also remove an assertion that is not actually necessary nor
true in general.

Reviewed By: ggeorgakoudis

Differential Revision: https://reviews.llvm.org/D112892
2021-11-02 23:21:53 -05:00
Abinav Puthan Purayil fbe61fb0aa [AMDGPU] Fix SGPR checks in S_MOV_B64_IMM_PSEUDO generation.
The function to generate S_MOV_B64_IMM_PSEUDO was recently modified to
optimize AGPR to AGPR copy but it missed checking for the SGPR
clobbering for the S_MOV_B64_IMM_PSEUDO generation.

Differential Revision: https://reviews.llvm.org/D113005
2021-11-03 09:09:24 +05:30
Ben Shi 3de3ca3137 [AArch64] Optimize add/sub with immediate
Optimize ([add|sub] r, imm) -> ([ADD|SUB] ([ADD|SUB] r, #imm0, lsl #12), #imm1),
if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Optimize ([add|sub] r, imm) -> ([SUB|ADD] ([SUB|ADD] r, #imm0, lsl #12), #imm1),
if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Reviewed By: jaykang10, dmgreen

Differential Revision: https://reviews.llvm.org/D111034
2021-11-03 03:06:43 +00:00
Liren Peng 57e093162e [ScalarEvolution] Infer loop max trip count from array accesses
Data references in a loop should not access elements over the
statically allocated size. So we can infer a loop max trip count
from this undefined behavior.

Reviewed By: reames, mkazantsev, nikic

Differential Revision: https://reviews.llvm.org/D109821
2021-11-03 10:40:18 +08:00
Phoebe Wang 8f101971b6 [X86][VARARG] Assign MMO earlier to avoid prolog insert point been sunk across VASTART_SAVE_XMM_REGS
The changes in D80163 defered the assignment of MachineMemOperand (MMO)
until the X86ExpandPseudo pass. This will result in crash due to prolog
insert point been sunk across the pseudo instruction VASTART_SAVE_XMM_REGS.

Moving the assignment to the creation of the node can avoid the problem.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D112859
2021-11-03 10:13:32 +08:00
Mircea Trofin 34f4fe3a90 [NFC][Regalloc] Ensure Query::interferingVRegs is accurate.
To correctly use Query, one had to first call collectInterferingVRegs to
pre-cache the query result, then call interferingVRegs. Failing the
former, interferingVRegs could be stale. This did cause a bug which was
addressed in D98232, but the underlying usability issue of the Query API
wasn't.

This patch addresses the latter by making collectInterferingVRegs an
implementation detail, and having interferingVRegs play both roles. One
side-effect of this is that interferingVRegs is not const anymore.

Differential Revision: https://reviews.llvm.org/D112882
2021-11-02 18:26:54 -07:00
Kazu Hirata 1b108ab975 [Transforms] Use make_early_inc_range (NFC) 2021-11-02 18:13:23 -07:00
Hongtao Yu d0eb472f33 [llvm-profdata] Print out section flags for FunctionMetadata section
As titled.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D113064
2021-11-02 17:59:22 -07:00
Eli Friedman c964afb2c8 [AArch64] Diagnose large adrp offset on Windows.
On Windows, this relocation can only encode a 21-bit offset. Make sure
we emit an error, instead of silently truncating the offset.

Found investigating https://bugs.llvm.org/show_bug.cgi?id=52378

Differential Revision: https://reviews.llvm.org/D113051
2021-11-02 15:11:22 -07:00
Nikita Popov c00e9c6345 [BasicAA] Check known access sizes earlier (NFC)
All heuristics for variable accesses require both access sizes to
be known, so check this once at the start, rather than for each
particular heuristic.
2021-11-02 21:26:26 +01:00
Nikita Popov 0b6ed92c8a [BasicAA] Use early returns (NFC)
Reduce nesting in aliasGEP() a bit by returning early.
2021-11-02 21:17:36 +01:00
Simon Pilgrim 53900a19fd [X86][AVX] combineConcatVectorOps - use getBROADCAST_LOAD helper for splat of normal vector loads. NFCI.
Reapplied from rG1cfecf4fc427 with fix for PR51226 - ensure the load is a normal (non-ext) load.
2021-11-02 20:03:25 +00:00
Nikita Popov 51e9f33603 [BasicAA] Use saturating multiply on range if nsw
If we know that the var * scale multiplication is nsw, we can use
a saturating multiplication on the range (as a good approximation
of an nsw multiply). This recovers some cases where the fix from
D112611 is unnecessarily strict. (This can be further strengthened
by using a saturating add, but we currently don't track all the
necessary information for that.)

This exposes an issue in our NSW tracking for multiplies. The code
was assuming that (X +nsw Y) *nsw Z results in
(X *nsw Z) +nsw (Y *nsw Z) -- however, it is possible that the
distributed multiplications overflow, even if the non-distributed
one does not. We should discard the nsw flag if the the offset is
non-zero. If we just have (X *nsw Y) *nsw Z then concluding
X *nsw (Y *nsw Z) is fine.

Differential Revision: https://reviews.llvm.org/D112848
2021-11-02 20:27:39 +01:00
Chih-Ping Chen 2ed29d87ef [CodeView] Fortran debug info emission in Code View.
Differential Revision: https://reviews.llvm.org/D112826
2021-11-02 15:06:21 -04:00
Christopher Tetreault 5718b9f128 [NFC] Reformat VerifyPreservedCFG for non-CPP-aware syntax highlighters
* Move `);` outside the #ENDIF. Syntax highlighters that highlight missed
  closing parens, but are not aware of the C Preprocessor saw the original
  code as having missed parens.
2021-11-02 11:35:38 -07:00
Simon Pilgrim 82e0eb22af [X86][AVX] combineConcatVectorOps - use getBROADCAST_LOAD helper. NFCI.
This is part of rG1cfecf4fc427 that was reverted to fix PR51226 - concating the broadcasts is OK, its the splatted loads that crash (we're not detecting extloads). I'm still creating a reduced test case so haven't added the load handling again yet.
2021-11-02 18:04:35 +00:00
Fraser Cormack d065b03801 [RISCV] Optimize vp.load with an all-ones mask
Similar to D110206, this patch optimizes unmasked vp.load intrinsics to
avoid the need of a vmset instruction to set the mask. It does so by
selecting a riscv_vle intrinsic rather than a riscv_vle_mask intrinsic.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D113022
2021-11-02 17:23:39 +00:00
Dmitry Makogon e09958d5eb [LoopPeel] Peel loops with exits followed by an unreachable or deopt block
Added support for peeling loops with exits that are followed either by an
unreachable-terminated block or block that has a terminatnig deoptimize call.
All blocks in the sequence must have an unique successor, maybe except
for the last one.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D110922
2021-11-02 23:12:04 +07:00
Arthur Eubanks e2024d72fa Revert "[NFC] Remove LinkAll*.h"
This reverts commit fe364e5dc7.

Causes breakages, e.g. https://lab.llvm.org/buildbot/#/builders/188/builds/5266
2021-11-02 09:08:09 -07:00
Jamie Schmeiser 816761f044 Add new choices dot-cfg and dot-cfg-quiet to print-changed.
Summary:
Add new options -print-changed=[dot-cfg | dot-cfg-quiet] which create
a website of DOT files showing colourized changes as the IR is changed
by passes in the new pass manager pipeline.

A new change reporter is introduced that creates a website of changes made
by passes in the opt pipeline that change the IR. The hidden option
-dot-cfg-dir=<dir> specifies a directory (defaulting to "./") into which the
website will be created.

A file passes.html is created that contains a list of all the passes that
act on the IR. Those that do not change the IR are listed as omitted
because of no change, ignored or filtered out (using -filter-print-func
and -filter-passes) or not listed in quiet mode. Those that
do change the IR are listed as a link to a DOT file which contains a
CFG depiction of the IR (ala -dot-cfg) except that the instructions,
basic blocks and links that are only in the IR before the pass (ie, removed)
and those that are only in the IR after the pass (ie, added) are shown in
red and green, respectively, while the aspects of the CFG that do not change
are shown in black. Additional hidden options
-dot-cfg-before-color=<dot named color>,
-dot-cfg-after-color=<dot named color> and
-dot-cfg-common-color=<dot named color> are defined that allow the
customization of the colors used in colorizing the CFG.
-change-printer-dot-path=<path to dot exe> is also added.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D87202
2021-11-02 12:06:25 -04:00
Arthur Eubanks fe364e5dc7 [NFC] Remove LinkAll*.h
These were added to prevent functions from being removed by WPO.

But that doesn't make sense, correct WPO will not remove functions we actually use.

I noticed these because compiling cc1_main.cpp was pulling in random LLVM pass headers.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112971
2021-11-02 08:43:17 -07:00
Jay Foad be1a8f8834 [AMDGPU] Really preserve LiveVariables in SILowerControlFlow
https://bugs.llvm.org/show_bug.cgi?id=52204

Differential Revision: https://reviews.llvm.org/D112731
2021-11-02 15:03:37 +00:00
Matt 895145aacb Revert "[AArch64][SVE] Combine predicated FMUL/FADD into FMA"
This reverts commit fc28a2f8ce.
2021-11-02 14:56:01 +00:00
Youngsuk Kim 76b53da3ce
[SimpleLoopUnswitch] Remove duplicate include.
Header "llvm/Transforms/Scalar/SimpleLoopUnswitch.h" is currently
included twice. This commit removes the duplicate 'include' line.

Previous commit 693eedb138
seems to have mistakenly added the duplicate 'include'.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D112979
2021-11-02 15:22:41 +01:00
Sanjay Patel 829146164f [InstCombine] change 'not' match for bitwise select
The tests diffs are logically equivalent, and so this is
generally NFC, but this makes the code match the code
comment.

It should also be more efficient. If we choose the 'not'
operand (rather than the 'not' instruction) as the select
condition, then we don't have to invert the select
condition/operands as a subsequent transform.
2021-11-02 10:16:01 -04:00
Simon Pilgrim e173631dd1 [X86][AVX] SimplifyDemandedVectorEltsForTargetNode - use getBROADCAST_LOAD helper. NFCI.
Reduce width of X86ISD::SUBV_BROADCAST_LOAD node.
2021-11-02 14:07:22 +00:00
Simon Pilgrim 8ca666a280 [X86][AVX] lowerV2X128Shuffle - use getBROADCAST_LOAD helper. NFCI. 2021-11-02 14:07:21 +00:00
Daniele Vettorel 67887b0f81 [Scalarizer] Do not insert instructions between PHI nodes and debug intrinsics.
The scalarizer pass seems to be inserting instructions in-between PHI nodes or debug intrinsics that end up staying at the end of the pass, resulting in malformed IR and violating assumptions.

This patch adds a check to make sure the `extractelement` instructions that it adds are correctly placed after all PHI nodes and debug intrinsics.

Patch by vettoreldaniele.

Reviewed By: bjope

Differential Revision: https://reviews.llvm.org/D112472
2021-11-02 09:53:59 -04:00
Martin Liska c5029023fb Fix building with GCC 12:
Fixes: https://bugs.llvm.org/show_bug.cgi?id=52380

Differential Revision: https://reviews.llvm.org/D112990
2021-11-02 14:28:00 +01:00
jacquesguan a39eadcf16 [DAGCombiner] Teach combineShiftToMULH to handle constant and const splat vector.
Fold (srl (mul (zext i32:$a to i64), i64:c), 32) -> (mulhu $a, $b),
if c can truncate to i32 without loss.

Reviewed By: frasercrmck, craig.topper, RKSimon

Differential Revision: https://reviews.llvm.org/D108129
2021-11-02 12:04:23 +00:00
David Callahan 4ec1b8eeac [RISCV] Fix invalid kill on callee save
A callee save may be live (specifically X1) on entry and so a spill
should not mark it killed.

Differential Revision: https://reviews.llvm.org/D111285
2021-11-02 11:56:54 +00:00
Simon Pilgrim 325031786e [SelectionDAG] Optimize expansion for rotates/funnel shifts
If the type of a funnel shift needs to be expanded, expand it to two funnel shifts instead of regular shifts. For constant shifts, this doesn't make much difference, but for variable shifts it allows a more optimal lowering.

Also use the optimized funnel shift lowering for rotates.

Alive2: https://alive2.llvm.org/ce/z/TvHDB- / https://alive2.llvm.org/ce/z/yzPept

(Branched from D108058 as getting this completed should help unlock some other WIP patches).

Original Patch: @efriedma (Eli Friedman)

Differential Revision: https://reviews.llvm.org/D112443
2021-11-02 11:38:25 +00:00
Simon Pilgrim 37e17f278f [DAG] MatchRotate - remove (redundant) legal type check.
Rely on the hasOperation() instead - as commented on D77804, the mid-term intention is to recognise rotate/funnel-by-constant pre-legalization to help avoid SimplifyDemandedBits regressions.
2021-11-02 11:24:50 +00:00
Frederic Cambus 650311737e [llvm-readobj] Add support for reading OpenBSD ELF core notes.
Notes generated in OpenBSD core files provide additional information
about the kernel state and CPU registers. These notes are described
in core.5, which can be viewed here: https://man.openbsd.org/core.5

Differential Revision: https://reviews.llvm.org/D111966
2021-11-02 10:18:54 +01:00
Rosie Sumpter dcb8222d87 [LoopVectorize] Propagate fast-math flags for inloop reductions
This patch updates VPReductionRecipe::execute so that the fast-math
flags associated with the underlying instruction of the VPRecipe are
propagated through to the reductions which are created.

Differential Revision: https://reviews.llvm.org/D112548
2021-11-02 08:59:53 +00:00
Kazu Hirata 6bdb61c58a [CodeGen] Use make_early_inc_range (NFC) 2021-11-01 22:38:49 -07:00
hsmahesha e9ea992496 [IR] Replace *all* uses of a constant expression by corresponding instruction
When a constant expression CE is being converted into a corresponding instruction I,
CE is supposed to be replaced by I. However, it is possible that CE is being used multiple
times within a parent instruction PI. Make sure that *all* the uses of CE within PI are
replaced by I.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D112717
2021-11-02 10:01:46 +05:30
Hongtao Yu d137854412 [SamplePGO] Fix callsite sample lookup to use dwarf names when dwarf linkage name isn't available.
When linkage name isn't available in dwarf (ususally the case of C code),  looking up callee samples should be based on the dwarf name instead of using an empty string.

Also fixing a test issue where using empty string to look up callee samples accidentally returns the correct samples because it is treated as indirect call.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D112948
2021-11-01 21:24:33 -07:00
Lang Hames e9014d9743 [ORC] Run incoming jit-dispatch calls via the TaskDispatcher in SimpleRemoteEPC.
Handlers for jit-dispatch calls are allowed to make their own EPC calls, so we
don't want to run these on the handler thread.
2021-11-01 15:49:14 -07:00
Wouter van Oortmerssen ac65366485 [WebAssembly] support "return" and unreachable code in asm type checker
To support return (it not being supported well was the ground cause for
https://github.com/WebAssembly/wasi-sdk/issues/200) we also have to have
at least a basic notion of unreachable, which in this case just means to stop
type checking until there is an end_block (an incoming control flow edge).
This is conservative (may miss on some type checking opportunities) but is
simple and an improvement over what we had before.

Differential Revision: https://reviews.llvm.org/D112953
2021-11-01 15:42:58 -07:00
Yonghong Song f63405f6e3 BPF: Workaround an InstCombine ICmp transformation with llvm.bpf.compare builtin
Commit acabad9ff6 ("[InstCombine] try to canonicalize icmp with
trunc op into mask and cmp") added a transformation to
convert "(conv)a < power_2_const" to "a & <const>" in certain
cases and bpf kernel verifier has to handle the resulted code
conservatively and this may reject otherwise legitimate program.

This commit tries to prevent such a transformation. A bpf backend
builtin llvm.bpf.compare is added. The ICMP insn, which is subject to
above InstCombine transformation, is converted to the builtin
function. The builtin function is later lowered to original ICMP insn,
certainly after InstCombine pass.

With this change, all affected bpf strobemeta* selftests are
passed now.

Differential Revision: https://reviews.llvm.org/D112938
2021-11-01 14:46:20 -07:00
Arthur Eubanks 029f1a5344 [LazyCallGraph] Skip blockaddresses
blockaddresses do not participate in the call graph since the only
instructions that use them must all return to someplace within the
current function. And passes cannot retrieve a function address from a
blockaddress.

This was suggested by efriedma in D58260.

Fixes PR50881.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D112178
2021-11-01 13:10:24 -07:00
Nikita Popov 4972d12185 [SCEV] Only add direct loop users (NFC)
It it now sufficient to track only direct addrec users of a loop,
and let the SCEVUsers mechanism track and invalidate transitive users.

Differential Revision: https://reviews.llvm.org/D112875
2021-11-01 18:49:43 +01:00
Cameron McInally 702fd3d323 [SVE] Fix VLS FMA matching for CodeGenOpt::Aggressive.
For NEON, FMA matching is done in the MachineCombiner, and not the
DAGCombiner. That causes problems with VLS lowering, since the
vectors are fixed width at the DAGCombiner, but are scalable in
the MachineCombiner. This patch corrects it by matching FMAs for
VLS vectors in the DAGCombiner.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D112557
2021-11-01 10:43:52 -07:00
Jay Foad b8016b626e [CodeGen] Tweak coding style in LivePhysRegs::stepForward. NFC. 2021-11-01 16:01:24 +00:00
Sanjay Patel 42c94bc1ab [InstCombine] allow vector splat matching for bitwise logic fold
Similar to 54e969cffd (and with cosmetic updates to hopefully
make that easier to read), this fold has been around since early
in LLVM history.

Intermediate folds have been added subsequently, so extra uses
are required to exercise this code.

The test example actually shows an unintended consequence with
extra uses - we end up with an extra instruction compared to what
we started with. But this at least makes scalar/vector consistent.

General proof:
https://alive2.llvm.org/ce/z/tmuBza
2021-11-01 11:39:48 -04:00
Kazu Hirata d000431fb2 [X86] Remove X86ELFObjectWriter in X86AsmBackend.cpp (NFC)
Note that the identically named class is defined in an anonymous
namespace in X86ELFObjectWriter.cpp.
2021-11-01 08:31:54 -07:00
Jay Foad 7afef22926 [AMDGPU] Use MachineInstrBuilder::addReg. NFC. 2021-11-01 15:29:51 +00:00
Jay Foad 2b548b18c1 [AMDGPU] Shrink v_mac_legacy_f32 and v_fmac_legacy_f32
Differential Revision: https://reviews.llvm.org/D112917
2021-11-01 13:55:53 +00:00
Matt Morehouse 4d8b0aa5c0 [HWASan] Apply TagMaskByte to every global tag.
Previously we only applied it to the first one, which could allow
subsequent global tags to exceed the valid number of bits.

Reviewed By: hctim

Differential Revision: https://reviews.llvm.org/D112853
2021-11-01 06:31:44 -07:00
Sanjay Patel 54e969cffd [InstCombine] allow vector splat matching for bitwise logic folds
This fold was added long ago (part of fixing PR4216),
and it matched scalars only. Intermediate folds have
been added subsequently, so extra uses are required
to exercise this code.

General proof:
https://alive2.llvm.org/ce/z/G6BBhB

One of the specific tests:
https://alive2.llvm.org/ce/z/t0JhEB
2021-11-01 08:26:42 -04:00
Sanjay Patel 511ee8759f [InstCombine] reduce code duplication with commutative matcher; NFC 2021-11-01 08:26:41 -04:00
Mubashar Ahmad 0b83a18a2b [AArch64] Enablement of Cortex-X2
Enables support for Cortex-X2 cores.

Differential Revision: https://reviews.llvm.org/D112459
2021-11-01 11:55:24 +00:00
Simon Pilgrim 6fc50e531d [CostModel][X86] Remove old FIXME comments for AVX512F vector splitting
Similar to AVX1, the cost of splitting/merging 512-bit -> 256-bits vectors for arithmetic operations are typically hidden due to different used ports etc.
2021-11-01 11:11:11 +00:00
Simon Pilgrim fd485d8cda [X86][AVX] Prefer VINSERTF128 over VPERM2F128 for 128->256 subvector concatenations
The VINSERTF128 instruction is often much quicker, and never slower, than the more general VPERM2F128 instruction, so we should try to use that in more circumstances.

This requires a fallback to a commuted VPERM2F128 for the case where we need to fold the 256-bit vector source instead of the 128-bit subvector source.

There is one interesting side effect - DAGCombine's narrowExtractedVectorLoad combine gets called in a number of locations, this often creates an extracted subvector load without regard to other uses of the original wider load. I'm expecting AVX cpus to be capable of merging such aliased loads, but I do wonder whether narrowExtractedVectorLoad's call to X86TargetLowering::shouldReduceLoadWidth needs to be altered to check for more partial uses?

Noticed while investigating the quality of interleaved load/store codegen.

Differential Revision: https://reviews.llvm.org/D111960
2021-11-01 10:45:50 +00:00
David Sherwood 87a294d5eb [LoopVectorize] Change getRuntimeVFAsFloat to use unsigned int->FP conversion
We never expect the runtime VF to be negative so we should use
the uitofp instruction instead of sitofp.

Differential revision: https://reviews.llvm.org/D112610
2021-11-01 09:58:14 +00:00
Roman Lebedev b554e41e2d
[CVP] Canonicalize signed relational comparisons of scalar integers to unsigned comparison predicates
Now that the reasoning was added to ConstantRange in D90924,
this replicates IndVars variant of this transform (D111836)
in a pass that uses value range reasoning for the transform.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112895
2021-11-01 12:16:05 +03:00
Jun Ma 1f9fa54984 [Taildup] Don't tail-duplicate loop header with multiple successors as its latches
when Taildup hit loop with multiple latches like:
  //    1 -> 2 <-> 3                 |
  //          \  <-> 4               |
  //           \   <-> 5             |
  //            \---> rest           |
it may transform this loop into multiple loops by duplicate loop header.
However, this change may has little benefit while makes cfg much complex.
In some uncommon cases, it causes large compile time regression (offered by
@alexfh in D106056).

This patch disable tail-duplicate of such cases.

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D110613
2021-11-01 15:32:00 +08:00
Jun Ma c93f93b2e3 Revert "Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values.""""
This reverts commit 3a998c06a8.
2021-11-01 15:31:59 +08:00
Kazu Hirata 476e1ee3da [AArch64] Remove unused declaration hasSwiftExtendedFrame (NFC) 2021-10-31 22:58:56 -07:00
Max Kazantsev e512c5b166 [SCEV][NFC] Factor out common API for getting unique operands of a SCEV
This function is used at least in 2 places, to it makes sense to make it separate.

Differential Revision: https://reviews.llvm.org/D112516
Reviewed By: reames
2021-11-01 11:36:47 +07:00
Chen Zheng eeed1545b2 [PowerPC] turn off chain commoning by default. 2021-11-01 04:11:10 +00:00
Itay Bookstein 848812a55e [Verifier] Add verification logic for GlobalIFuncs
Verify that the resolver exists, that it is a defined
Function, and that its return type matches the ifunc's
type. Add corresponding check to BitcodeReader, change
clang to emit the correct type, and fix tests to comply.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112349
2021-10-31 20:00:57 -07:00
Zi Xuan Wu cf78715cae [CSKY] First patch to construct codegen infra and generate first add instruction
Ooops. It constructs codegen infra and provide only basic code to generate first add instruction successfully.

Differential Revision: https://reviews.llvm.org/D112206
2021-11-01 10:06:56 +08:00
Shoaib Meenai 0cf624cad7 [TimeProfiler] Reset variable to nullptr
Otherwise we'll hit a spurious assert failure when we reset and then
reinitialize TimeProfiler on the same thread, as can happen when e.g.
using LLD as a library and running it multiple times in the same
process.

Makes `lld/test/MachO/time-trace.s` pass with `LLD_IN_TEST=2`, which
runs the linker twice in the same process and exposed the issue.

Reviewed By: MaskRay, mehdi_amini

Differential Revision: https://reviews.llvm.org/D112880
2021-10-31 16:14:30 -07:00
Roman Lebedev 03a4f1f3b8
[ConstantRange] Sign-flipping of signedness-invariant comparisons
For certain combination of LHS and RHS constant ranges,
the signedness of the relational comparison predicate is irrelevant.

This implements complete and precise model for all predicates,
as confirmed by the brute-force tests. I'm not sure if there are
some more cases that we can handle here.

In a follow-up, CVP will make use of this.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D90924
2021-10-31 22:53:17 +03:00
Lang Hames ff846fcb64 [ORC][ORC-RT] Switch MachO EH/TLV registration from EPC-calls to alloc actions.
MachOPlatform used to make an EPC-call (registerObjectSections) to register the
eh-frame and thread-data sections for each linked object with the ORC runtime.

Now that JITLinkMemoryManager supports allocation actions we can use these
instead of an EPC call. This saves us one EPC-call per object linked, and
manages registration/deregistration in the executor, rather than the controller
process. In the future we may use this to allow JIT'd code in the executor to
outlive the controller object while still being able to be cleanly destroyed.

Since the code for allocation actions must be available when the actions are
run, and since the eh-frame registration code lives in the ORC runtime itself,
this change required that MachO eh-frame support be split out of
macho_platform.cpp and into its own macho_ehframe_registration.cpp file that has
no other dependencies. During bootstrap we start by forcing emission of
macho_ehframe_registration.cpp so that eh-frame registration is guaranteed to be
available for the rest of the bootstrap process. Then we load the rest of the
MachO-platform runtime support, erroring out if there is any attempt to use
TLVs. Once the bootstrap process is complete all subsequent code can use all
features.
2021-10-31 10:27:40 -07:00
Lang Hames b77c6db959 [JITLink] Fix alloc action call signature in InProcessMemoryManager.
Alloc actions should return a CWrapperFunctionResult. JITLink does not have
access to this type yet, due to library layering issues, so add a cut-down
version with a fixme.
2021-10-31 10:27:40 -07:00
Craig Topper ada5458521 [RISCV] Expand scalable vector bswap. Fix crash for bitreverse.
Fix LegalizeVectorOps to not try shuffle or unrolling expansions for
scalable vectors.

Differential Revision: https://reviews.llvm.org/D112236
2021-10-31 10:01:27 -07:00
Kazu Hirata 1a605f395f [CodeGen] Use make_early_inc_range (NFC) 2021-10-31 07:57:36 -07:00
Kazu Hirata 72710af233 [CodeGen, Target] Use MachineBasicBlock::terminators (NFC) 2021-10-31 07:57:34 -07:00
Kazu Hirata c714da2ceb [Transforms] Use {DenseSet,SetVector,SmallPtrSet}::contains (NFC) 2021-10-31 07:57:32 -07:00
Kazu Hirata 4cc7c4724f [MachineCSE] Use make_early_inc_range (NFC) 2021-10-30 19:00:23 -07:00
Kazu Hirata c8b1ed5fb2 [clang, llvm] Use Optional::getValueOr (NFC) 2021-10-30 19:00:21 -07:00
Lang Hames 213666f804 [ORC] Move CWrapperFunctionResult out of the detail:: namespace.
This type has been moved up into the llvm::orc::shared namespace.

This type was originally put in the detail:: namespace on the assumption that
few (if any) LLVM source files would need to use it. In practice it has been
needed in many places, and will continue to be needed until/unless
OrcTargetProcess is fully merged into the ORC runtime.
2021-10-30 16:12:45 -07:00
Kazu Hirata 5970249439 [Hexagon] Remove chksetELFHeaderEFlags (NFC)
The function was introduced without any use on Nov 9, 2015 in commit
7cd0892729.
2021-10-30 08:43:43 -07:00
Kazu Hirata c3d63a0697 [Hexagon] Remove ValidArch (NFC)
This function seems to be unused for at least one year.
2021-10-30 08:43:41 -07:00
Kazu Hirata c5cd371cc9 [Hexagon] Remove unused struct InstTy (NFC) 2021-10-30 08:43:39 -07:00
Roman Lebedev 25043c8276
[NFCI] Introduce `ICmpInst::compare()` and use it where appropriate
As noted in https://reviews.llvm.org/D90924#inline-1076197
apparently this is a pretty common pattern,
let's not repeat it yet again, but have it in a common place.

There may be some more places where it could be used,
but these are the most obvious ones.
2021-10-30 17:50:06 +03:00
David Green 2c4a9e830c [ValueTracking] Teach computeConstantRange that the maximum value of a half is 65504
The maximal value of a half is 0x7bff, which is 65504 when converted to
an integer. This patch teaches that to computeConstantRange to compute a
constant range with the correct maximum value.
https://alive2.llvm.org/ce/z/BV_Spb
https://alive2.llvm.org/ce/z/Nwuqvb

The maximum value for a float converted in the same way is 3.4e38, which
requires 129bits of data. I have not added that here as integer types so
larger are rare, compared to integers types larger than 17 bits require
for half floats.

The MVE tests change because instsimplify happens to be run as a part of
the backend, where it doesn't tend to for other backends.

Differential Revision: https://reviews.llvm.org/D112694
2021-10-30 14:27:38 +01:00
Christudasan Devadasan aa2d3b59ce GlobalISel/Utils: Use incoming regbank while constraining the superclasses
Register operands with superclasses can possibly have multiple regBanks
if they have different register types. The regBank ambiguity resolved
during regbankselect should be used to constrain the operand regclass
instead of obtaining one from the MCInstrDesc.

This is a prerequisite patch for D109300 that introduces allocatable AV_*
Superclasses for AMDGPU by combining both VGPRs and AGPRs and we want to
restrain the regclass to either A or V based on the incoming regbank.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D112323
2021-10-30 07:20:45 -04:00
David Green 66281baea1 [InstCombine] Fix type of constant in canonicalizeClampLike
As a followup to D108049, one of the constants could now be generated
with an incorrect type, now that the input could be truncated.
2021-10-30 09:06:21 +01:00
Kazu Hirata 972d4133e9 Use {DenseSet,SmallPtrSet}::contains (NFC) 2021-10-29 20:26:07 -07:00
Lang Hames afeb1e4ac7 [ORC] Move all pass config into MachOPlatformPlugin::modifyPassConfig.
NFC, this just makes it easier to see and reason about pass ordering.
2021-10-29 20:07:45 -07:00
Duncan P. N. Exon Smith 0d5b6423ba Support: Reduce stats in fs::copy_file on Darwin
fs::copy_file() on Darwin has a nice optimization to clone the file when
possible. Change the implementation to use clonefile() directly, instead
of the higher-level copyfile().  The latter does the wrong thing for
symlinks, which requires calling `stat` first...

With that out of the way, optimistically call clonefile() all the time,
and then for any error that's recoverable try again with copyfile()
(without the COPYFILE_CLONE flag, as before).

Differential Revision: https://reviews.llvm.org/D112250
2021-10-29 16:48:35 -07:00
Stanislav Mekhanoshin e5340ed30c [AMDGPU] Fix global isel for kernels using agprs on gfx90a
With Global ISel getReservedRegs() is called before function is
regbank selected for the first time. Defer caching of usesAGPRs()
in this case.

Differential Revision: https://reviews.llvm.org/D112644
2021-10-29 14:23:14 -07:00
Florian Hahn 274a9b0f0b
[DSE] Support redundant stores eliminated by memset.
This patch adds support to remove stores that write the same value
as earlier memesets.

It uses isOverwrite to check that a memset completely overwrites a later
store. The candidate store must store the same bytewise value as the
byte stored by the memset.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112321
2021-10-29 22:19:53 +01:00
Nikita Popov cdf45f98ca [BasicAA] Extract linear expression multiplication (NFC)
Extract a common method for multiplying a linear expression by a
factor.
2021-10-29 22:41:40 +02:00
Sam Clegg 3b039c68f2 Revert "[WebAssembly] Fix debug locations for ExplicitLocals pass"
This reverts commit a66451ebbe.

This caused a failure when integrated with emscripten:
https://ci.chromium.org/ui/p/emscripten-releases/builders/try/linux/b8832019855439718609/overview
2021-10-29 13:34:18 -07:00
Nikita Popov 7cf7378a9d [BasicAA] Don't treat non-inbounds GEP as nsw
The scale multiplication is only guaranteed to be nsw if the GEP
is inbounds (or the multiplication is trivial). Previously we were
only considering explicit muls in GEP indices.
2021-10-29 22:30:44 +02:00
Nick Desaulniers 39e5dd113f [SparcISelLowering] avoid emitting libcalls to __muloti4 and __mulodi4
These compiler-rt-only symbols aren't available in libgcc.  Similar to
D108842, D108844, and D108926.

Fixes: pr/52043

Reviewed By: craig.topper, rengolin

Differential Revision: https://reviews.llvm.org/D112750
2021-10-29 13:14:09 -07:00
Sanjay Patel 285b8abce4 [x86] limit vector increment fold to allow load folding
The tests are based on the example from:
https://llvm.org/PR52032

I suspect that it looks worse than it actually is. :)
That is, llvm-mca says there's no uop/timing difference with the
load folding and pcmpeq vs. broadcast on Haswell (and probably
other targets).
The load-folding definitely makes the code smaller, so it's good
for that at least. So this requires carving a narrow hole in the
transform to get just this case without changing others that look
good as-is (in other words, the transform still seems good for
most examples).

Differential Revision: https://reviews.llvm.org/D112464
2021-10-29 15:48:35 -04:00
Sanjay Patel 837518d6a0 [x86] make mayFold* helpers visible to more files; NFC
The first function is needed for D112464, but we might
as well keep these together in case the others can be
used someday.
2021-10-29 15:48:35 -04:00
Sanjay Patel 8f786b4618 [InstCombine] fix comments to match code; NFC 2021-10-29 15:48:35 -04:00
modimo 5caad9b5d3 [InlineAdvisor] Add fallback/format switches and negative remark processing to Replay Inliner
Adds the following switches:

1. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback: controls what the replay advisor does for inline sites that are not present in the replay. Options are:

 1. Original: defers to original advisor
 2. AlwaysInline: inline all sites not in replay
 3. NeverInline: inline no sites not in replay

2. --sample-profile-inline-replay-format/--cgscc-inline-replay-format: controls what format should be generated to match against the replay remarks. Options are:

  1. Line
  2. LineColumn
  3. LineDiscriminator
  4. LineColumnDiscriminator

Adds support for negative inlining decisions. These are denoted by "will not be inlined into" as compared to the positive "inlined into" in the remarks.

All of these together with the previous `--sample-profile-inline-replay-scope/--cgscc-inline-replay-scope` allow tweaking in how to apply replay. In my testing, I'm using:
1. --sample-profile-inline-replay-scope/--cgscc-inline-replay-scope = Function to only replay on a function
2. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback = NeverInline since I'm feeding in only positive remarks to the replay system
3. --sample-profile-inline-replay-format/--cgscc-inline-replay-format = Line since I'm generating the remarks from DWARF information from GCC which can conflict quite heavily in column number compared to Clang

An alternative configuration could be to do Function, AlwaysInline, Line fallback with negative remarks which closer matches the final call-sites. Note that this can lead to unbounded inlining if a negative remark doesn't match/exist for one reason or another.

Updated various tests to cover the new switches and negative remarks

Testing:
ninja check-all

Reviewed By: wenlei, mtrofin

Differential Revision: https://reviews.llvm.org/D112040
2021-10-29 12:32:03 -07:00
Duncan P. N. Exon Smith 9902362701 Support: Use sys::path::is_style_{posix,windows}() in a few places
Use the new sys::path::is_style_posix() and is_style_windows() in a few
places that need to detect the system's native path style.

In llvm/lib/Support/Path.cpp, this patch removes most uses of the
private `real_style()`, where is_style_posix() and is_style_windows()
are just a little tidier.

Elsewhere, this removes `_WIN32` macro checks. Added a FIXME to a
FileManagerTest that seemed fishy, but maintained the existing
behaviour.

Differential Revision: https://reviews.llvm.org/D112289
2021-10-29 12:09:41 -07:00
modimo 51ce567b38 [SampleProfile] Add all callsites to AllCandidates if InlineReplay is in effect
Replay in sample profiling needs to be asked on candidates that may not have counts or below the threshold. If replay is in effect for a function make sure these are captured and also imported during thinLTO.

Testing:
ninja check-all

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D112033
2021-10-29 12:04:52 -07:00
Roman Lebedev 0ae7bf124a
[NFC][LoopDeletion] Count the number of broken backedges
Those don't contribute to the number of deleted loops.
2021-10-29 21:58:16 +03:00
Amara Emerson 5dd9e019dd [AArch64][GlobalISel] Fix an crash in RBS due to a new regclass being added.
rdar://84674985
2021-10-29 11:47:00 -07:00
Duncan P. N. Exon Smith 4e4883e1f3 Support: Expose sys::path::is_style_{posix,windows,native}()
Expose three helpers in namespace llvm::sys::path to detect the
path rules followed by sys::path::Style.

- is_style_posix()
- is_style_windows()
- is_style_native()

This are constexpr functions that that will allow a bunch of
path-related code to stop checking `_WIN32`.

Originally I looked at adding system_style(), analogous to
sys::endian::system_endianness(), but future patches (from others) will
add more Windows style variants for slash preferences. These helpers
should be resilient to that change, allowing callers to detect basic
path rules.

Differential Revision: https://reviews.llvm.org/D112288
2021-10-29 11:46:44 -07:00
Sanjay Patel d0e9879d96 [InstCombine] allow vector splat matching for bitwise logic folds
These transforms are also likely missing a one-use check,
but that's another patch.
2021-10-29 14:22:50 -04:00
Stanislav Mekhanoshin a905c54b76 [InstCombine] Fold `(~(a | b) & c) | ~(a | c)` into `~((b & c) | a)`
```
----------------------------------------
define i4 @src(i4 %a, i4 %b, i4 %c) {
  %or1 = or i4 %b, %a
  %not1 = xor i4 %or1, -1
  %or2 = or i4 %a, %c
  %not2 = xor i4 %or2, -1
  %and = and i4 %not2, %b
  %or3 = or i4 %and, %not1
  ret i4 %or3
}

define i4 @tgt(i4 %a, i4 %b, i4 %c) {
  %and = and i4 %c, %b
  %or = or i4 %and, %a
  %or3 = xor i4 %or, -1
  ret i4 %or3
}
Transformation seems to be correct!
```

Differential Revision: https://reviews.llvm.org/D112338
2021-10-29 10:58:09 -07:00
Matt Morehouse 33cc0cfd46 [X86] Don't affect jump tables under +tagged-globals.
`classifyLocalReference(nullptr)` is called to get the appropriate
relocation type for jump tables.  We should not use @GOTPCREL for this
case.

The new test cases trigger assertions without this patch.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D112832
2021-10-29 10:37:43 -07:00
Fraser Cormack 8314a04ede [SelectionDAG] Allow FindMemType to fail when widening loads & stores
This patch removes an internal failure found in FindMemType and "bubbles
it up" to the users of that method: GenWidenVectorLoads and
GenWidenVectorStores. FindMemType -- renamed findMemType -- now returns
an optional value, returning None if no such type is found.

Each of the aforementioned users now pre-calculates the list of types it
will use to widen the memory access. If the type breakdown is not
possible they will signal a failure, at which point the compiler will
crash as it does currently.

This patch is preparing the ground for alternative legalization
strategies for vector loads and stores, such as using vector-predication
versions of loads or stores.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112000
2021-10-29 18:27:31 +01:00
Craig Topper aefcd59895 [RISCV] Teach RISCVInsertVSETVLI::needVSETVLI to handle mask register instructions better.
If the VL operand of a mask register instruction comes from an
explicit vsetvli with a different VTYPE, we can still avoid needing
a vsetvli as long as the SEW/LMUL ratio is the same and policy bits
match.

Differential Revision: https://reviews.llvm.org/D112762
2021-10-29 09:49:36 -07:00
Simon Pilgrim 6102e5d56b [CostModel][X86] Remove old TODO comment
BMI (TZCNT) scalar handling was added at rGa2db388dce77c2f23f2009d7363a0b63bb54523c
2021-10-29 17:28:45 +01:00
Mircea Trofin d6790a0a3c [NFC] ProfileSummary: const most of the fields.
This simplifies readability / maintainability.
2021-10-29 08:36:08 -07:00
Bradley Smith 86972f1114 [AArch64][SVE] Use TargetFrameIndex in more SVE load/store addressing modes
Add support for generating TargetFrameIndex in complex patterns for
indexed addressing modes in SVE. Additionally, add missing load/stores
to getMemOpInfo and getLoadStoreImmIdx.

Differential Revision: https://reviews.llvm.org/D112617
2021-10-29 14:44:16 +00:00
Jay Foad 56f03d25b4 [IR] Remove createReplacementInstr. NFC.
It is unused since D112791.

Differential Revision: https://reviews.llvm.org/D112795
2021-10-29 15:03:19 +01:00
Jay Foad 1b758925ad [IR] Merge createReplacementInstr into ConstantExpr::getAsInstruction
createReplacementInstr was a trivial wrapper around
ConstantExpr::getAsInstruction, which also inserted the new instruction
into a basic block. Implement this directly in getAsInstruction by
adding an InsertBefore parameter and change all callers to use it. NFC.

A follow-up patch will remove createReplacementInstr.

Differential Revision: https://reviews.llvm.org/D112791
2021-10-29 15:02:58 +01:00
Jay Foad 21a1d4cf71 [AMDGPU] Change numBitsSigned for simplicity and document it. NFC.
Change numBitsSigned to return the minimum size of a signed integer that
can hold the value. This is different by one from the previous result
but is more consistent with numBitsUnsigned. Update all callers. All
callers are now more consistent between the signed and unsigned cases,
and some callers get simpler, especially the ones that deal with
quantities like numBitsSigned(LHS) + numBitsSigned(RHS).

Differential Revision: https://reviews.llvm.org/D112813
2021-10-29 14:22:06 +01:00
Chen Zheng 7591d21032 [PowerPC] fix a miscompile for Solaris build 2021-10-29 12:06:25 +00:00
Bradley Smith bf72a469ba [AArch64][SVE] Fix build failure introduced in 13faa5f440 2021-10-29 11:57:02 +00:00
David Green 11630dbbc3 [InstCombine] Fold BW/2+1 tops bits are same pattern
Match "icmp eq (trunc (lsr A, BW), (ashr (trunc A), BW-1))", which checks
the top BW/2 + 1 bits are all the same. Create "A >=s INT_MIN && A <=s
INT_MAX", which we generate as "icmp ult (add A, 2^BW-1), 2^BW" to skip
a few steps of instcombining.
https://alive2.llvm.org/ce/z/NjH6Ty
https://alive2.llvm.org/ce/z/_fEQ9P

Differential Revision: https://reviews.llvm.org/D109155
2021-10-29 12:30:20 +01:00
Simon Pilgrim 154c036ebb [X86] combineX86GatherScatter - only fold scale if the index isn't extended
As mentioned on D108539, when the gather indices are smaller than the pointer size, they are sign-extended BEFORE scale is applied, making the general fold unsafe.

If the index have sufficient sign-bits then folding the scale could be safe - I'll investigate this.
2021-10-29 11:48:05 +01:00
David Green 9020e22a87 [InstCombine] Convert xor (ashr X, BW-1), C -> select(X >=s 0, C, ~C)
The sequence of instructions `xor (ashr X, BW-1), C` (or with a truncation
`xor (trunc (ashr X, BW-1)), C)` takes a value, produces all zeros or all
ones and with it optionally inverts a constant depending on whether the
original input was positive or negative. This is the same as checking if
the value is positive, and selecting between the constant and ~constant.
https://alive2.llvm.org/ce/z/NJ85qY

This is a fairly general version of a fold that helps pull saturating
arithmetic into a canonical form.

Differential Revision: https://reviews.llvm.org/D109151
2021-10-29 11:19:20 +01:00
Bradley Smith 13faa5f440 [AArch64][SVE] Generate SVE >1 element structured load/stores from fixed types
This adds support for SVE structured loads/stores to the relevant target
hooks, such that we can support these instructions in the InterleavedAccess
pass.

Depends on D112078

Differential Revision: https://reviews.llvm.org/D112303
2021-10-29 09:35:57 +00:00
Cullen Rhodes 8686626244 [Sparc] NFC: Remove unused tblgen template args
Identified in D109359.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D109712
2021-10-29 09:16:15 +00:00
Neubauer, Sebastian c78640ee6a [TailDuplicator] Fix merging block with terminator
The TailDuplicator merged two blocks, even if the first one ended with
a terminator, resulting in invalid MIR, where a terminator is in the
middle of a block.

Abort merging if the first block ends with a terminator.

Differential Revision: https://reviews.llvm.org/D112226
2021-10-29 10:52:46 +02:00
Vang Thao 52b43d1549 [AMDGPU] Fix cvt_f32_ubyte combine with shl
Shift node is still needed to check if the shift is shr or shl to increment/decrement offset. Do not override the node.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D112733
2021-10-28 21:43:06 -07:00
Chuanqi Xu bb16e83932 [NFC] [Coroutines] Use llvm::make_scope_exit to replace self-defined RTTIHelper 2021-10-29 12:14:20 +08:00
Kazu Hirata 01b4789b62 [AMDGPU] Remove hasDefinedInitializer (NFC)
The last use was removed on Sep 16, 2021 in commit
7a62a5b56d.
2021-10-28 20:33:34 -07:00
Kazu Hirata dd5d46b009 [AMDGPU] Remove unused BBSelectRegister in AMDGPUMachineCFGStructurizer (NFC)
This field seems to be unused for at least one year.
2021-10-28 20:33:32 -07:00
Kazu Hirata 309357c01a [AMDGPU] Remove unused declaration eliminateDeadBranchOperands (NFC) 2021-10-28 20:33:30 -07:00
Abinav Puthan Purayil db8d7b6e2d [DAGCombine][NFC] s/it's/its in the comment of hasNoInfs(). 2021-10-29 07:36:38 +05:30
Lang Hames 12b2cc2294 [ORC] Rename SupportFunctionCall to WrapperFunctionCall.
The new name better suits the type.

This patch also changes the signature of the run method (it now returns a
WrapperFunctionResult), and adds runWithSPSRet methods that deserialize the
function result using SPS.

Together these chages bring this type into close alignment with its ORC runtime
counterpart.
2021-10-28 17:48:54 -07:00
Lang Hames 999c6a235e Reapply e32b1eee6a "[ORC] Change SPSExecutorAddr serialization,..." with fixes.
This re-applies e32b1eee6a, which was reverted in
20675d8f7d due to broken unit tests. This patch
includes fixes for the tests.
2021-10-28 16:40:25 -07:00
Thomas Lively fb67f3d969 [WebAssembly] Add prototype relaxed float to int trunc instructions
Add i32x4.relaxed_trunc_f32x4_s, i32x4.relaxed_trunc_f32x4_u,
i32x4.relaxed_trunc_f64x2_s_zero, i32x4.relaxed_trunc_f64x2_u_zero.

These are only exposed as builtins, and require user opt-in.

Differential Revision: https://reviews.llvm.org/D112186
2021-10-28 14:01:53 -07:00
Daniel Kiss d8075e8781 Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."
This is relanding commit da1d1a0869 .
This patch additionally addresses failures found in buildbots & post review comments.

ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup.
It will call the UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called
because the global state will be wrong due to the missing __cxa_end_cleanup call.

Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions

Reviewed By: logan

Differential Revision: https://reviews.llvm.org/D111703
2021-10-28 21:45:09 +02:00
Wouter van Oortmerssen a66451ebbe [WebAssembly] Fix debug locations for ExplicitLocals pass
Differential Revision: https://reviews.llvm.org/D112487
2021-10-28 12:35:46 -07:00
Stanislav Mekhanoshin f7f430c913 [InstCombine] Fixed non-determinisctic order of new instructions
Fixes non-determinisctic order of XOR instructions created after
5a7a458306. The order of call argument evaluation is not
defined, so create one Value before the call.
2021-10-28 12:14:02 -07:00
Stanislav Mekhanoshin 5a7a458306 [InstCombine] Fold `(c & ~(a | b)) | (b & ~(a | c))` to `~a & (b ^ c)`
```
----------------------------------------
define i4 @src(i4 %a, i4 %b, i4 %c) {
%0:
  %or1 = or i4 %a, %b
  %not1 = xor i4 %or1, 15
  %and1 = and i4 %not1, %c
  %or2 = or i4 %a, %c
  %not2 = xor i4 %or2, 15
  %and2 = and i4 %not2, %b
  %or3 = or i4 %and1, %and2
  ret i4 %or3
}
=>
define i4 @tgt(i4 %a, i4 %b, i4 %c) {
%0:
  %xor = xor i4 %b, %c
  %not = xor i4 %a, 15
  %or3 = and i4 %xor, %not
  ret i4 %or3
}
Transformation seems to be correct!
```

Differential Revision: https://reviews.llvm.org/D112276
2021-10-28 11:54:30 -07:00
Ahmed Bougacha bef777206e [AArch64] Rename some timm predicates for consistency. NFC.
timm isn't the common case, and TImmLeafs should make it clear what
they are.  We're adding a plain ImmLeaf for 0_65535, so rename
i64_imm0_65535 to timm64_0_65535, and imm32_0_7 to timm32_0_7.
2021-10-28 11:41:29 -07:00
Yuanfang Chen ac02bcad56 [IRSymTab] Mark __stack_chk_guard used
`__stack_chk_guard` is a global variable that has no uses before the LLVM code generation phase (how it is defined is platform-dependent). LTO needs to preserve this symbol for that reason. Currently, legacy LTO API preserves it by hardcoding the logic in Internalizer, but this symbol is not preserved by regular LTO API in thinlink phase. This patch marks `__stack_chk_guard` used during IR symbol table writing since this is how builtin functions are preserved by thinlink by using `RuntimeLibcalls.def`.

Reviewed By: MaskRay, tejohnson

Differential Revision: https://reviews.llvm.org/D112595
2021-10-28 11:22:26 -07:00
Yuanfang Chen c18ed69873 [Internalize] Preserve __stack_chk_fail in Internalizer correctly
Move the section collecting `AlwaysPreserved` up before any
`maybeInternalize` is called. Otherwise, functions in `AlwaysPreserved` (in this case, `__stack_chk_fail`)
are not preserved.

Reviewed By: MaskRay, tejohnson

Differential Revision: https://reviews.llvm.org/D112684
2021-10-28 11:22:26 -07:00
Guozhi Wei 1e46dcb77b [TwoAddressInstructionPass] Put all new instructions into DistanceMap
In function convertInstTo3Addr, after converting a two address instruction into
three address instruction, only the last new instruction is inserted into
DistanceMap. This is wrong, DistanceMap should track all instructions from the
beginning of current MBB to the working instruction. When a two address
instruction is converted to three address instruction, multiple instructions may
be generated (usually an extra COPY is generated), all of them should be
inserted into DistanceMap.

Similarly when unfolding memory operand in function tryInstructionTransform
DistanceMap is not maintained correctly.

Differential Revision: https://reviews.llvm.org/D111857
2021-10-28 11:11:59 -07:00
Matthias Braun e2c7ee0743 X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr
This extends `optimizeCompareInstr` to re-use previous comparison
results if the previous comparison was with an immediate that was 1
bigger or smaller. Example:

    CMP x, 13
    ...
    CMP x, 12   ; can be removed if we change the SETg
    SETg ...    ; x > 12  changed to `SETge` (x >= 13) removing CMP

Motivation: This often happens because SelectionDAG canonicalization
tends to add/subtract 1 often when optimizing for fallthrough blocks.
Example for `x > C` the fallthrough optimization switches true/false
blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into
`x < C + 1`.

Differential Revision: https://reviews.llvm.org/D110867
2021-10-28 10:33:56 -07:00
Matthias Braun 97a1570d8c X86InstrInfo: Optimize more combinations of SUB+CMP
`X86InstrInfo::optimizeCompareInstr` would only optimize a `SUB`
followed by a `CMP` in `isRedundantFlagInstr`. This extends the code to
also look for other combinations like `CMP`+`CMP`, `TEST`+`TEST`, `SUB
x,0`+`TEST`.

- Change `isRedundantFlagInstr` to run `analyzeCompareInstr` on the
  candidate instruction and compare the results. This normalizes things
  and gives consistent results for various comparisons (`CMP x, y`,
  `SUB x, y`) and immediate cases (`TEST x, x`, `SUB x, 0`,
  `CMP x, 0`...).
- Turn `isRedundantFlagInstr` into a member function so it can call
  `analyzeCompare`.  - We now also check `isRedundantFlagInstr` even if
  `IsCmpZero` is true, since we now have cases like `TEST`+`TEST`.

Differential Revision: https://reviews.llvm.org/D110865
2021-10-28 10:33:56 -07:00
Florian Hahn c45045bfd0
[VPlan] Keep induction recipes in header.
This patch updates recipe creation to ensure all
VPWidenIntOrFpInductionRecipes are in the header block. At the moment,
new induction recipes can be created in different blocks when trying to
optimize casts and induction variables.

Having all induction recipes in the header makes it easier to
analyze/transform them in VPlan.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D111300
2021-10-28 18:22:05 +01:00
Nicolai Hähnle b437aaa672 MachineDominators: Define MachineDomTree type alias
This is a (very) small move towards making the machine dominators more
aligned with the IR dominators:

* DominatorTree / MachineDomTree is the class holding the dominator tree
* DominatorTreeWrapperPass / MachineDominatorTree is the corresponding
  (machine) function pass

This alignment will be used by analyses that are designed as templates
that work with LLVM IR as well as Machine IR.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D112690
2021-10-28 22:30:35 +05:30
Leonard Grey 793b481f54 [CGProfile] Don't emit call graph profile edges with zero weight
With D112160 and D112164, on a Chrome Mac build this reduces the total
size of CGProfile sections by 78% (around 25% eliminated entirely) and
total size of object files by 0.14%.

Differential Revision: https://reviews.llvm.org/D112655
2021-10-28 11:32:49 -04:00
Daniel Kiss 66e03db814 Revert "Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume.""
This reverts commit b6420e575f.
2021-10-28 17:24:53 +02:00
Daniel Kiss b6420e575f Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."
This is relanding commit da1d1a0869 .
This patch additionally addresses failures found in buildbots & post review comments.

ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup.
It will call the UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called
because the global state will be wrong due to the missing __cxa_end_cleanup call.

Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions

Reviewed By: logan

Differential Revision: https://reviews.llvm.org/D111703
2021-10-28 16:49:19 +02:00
David Green 9358384fd6 [InstCombine] Extend canonicalizeClampLike to handle truncated inputs
This extends the canonicalizeClampLike function to allow cases where the
input is truncated, but still matching on the types of the ICmps. For
example
  %t = trunc i32 %X to i8
  %a = add i32 %X, 128
  %cmp = icmp ult i32 %a, 256
  %c = icmp sgt i32 %X, -1
  %f = select i1 %c, i8 High, i8 Low
  %r = select i1 %cmp, i8 %t, i8 %f
becomes
  %c1 = icmp slt i32 %X, -128
  %c2 = icmp sge i32 %X, 128
  %s1 = select i1 %c1, i32 sext(Low), i32 %X
  %s2 = select i1 %c2, i32 sext(High), i32 %s1
  %t = trunc i32 %s2 to i8
https://alive2.llvm.org/ce/z/vPzfxH

We limit the transform to constant High and Low values, where we know
the sext are free.

Differential Revision: https://reviews.llvm.org/D108049
2021-10-28 15:46:58 +01:00
Dawid Jurczak f87e0c68d7 [DSE] Eliminates redundant store of an exisiting value (PR16520)
That's https://reviews.llvm.org/D90328 follow-up.

This change eliminates writes to variables where the value that is being written is already stored in the variable.
This achieves the goal by looping through all memory definitions in the current state and getting defining access from each of them.
When there is defining access where the write instruction is identical to the original instruction it will remove this redundant write.

For example:

void f() {

x = 1;
if foo() {
   x = 1;
   g();
} else {
  h();
}

}
void g();
void h();

The second x=1 will be eliminated since it is rewriting 1 to x. This pass will produce this:

void f() {

x = 1;
if foo() {
   g();
} else {
  h();
}

}
void g();
void h();

Differential Revision: https://reviews.llvm.org/D111727
2021-10-28 16:20:09 +02:00
David Green 79011c705b [InstCombine] Fix rare condition violation in canonicalizeClampLike
With a "ult x, 0", the fold in canonicalizeClampLike does not validate
with undef inputs. This condition will usually have been simplified
away, but we should ensure the code is correct in case.
https://alive2.llvm.org/ce/z/S8HQ6H vs https://alive2.llvm.org/ce/z/h2XBJ_

See: https://reviews.llvm.org/D108049
2021-10-28 15:03:07 +01:00
Simon Pilgrim d29ccbecd0 [X86][AVX] Attempt to fold a scaled index into a gather/scatter scale immediate (PR13310)
If the index operand for a gather/scatter intrinsic is being scaled (self-addition or a shl-by-immediate) then we may be able to fold that scaling into the intrinsic scale immediate value instead.

Fixes PR13310.

Differential Revision: https://reviews.llvm.org/D108539
2021-10-28 14:07:17 +01:00
Alexey Bataev 07ef9f513f [SLP]Improve/fix reordering of the gathered graph nodes.
Gathered loads/extractelements/extractvalue instructions should be
checked if they can represent a vector reordering node too and their
order should ve taken into account for better graph reordering analysis/
Also, if the gather node has reused scalars, they must be reordered
instead of the scalars themselves.

Differential Revision: https://reviews.llvm.org/D112454
2021-10-28 05:45:09 -07:00
Sanjay Patel e8535fa784 [InstCombine] allow Negator to fold multi-use select with constant arms
The motivating test is reduced from:
https://llvm.org/PR52261

Note that the more general problem of folding any binop into a multi-use
select of constants is still there. We need to ease the restriction in
InstCombinerImpl::FoldOpIntoSelect() to catch those. But these examples
never reach that code because Negator exclusively handles negation
patterns within visitSub().

Differential Revision: https://reviews.llvm.org/D112657
2021-10-28 08:35:58 -04:00
Peter Waller 98f08752f7 [InstCombine][ConstantFolding] Make ConstantFoldLoadThroughBitcast TypeSize-aware
The newly added test previously caused the compiler to fail an
assertion. It looks like a strightforward TypeSize upgrade.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D112142
2021-10-28 12:15:15 +00:00
Abinav Puthan Purayil 2da6ef3664 [AMDGPU] Add 24-bit mulhi intrinsics in INTRINSIC_WO_CHAIN combine.
mul24 intrinsic's operands are simplified by
AMDGPUTargetLowering::performIntrinsicWOChainCombine(). This change adds
the mul24hi intrinsics in the combine since its operands can be
simplified like that of the mul24 intrinsics.

Differential Revision: https://reviews.llvm.org/D112702
2021-10-28 16:57:48 +05:30
Sebastian Neubauer fd1cfc9094 [AMDGPU][GlobalISel] Fix waterfall loops
- Move the `s_and exec` to its correct position before the content of
  the waterfall loop
- Use the SI_WATERFALL pseudo instruction, like for sdag, to benefit
  from optimizations
- Add support for indirect function calls

To support indirect calls, add a G_SI_CALL instruction without register
class restrictions and insert a waterfall loop when applying register
banks.

Differential Revision: https://reviews.llvm.org/D109052
2021-10-28 10:30:55 +02:00
Neubauer, Sebastian 50d8d963e3 [GlobalISel] Simplify RegBankSelect
Save the instruction list of a block before selecting banks.
This allows to cope with moved instructions, even if they are reordered
or splitted into multiple basic blocks.

Differential Revision: https://reviews.llvm.org/D111223
2021-10-28 10:30:55 +02:00
Caroline Concatto 2186b011e9 [Driver][AArch64]Add driver support for neoverse-512tvb target
The support for  neoverse-512tvb mirrors the same option available in GCC[1].
There is no functional effect for this option yet.
This patch ensures the driver accepts "-mcpu=neoverse-512tvb", and enough
plumbing is in place to allow the new option to be used in the future.

[1]https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html

Differential Revision: https://reviews.llvm.org/D112406
2021-10-28 09:08:40 +01:00
Martin Storsjö 177176f75c [Support] [Windows] Manually clean up temp files if not setting delete disposition
Since D81803 / 79657e2339, temp files
created on network shares don't set "Disposition.DeleteFile = true".
This flag normally takes care of removing the temp file both if the
process exits abnormally (either crashing or killed externally), and
when the file is closed cleanly.

For network shares, we voluntarily choose to not set the flag, and
if the operation to inspect the file handle (as a prerequisite to
setting the flag since 79657e2339)
fails we also error out. In both of these cases, we can at least make
sure to remove the temp files when they are closed cleanly.

Adjust the semantics of "OF_Delete" to not set the delete
disposition, but only set the access mode for allowing deletion.
Move the call to setDeleteDisposition into TempFile::create,
where we can check if it failed, and if it did, set a flag noting
that the file should be removed manually at the end.

This does leak files on crash, but at least doesn't leak files
in regular successful runs. (Technically, the alternative codepath
could use the RemoveFileOnSignal function, but that might complicate
the TempFile implementation further.)

This fixes https://github.com/mstorsjo/llvm-mingw/issues/233 and
https://bugs.llvm.org/show_bug.cgi?id=52080.

Differential Revision: https://reviews.llvm.org/D111875
2021-10-28 10:33:37 +03:00
Hongtao Yu 259e4c5658 [CSSPGO] Trim cold base profiles for the CS preinliner.
Adding support to the CS preinliner to trim cold base profiles. This makes trimming consistent with the inline decision made by the preinliner. Also disable the existing profile merger when preinliner is on unless explicitly specified.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D112489
2021-10-27 22:50:27 -07:00
Hsiangkai Wang 7051f73d69 [RISCV] Sync Zvlsseg register order as the same as vector registers.
Sync the order of Zvlsseg registers with vector registers to avoid
unnecessary register copies between vector instructions and zvlsseg
instructions.

Differential Revision: https://reviews.llvm.org/D110250
2021-10-28 13:34:53 +08:00
Kazu Hirata cee3419d65 [AMDGPU] Remove unused declaration findNumUsedRegistersSI (NFC) 2021-10-27 21:24:02 -07:00
Phoebe Wang 2bc28c6f82 [X86] Add a dependency breaking xor before any gathers with an undef passthru value.
In the instruction encoding, the passthru register is always
tied to the destination register. The CPU scheduler has to wait
for the last writer of this register to finish executing before
the gather can start. This is true even if the initial mask is
all ones so that the passthru will never be used.

By explicitly zeroing the register we can break the false
dependency. The zero idiom is executed completing by the
register renamer and so is immedately considered ready.

Authored by Craig.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D112505
2021-10-28 11:44:52 +08:00
Hsiangkai Wang 0a9b82960c [RISCV] Use vmv.v.[v|i] if we know COPY is under the same vl and vtype.
If we know the source operand of COPY is defined by a vector instruction
with tail agnostic and the same LMUL and there is no vsetvli between
COPY and the define instruction to change the vl and vtype, we could use
vmv.v.v or vmv.v.i to copy vector registers to get better performance than
the whole vector register move instructions.

If the source of COPY is from vmv.v.i, we could use vmv.v.i for the
COPY.

This patch only considers all these instructions within one basic block.

Case 1:
```
bb.0:
  ...
  VSETVLI          # The first VSETVLI before COPY and VOP.
  ...              # Use this VSETVLI to check LMUL and tail agnostic.
  ...
  vy = VOP va, vb  # Define vy.
  ...              # There is no vsetvli between VOP and COPY.
  vx = COPY vy
```

Case 2:
```
bb.0:
  ...
  VSETVLI          # The first VSETVLI before VOP.
  ...              # Use this VSETVLI to check LMUL and tail agnostic.
  ...
  vy = VOP va, vb  # Define vy.
  ...              # There is no vsetvli to change vl between VOP and COPY.
  ...
  VSETVLI          # The first VSETVLI before COPY.
  ...              # This VSETVLI does not change vl and vtype.
  ...
  vx = COPY vy
```

Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>
Co-Authored-by: Kito Cheng <kito.cheng@sifive.com>

Differential Revision: https://reviews.llvm.org/D103510
2021-10-28 11:39:04 +08:00
Max Kazantsev 513914e1f3 [SCEV] Invalidate user SCEVs along with operand SCEVs to avoid cache corruption
Following discussion in D110390, it seems that we are suffering from unability
to traverse users of a SCEV being invalidated. The result of that is that ScalarEvolution's
inner caches may store obsolete data about SCEVs even if their operands are
forgotten. It creates problems when we try to verify the contents of those caches.

It's also a frequent situation when messing with cache causes very sneaky and
hard-to-analyze bugs related to corruption of memory when dealing with cached
data. They are lurking there because ScalarEvolution's veirfication is not powerful
enough and misses many problematic cases. I plan to make SCEV's verification
much stricter in follow-ups, and this requires dangling-pointers-free caches.

This patch makes sure that, whenever we forget cached information for a SCEV,
we also forget it for all SCEVs that (transitively) use it.

This may have negative compile time impact. It's a sacrifice we are more
than willing to make to enforce correctness. We can also save some time by
reworking invokers of forgetMemoizedResults (maybe we can forget multiple
SCEVs with single query).

Differential Revision: https://reviews.llvm.org/D111533
Reviewed By: reames
2021-10-28 09:39:24 +07:00
Craig Topper 1387483e72 [RISCV] Replace most uses of RISCVSubtarget::hasStdExtV. NFCI
Add new hasVInstructions() which is currently equivalent.

Replace vector uses of hasStdExtZfh/F/D with new vector specific
versions. The vector spec no longer requires that the vectors implement the
same types as scalar. It only requires that the scalar type is
the maximum size the vectors can support. This is currently
implemented using the scalar rule we were using before.

Add new hasVInstructionsI64() begin using to qualify code that
requires i64 vector elements.

This is all NFC for now, but we can start using this to better
implement D112408 which introduces the Zve extensions.

Reviewed By: frasercrmck, eopXD

Differential Revision: https://reviews.llvm.org/D112496
2021-10-27 19:33:48 -07:00
Johannes Doerfert acf3093117 [Attributor][FIX] Do not ignore memory writes in AAMemoryBehavior
Even if we look for `nocapture` we need to bail on escaping pointers.
The crucial thing is that we might not look at a big enough scope when
we derive the memory behavior. Thus, it might be `nocapture` in a larger
context while it is "captured" in a smaller context.
2021-10-27 21:04:32 -05:00
Johannes Doerfert 734f91441d [Attributor][NFC] Improve debug messages 2021-10-27 21:04:31 -05:00
Ard Biesheuvel d7e089f2d6 [ARM] Use hardware TLS register in Thumb2 mode when -mtp=cp15 is passed
In ARM mode, passing -mtp=cp15 forces the use of an inline MRC system register read to move the thread pointer value into a register.

Currently, in Thumb2 mode, -mtp=cp15 is ignored, and a call to the __aeabi_read_tp helper is emitted instead.

This is inconsistent, and breaks the Linux/ARM build for Thumb2 targets, as the Linux kernel does not provide an implementation of __aeabi_read_tp,.

Reviewed By: nickdesaulniers, peter.smith

Differential Revision: https://reviews.llvm.org/D112600
2021-10-27 16:42:11 -07:00
Lang Hames 20675d8f7d Revert "[ORC] Change SPSExecutorAddr serialization, SupportFunctionCall struct."
This reverts commit e32b1eee6a.

Reverting while I fix some broken unit tests.
2021-10-27 16:39:56 -07:00
Johannes Doerfert 8a4551b893 [Attributor][FIX] Use right address space to avoid assertion
When we strip and accumulate constant offsets we need to pick the right
address space such that the offset APInt has the right bit width.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D112544
2021-10-27 18:22:37 -05:00
Lang Hames e32b1eee6a [ORC] Change SPSExecutorAddr serialization, SupportFunctionCall struct.
SPSExecutorAddr will now be serializable to/from ExecutorAddr, rather than
uint64_t. This improves type safety when working with serialized addresses.

Also updates the SupportFunctionCall to use an ExecutorAddrRange (rather than
a separate ExecutorAddr addr and uint64_t size field), and updates the
tpctypes::*Write data structures to use ExecutorAddr rather than
JITTargetAddress.
2021-10-27 16:20:46 -07:00
Roman Lebedev b291597112
Revert rest of `IRBuilderBase`'s short-circuiting folds
Upon further investigation and discussion,
this is actually the opposite direction from what we should be taking,
and this direction wouldn't solve the motivational problem anyway.

Additionally, some more (polly) tests have escaped being updated.
So, let's just take a step back here.

This reverts commit f3190dedee.
This reverts commit 749581d21f.
This reverts commit f3df87d57e.
This reverts commit ab1dbcecd6.
2021-10-28 02:15:14 +03:00
Michael Liao e6a4ba3aa6 [amdgpu] Handle the case where there is no scavenged register.
- When an unconditional branch is expanded into an indirect branch, if
  there is no scavenged register, an SGPR pair needs spilling to enable
  the destination PC calculation. In addition, before jumping into the
  destination, that clobbered SGPR pair need restoring.
- As SGPR cannot be spilled to or restored from memory directly, the
  spilling/restoring of that SGPR pair reuses the regular SGPR spilling
  support but without spilling it into memory. As that spilling and
  restoring points are fully controlled, we only need to spill that SGPR
  into the temporary VGPR, which needs spilling into its emergency slot.
- The target-specific hook is revised to take additional restore block,
  where the restoring code is filled. After that, the relaxation will
  place that restore block directly before the destination block and
  insert an unconditional branch in any fall-through block into the
  destination block.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106449
2021-10-27 18:37:27 -04:00
Ben Langmuir 3d13ee2891 [ORC][ORC-RT] Enable the MachO platform for arm64
Enables the arm64 MachO platform, adds basic tests, and implements the
missing TLV relocations and runtime wrapper function. The TLV
relocations are just handled as GOT accesses.

rdar://84671534

Differential Revision: https://reviews.llvm.org/D112656
2021-10-27 13:36:03 -07:00
Nikita Popov ea7be26045 [ConstantRange] Optimize smul_sat() (NFC)
Base the implementation on the APInt smul_sat() implementation,
which is much more efficient than performing calculations in
double the bitwidth.
2021-10-27 21:01:09 +02:00
Nikita Popov 665060ea45 [BasicAA] Remove misleading overflow check
GEP decomposition currently checks whether the multiplication of
the linear expression offset and GEP scale overflows. However, if
everything else works correctly, this overflow check is both
unnecessary and dangerously misleading. While it will avoid an
overflow in Scale * Offset in particular, other parts of the
calculation (including those on dynamic values) may still overflow.
The code working on the decomposed GEPs is responsible for ensuring
that it remains correct in the presence of overflow. D112611 fixes
the last issue of that kind that I'm aware of (in fact, the overflow
check was originally introduced to work around precisely that issue).

Differential Revision: https://reviews.llvm.org/D112618
2021-10-27 20:56:03 +02:00
Nick Desaulniers 3ccd041af9 [LowerTypeTests] Emit cfi_jt aliases regardless of function export
A constant complaint we get is that the __typeid__ symbols in the CFI
jump tables causes confusing stack traces in applications. Emit the more
readable cfi_jt aliases regardless of function export (LTO vs Thin LTO).

Reviewed By: pcc, tejohnson

Differential Revision: https://reviews.llvm.org/D107934
2021-10-27 11:36:26 -07:00
Philip Reames 425cbbc602 [Operator] Add hasPoisonGeneratingFlags [mostly NFC]
This method parallels the dropPoisonGeneratingFlags on Instruction, but is hoisted to operator to handle constant expressions as well.

This is mostly code movement, but I did go ahead and add the inrange constexpr gep case.  This had been discussed previously, but apparently never followed up o.
2021-10-27 11:25:40 -07:00
Alexey Bataev f06e332982 Revert "[SLP]Improve/fix reordering of the gathered graph nodes."
This reverts commit 64d1617d18 to fix test
non-stability.
2021-10-27 11:16:58 -07:00
Roman Lebedev 156f10c840
[IR] `SCEVExpander::generateOverflowCheck()`: short-circuit `umul_with_overflow`-by-one
It's a no-op, no overflow happens ever: https://alive2.llvm.org/ce/z/Zw89rZ

While generally i don't like such hacks,
we have a very good reason to do this: here we are expanding
a run-time correctness check for the vectorization,
and said `umul_with_overflow` will not be optimized out
before we query the cost of the checks we've generated.

Which means, the cost of run-time checks would be artificially inflated,
and after https://reviews.llvm.org/D109368 that will affect
the minimal trip count for which these checks are even evaluated.
And if they aren't even evaluated, then the vectorized code
certainly won't be run.

We could consider doing this in IRBuilder,  but then we'd need to
also teach `CreateExtractValue()` to look into chain of `insertvalue`'s,
and i'm not sure there's precedent for that.

Refs. https://reviews.llvm.org/D109368#3089809
2021-10-27 19:45:55 +03:00
Kazu Hirata 593451bd3c [X86] Remove getSETOpc (NFC)
This function seems to be unused for at least one year.
2021-10-27 09:22:31 -07:00
Kazu Hirata e6b6190ead [X86] Remove NeedsRetpoline in X86AsmPrinter (NFC)
This field seems to be unused for at least one year.
2021-10-27 09:22:29 -07:00
Kazu Hirata cc73310a81 [X86] Remove CallOperand in X86Operand (NFC)
This field seems to be unused for at least one year.
2021-10-27 09:22:27 -07:00
Alexey Bataev 64d1617d18 [SLP]Improve/fix reordering of the gathered graph nodes.
Gathered loads/extractelements/extractvalue instructions should be
checked if they can represent a vector reordering node too and their
order should ve taken into account for better graph reordering analysis/
Also, if the gather node has reused scalars, they must be reordered
instead of the scalars themselves.

Differential Revision: https://reviews.llvm.org/D112454
2021-10-27 08:49:13 -07:00
David Sherwood 5d9318638e [NFC][LoopVectorize] Change getStepVector to take a Value* for the StartIdx
This patch changes the definition of getStepVector from:

  Value *getStepVector(Value *Val, int StartIdx, Value *Step, ...

to

  Value *getStepVector(Value *Val, Value *StartIdx, Value *Step, ...

because:

1. it seems inconsistent to pass some values as Value* and some as
   integer, and
2. future work will require the StartIdx to be an expression made up
   of runtime calculations of the VF.

In widenIntOrFpInduction I've changed the code to pass in the
value returned from getRuntimeVF, but the presence of the assert:

  assert(!VF.isScalable() && "scalable vectors not yet supported.");

means that currently this code path is only exercised for fixed-width
VFs and so the patch is still NFC.

Differential revision: https://reviews.llvm.org/D111882
2021-10-27 16:12:38 +01:00
Roman Lebedev ab1dbcecd6
[IR] `IRBuilderBase::CreateSelect()`: if cond is a constant i1, short-circuit
While we could emit such a tautological `select`,
it will stick around until the next instsimplify invocation,
which may happen after we count the cost of this redundant `select`.
Which is precisely what happens with loop vectorization legality checks,
and that artificially increases the cost of said checks,
which is bad.

There is prior art for this in `IRBuilderBase::CreateAnd()`/`IRBuilderBase::CreateOr()`.

Refs. https://reviews.llvm.org/D109368#3089809
2021-10-27 18:01:05 +03:00
Alexey Bataev 9b12975cbf Revert "[SLP]Improve/fix reordering of the gathered graph nodes."
This reverts commit f719b794bc to fix
instability in tests.
2021-10-27 07:31:36 -07:00
Kerry McLaughlin f01fafdcd4 [SVE][CodeGen] Fix incorrect legalisation of zero-extended masked loads
PromoteIntRes_MLOAD always sets the extension type to `EXTLOAD`, which
results in a sign-extended load. If the type returned by getExtensionType()
for the load being promoted is something other than `NON_EXTLOAD`, we
should instead pass this to getMaskedLoad() as the extension type.

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D112320
2021-10-27 14:15:41 +01:00
Alexey Bataev f719b794bc [SLP]Improve/fix reordering of the gathered graph nodes.
Gathered loads/extractelements/extractvalue instructions should be
checked if they can represent a vector reordering node too and their
order should ve taken into account for better graph reordering analysis/
Also, if the gather node has reused scalars, they must be reordered
instead of the scalars themselves.

Differential Revision: https://reviews.llvm.org/D112454
2021-10-27 06:08:40 -07:00
Caroline Concatto 1137b7207d [SelectionDAG] Widening the result of INSERT_SUBVECTOR.
Widens the result and first input vector because they have the same size.
The subvector to be inserted is widened in the operand widen function.

Differential Revision: https://reviews.llvm.org/D112187
2021-10-27 13:52:25 +01:00
Nikita Popov fbc0c308d5 [BasicAA] Handle known bits as ranges
BasicAA currently tries to determine that the offset is positive by
checking whether all variable indices are positive based on known
bits, multiplied by a positive scale. However, this is incorrect
if the scale multiplication might overflow. In the modified test
case the original value is positive, but may be negative after a
left shift.

Fix this by converting known bits into a constant range and reusing
the range-based logic, which handles overflow correctly.

Differential Revision: https://reviews.llvm.org/D112611
2021-10-27 14:41:31 +02:00
Daniel Kiss 894ddba1c9 Revert "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."
This reverts commit da1d1a0869.
2021-10-27 14:29:35 +02:00
Sanjay Patel 6c0a2c2804 [x86] enhance mayFoldLoad to check alignment
As noted in D112464, a pre-AVX target may not be able to fold an
under-aligned vector load into another op, so we shouldn't report
that as a load folding candidate. I only found one caller where
this would make a difference -- combineCommutableSHUFP() -- so
that's where I added a test to show the (minor) regression.

Differential Revision: https://reviews.llvm.org/D112545
2021-10-27 07:54:25 -04:00
Matt fc28a2f8ce [AArch64][SVE] Combine predicated FMUL/FADD into FMA
Combine FADD and FMUL intrinsics into FMA when the result of the FMUL is an FADD operand
with one only use and both use the same predicate.

Differential Revision: https://reviews.llvm.org/D111638
2021-10-27 11:41:23 +00:00
Alexandros Lamprineas 8689f5e6e7 [AArch64] Add support for the 'R' architecture profile.
This change introduces subtarget features to predicate certain
instructions and system registers that are available only on
'A' profile targets. Those features are not present when
targeting a generic CPU, which is the default processor.

In other words the generic CPU now means the intersection of
'A' and 'R' profiles. To maintain backwards compatibility we
enable the features that correspond to -march=armv8-a when the
architecture is not explicitly specified on the command line.

References: https://developer.arm.com/documentation/ddi0600/latest

Differential Revision: https://reviews.llvm.org/D110065
2021-10-27 12:32:30 +01:00
Alexey Bataev cb4feae7bd [SLP]Fix logical and/or reductions.
Need to emit select(cmp) instructions for poison-safe forms of select
ops. Currently alive reports that `Target is more poisonous than source`
for operations we generating for such instructions.

https://alive2.llvm.org/ce/z/FiNiAA

Differential Revision: https://reviews.llvm.org/D112562
2021-10-27 04:25:20 -07:00
Nikita Popov 9bc7e543b4 [BasicAA] Make range check more precise
Make the range check more precise by calculating the range of
potentially accessed bytes for both accesses and checking whether
their intersection is empty. In that case there can be no overlap
between the accesses and the result is NoAlias.

This is more powerful than the previous approach, because it can
deal with sign-wrapped ranges. In the test case the original range
is [-1, INT_MAX] but becomes [0, INT_MIN] after applying the offset.
This is a wrapping range, so getSignedMin/getSignedMax will treat
it as a full range. However, the range excludes the elements
[INT_MIN+1, -1], which is enough to prove NoAlias with an access
at offset -1.

Differential Revision: https://reviews.llvm.org/D112486
2021-10-27 12:40:58 +02:00
Jay Foad b9e3af124b [LiveInterval] Add RemoveDeadValNo argument to removeSegment(iterator)
Add an optional bool RemoveDeadValNo argument to the
removeSegment(iterator) overload, for consistency with the other
overloads. This gives clients a way to remove dead valnos while also
getting an updated iterator returned (in the manner of vector::erase).

Use this to clean up some inefficient code in
LiveIntervals::repairOldRegInRange. NFC.

Differential Revision: https://reviews.llvm.org/D110560
2021-10-27 09:43:32 +01:00
Daniel Kiss da1d1a0869 [ARM] __cxa_end_cleanup should be called instead of _UnwindResume.
ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup.
It will call the UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called
because the global state will be wrong due to the missing __cxa_end_cleanup call.

Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions

Reviewed By: logan

Differential Revision: https://reviews.llvm.org/D111703
2021-10-27 10:40:00 +02:00
David Sherwood 3d706c20f8 [NFC][LoopVectorize] Remove setBestPlan in favour of getBestPlanFor
I have removed LoopVectorizationPlanner::setBestPlan, since this
function is quite aggressive because it deletes all other plans
except the one containing the <VF,UF> pair required. The code is
currently written to assume that all <VF,UF> pairs will live in the
same vplan. This is overly restrictive, since scalable VFs live in
different plans to fixed-width VFS. When we add support for
vectorising epilogue loops when the main loop uses scalable vectors
then we will the vplan for the main loop will be different to the
epilogue.

Instead I have added a new function called

  LoopVectorizationPlanner::getBestPlanFor

that returns the best vplan for the <VF,UF> pair requested and leaves
all the vplans untouched. We then pass this best vplan to

  LoopVectorizationPlanner::executePlan

which now takes an additional VPlanPtr argument.

Differential revision: https://reviews.llvm.org/D111125
2021-10-27 09:38:27 +01:00
Arthur Eubanks ae27c57b18 [InferAddressSpaces] Make pass work with opaque pointers
Avoid getPointerElementType().
2021-10-26 23:53:20 -07:00
Kazu Hirata 6af3e87d2d [Hexagon] Remove set-but-unused variables (NFC) 2021-10-26 23:38:15 -07:00
Phoebe Wang eb55c1f153 [X86][NFC] Add the missed `break;` for 79f9dfef0d 2021-10-27 13:58:31 +08:00
Craig Topper 2783a5cfaf [RISCV] Add ICmp and FCmp to shouldSinkOperands. 2021-10-26 22:23:54 -07:00
Lang Hames 91434d4469 [JITLink] Fix element-present check in MachOLinkGraphParser.
Not all symbols are added to the index-to-symbol map, so we shouldn't use the
size of the map as a proxy for the highest valid index.
2021-10-26 20:48:40 -07:00
Lang Hames bfb40e83ee [ORC] Don't try to perform empty deallocations. 2021-10-26 20:48:40 -07:00
Max Kazantsev 5961f0308f [SCEV][NFC] Verify intergity of SCEVUsers
Make sure that, for every living SCEV, we have all its direct
operand tracking it as their user.

Differential Revision: https://reviews.llvm.org/D112402
Reviewed By: reames
2021-10-27 09:54:49 +07:00
Ben Shi 97e52e1c35 [RISCV] Optimize immediate materialisation with SLLI.UW in the Zba extension
Simplify "LUI+SLLI+ADDI+SLLI" and "LUI+ADDIW+SLLI+ADDI+SLLI" to
"LUI+ADDIW+SLLIUW" to reduce total instruction amount.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111933
2021-10-27 02:48:38 +00:00
David Blaikie 3ac709b6ce llvm-dwarfdump --verify: Exit non-zero on simplified template name rebuilding failures 2021-10-26 15:57:16 -07:00
Austin Kerbow 02e60f2e77 [AMDGPU] Use max waves for scheduler's initial occupancy target
The scheduler should set critical/excess register usage thresholds that
are guided by the maximum possible occupancy for the function. This
change is focused on setting proper lower bounds on register usage which
we would typically only see when a specific number of maximum waves is
requested with the "waves-per-eu" attribute, or by setting
"amdgpu-num-vgpr|sgpr" directly. This was broken previously. I have a
follow-on patch that will address issues with the scheduler not
targeting correct upper bounds on register usage which is typical with
launch bounds and min "waves-per-eu".

Changes by this patch:

Set the initial critical register usage thresholds to minimum values
that are determined by the maximum possible occupancy for the function,
or the number of allocatable registers, whichever is lower.

Avoid unisgned overflow if register limits are lower than the register
tracking "ErrorMargin", I.e. when using stress-regalloc=2.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D112373
2021-10-26 15:30:26 -07:00
Yuanfang Chen 7c3fa52785 [DebugInfo] Skip ODRUniquing for mismatched tags
Otherwise, ODRUniquing would map some member method/variable MDNodes
to have enum type DIScope, resulting in invalid debug info and bad
DWARF.

- Add a Verifier check that when a 'scope:' operand is an ODR type that is not an enum.
- Makes ODRUniquing apply to only ODR types with the same tag so that the debuginfo/DWARF is well-formed.

Reviewed By: probinson, aprantl

Differential Revision: https://reviews.llvm.org/D111770
2021-10-26 15:28:25 -07:00
Sanjay Patel acabad9ff6 [InstCombine] try to canonicalize icmp with trunc op into mask and cmp
The motivating test is based on:
https://llvm.org/PR52260

We have better analysis for X == 0, so try harder to form that.
2021-10-26 17:43:28 -04:00
Fangrui Song 226465efe3 [ARC] Fix `undefined symbol: llvm::MachineFunction::dump() const` 2021-10-26 11:44:18 -07:00
Usman Nadeem da1318ccca [NFC][Instcombine] Cleanup some obsolete matches in visitSelectInstr
These are now redundant after https://reviews.llvm.org/D106872

Change-Id: I82edfedf1d45cac4e3368d77ce3a48c78e342c19
2021-10-26 10:07:08 -07:00
Rosie Sumpter b716d0aa94 [LoopVectorize] Clean up VPReductionRecipe::execute. NFC
Use RdxDesc->getOpcode instead of getUnderlingInstr()->getOpcode.
Move the code which finds Kind and IsOrdered to be outside the for loop
since neither of these change with the vector part.

Differential Revision: https://reviews.llvm.org/D112547
2021-10-26 17:18:25 +01:00
Kazu Hirata c3e698e2f5 [CodeGen, Hexagon] Use MachineBasicBlock::phis (NFC) 2021-10-26 09:01:29 -07:00
Jonas Paulsson bb506938be [SystemZ] Improvement of emitMemMemWrapper()
It was discovered that an extra register COPY remained when expanding a
(variable length) memory operation with a loop and there was another use of
the involved address register(s) afterwards.

A simple fix for this is to COPY the address registers before the loop and
use that new vreg instead.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D112065
2021-10-26 17:03:01 +02:00
Alexey Bataev ce14d1b690 [SLP]Do not reorder reduction nodes.
The final reduction nodes should not be reordered, the order does not
matter for reductions. Also, it might be profitable to vectorize smaller
reduction trees, reduction cost may compensate small tree cost.

Part of D111574

Differential Revision: https://reviews.llvm.org/D112467
2021-10-26 07:41:24 -07:00
zhijian 158083f0de [AIX][XCOFF] parsing xcoff object file auxiliary header
Summary:

The patch supports parsing the xcoff object file auxiliary header with llvm-readobj with option "auxiliary-headers"

the format of auxiliary header as
https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/filesreference/XCOFF.html#XCOFF__fyovh386shar

Reviewers: James Henderson, Jason Liu, Hubert Tong, Esme yi, Sean Fertile.

Differential Revision: https://reviews.llvm.org/D82549
2021-10-26 10:40:25 -04:00
Neubauer, Sebastian eb16570ab0 [AMDGPU] Remove unused CSR defs
CSR_AMDGPU_VGPRs_24_255 and CSR_AMDGPU_VGPRs_32_255 are not used
anywhere, so remove them.

Differential Revision: https://reviews.llvm.org/D112535
2021-10-26 16:01:49 +02:00
Abinav Puthan Purayil 61e3b9fefe [AMDGPU] Add constrained shift pattern matches.
The motivation for this is due to clang's conformance to
https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#operators-shift
which makes clang emit (<shift> a, (and b, <width> - 1)) for `a <shift> b`
in OpenCL where a is an int of bit width <width>.

Differential revision: https://reviews.llvm.org/D110231
2021-10-26 19:07:19 +05:30
Chen Zheng 631f44f338 [PowerPC] use right extend type for SCEV
Fix an issue caused by D108750

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D112502
2021-10-26 13:32:03 +00:00
Abinav Puthan Purayil 781dd39b7b [AMDGPU] Enable 48-bit mul in AMDGPUCodeGenPrepare.
We were bailing out of creating 24-bit muls for results wider than 32
bits in AMDGPUCodeGenPrepare. With the 24-bit mulhi intrinsic, this
change teaches AMDGPUCodeGenPrepare to generate the 48-bit mul
correctly.

Differential Revision: https://reviews.llvm.org/D112395
2021-10-26 18:53:07 +05:30
Abinav Puthan Purayil 9bd5cfeb1f [AMDGPU] Implement llvm.amdgcn.mulhi.[i,u]24 intrinsics.
These intrinsics maps to the 24-bit v_mul_hi instructions.

This change also fixes an incorrect assumption on the associativity of
24-bit mulhi in its SDNode record in tblgen.

Differential Revision: https://reviews.llvm.org/D112394
2021-10-26 18:53:07 +05:30
Sanjay Patel 2ab0148c14 [x86] use cast instead of dyn_cast for unchecked usage; NFC
This was noted as an independent clean-up in D112464.
2021-10-26 08:20:19 -04:00
Neubauer, Sebastian 487f15603e [AMDGPU] Fix setcc combine for i128
The combine asserted if constants could not be represented as uint64_t.
Use APInts to fix this.

Differential Revision: https://reviews.llvm.org/D112416
2021-10-26 13:39:50 +02:00
Jay Foad c8e5aef1a0 [AMDGPU] Use standard MachineBasicBlock::getFallThrough method. NFCI.
Differential Revision: https://reviews.llvm.org/D101825
2021-10-26 12:07:54 +01:00
Jonas Paulsson 9f8872779a [SystemZ] Provide size values for PATCHPOINT, STACKMAP and FENTRY_CALL.
All instructions must have a correct size value close to emission when
SystemZLongBranch runs, or a necessary branch relaxation may be missed.

This patch also adds an assert for instruction sizes in SystemZLongBranch.

Review: Ulrich Weigand
2021-10-26 12:07:22 +02:00
Nikita Popov 11a8423dab [SCEV] Use reverse() (NFC) 2021-10-26 11:08:58 +02:00
Max Kazantsev 9bbfe0f72c [NFC] Remove obsolete simplifyOnceImpl function
The function simplifyOnce only calls simplifyOnceImpl and does nothing else.
Having this separate helper makes no sense. Removing it.

Patch by Dmitry Bakunevich!

Differential Revision: https://reviews.llvm.org/D112517
Reviewed By: mkazantsev
2021-10-26 13:51:42 +07:00
Max Kazantsev d4c74cd4e8 [NFC] [LoopPeel] Update IDoms of non-loop blocks dominated by the loop
When peeling a loop, we assume that the latch has a `br` terminator and that
all loop exits are either terminated with an `unreachable` or have a terminating
deoptimize call. So when we peel off the 1st iteration, we change the IDom of
all loop exits to the peeled copy of `NCD(IDom(Exit), Latch)`. This works now,
but if we add logic to support loops with exits that are followed by a block
with an `unreachable` or a terminating deoptimize call, changing the exit's idom
wouldn't be enough and DT would be broken.

For example, let `Exit1` and `Exit2` are loop exits, and each of them
unconditionally branches to the same `unreachable` terminated block. So neither
of the exits dominates this unreachable block. If we change the IDoms of the
exits to some peeled loop block, we don't update the dominators of the unreachable
block. Currently we just don't get to the peeling logic, saying that we can't peel
such loops.

Previously we stored exits' IDoms in a map before peeling a loop and then, after
peeling off one iteration, we changed their IDoms.
Now we use the same logic not only for exits but for all non-loop blocks dominated
by the loop.
So when we add logic to support peeling loops with exits which branch, for example,
to an unreachable-terminated block, we would update the IDoms not only for exits,
but for their successors.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D111611
Reviewed By: mkazantsev, nikic
2021-10-26 13:09:07 +07:00
Phoebe Wang 79f9dfef0d [X86] Move splat addends from the gather/scatter index operand to the base address
This can avoid a vector add and a constant pool load. Or an explicit broadcast in case of non-constant.

Also reverse the transform any time we encounter a constant index addend that can't be moved to base. In that case pull the constant from base into the index. This reduces code size needed for the displacement since we needed the index add anyway. Limit this to scale of 1 to avoid divisibility and wrap issues.

Authored by Craig.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111595
2021-10-26 12:35:57 +08:00
Duncan P. N. Exon Smith b12a864c29 Bitcode: Use Expected<T>::takeError() and moveInto() more, NFC
Avoid naming some Expected<T> values in the Bitcode reader by using
takeError() and moveInto() more often. This follows the smaller set of
changes included in 2410fb4616.
2021-10-25 16:03:40 -07:00
Zarko Todorovski e9163660b1 [PPC][LLVM] Inclusive terms: remove references to sanity check in lib/Target/PowerPC
Removed references to `sanity check` in `PPCBranchCoalescing.cpp` code comments.
No word substitution made in this case, as the comments and code following illustrated are
sufficient IMO.

Reviewed By: quinnp

Differential Revision: https://reviews.llvm.org/D112452
2021-10-25 18:13:54 -04:00
Craig Topper d51e3a2139 [LegalizeTypes][TargetLowering] Merge getShiftAmountTyForConstant into TargetLowering::getShiftAmountTy.
getShiftAmountTyForConstant is a special helper that changes the
shift amount to i32 if the type chosen by
TargetLowering::getShiftAmountTy can't represent all possible values.
This is needed to satisfy an assert in SelectionDAG::getNode.

It requires additional consideration to know when this helper should be used.
I'm not sure that we are always using it when we should.

This patch merges the getShiftAmountTyForConstant handling into
TargetLowering::getShiftAmountTy so we don't need to think about it
anymore.

Technically this may slightly increase compile times since the majority
of callers of getShiftAmountTy won't need this. Hopefully, this isn't
an issue in practice.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112469
2021-10-25 14:06:53 -07:00
Nikita Popov 3a995c918e [SCEV] Move SCEVLostPoisonFlags() check into SCEVExpander
Always insert values into ExprValueMap, and instead skip using them
in SCEVExpander if poison-generating flags have been lost. This
ensures that all values that are in ValueExprMap are also in
ExprValueMap, so we can use the latter to invalidate the former.

This change is probably not entirely NFC for the case where
originally the SCEV had no nowrap flags but they were inferred
later, in which case that would now allow reusing the existing
value for expansion.

Differential Revision: https://reviews.llvm.org/D112389
2021-10-25 22:37:20 +02:00
Arthur Eubanks 4a9db7367d [AlwaysInliner] Invalidate analyses when we delete functions
Fixes PR52292.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D112473
2021-10-25 13:36:32 -07:00
Zarko Todorovski 9769e97c35 [LLVM] Inclusive terms: remove/replace references to sanity in RewriteStatepointsForGC.cpp and test
Part of work to have the LLVM backend to use more inclusive terms.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D112461
2021-10-25 16:17:41 -04:00
Stefan Gränitz cdb335ffaf [JITLink] Fix warning 'shift count exceeds width' in AArch64 backend 2021-10-25 20:44:07 +02:00
Jeremy Morse 4136897bd4 [DebugInfo][InstrRef][NFC] Switch to using DenseMaps and similar
There are a few STL containers hanging around that can become DenseMaps,
SmallVectors and similar. This recovers a modest amount of compile time
performance.

While I'm here, adjust the bit layout of ValueIDNum: this was always
supposed to act like a value type, however it seems that clang doesn't
compile the comparison functions to act that way. Add a uint64_t to a
union that explicitly aliases the bitfields, so that we can compare the
whole value as a single integer.

Differential Revision: https://reviews.llvm.org/D112333
2021-10-25 18:07:17 +01:00
Wouter van Oortmerssen 5694dbccc3 [WebAssembly] support Memory64 in target_features section
Differential Revision: https://reviews.llvm.org/D112266
2021-10-25 09:31:45 -07:00
Jeremy Morse 97ddf49e43 [DebugInfo][InstrRef] Recover stack-slot tracking performance
This patch is like D111627 -- instead of calculating IDF for every location
on the stack, only do it for the smallest units of interference, and copy
the PHIs for those units to any aliases.

The test added runs placeMLocPHIs directly, and tests that:
 * A def of the lower 8 bits of a stack slot causes all aliasing regs to
   have PHIs placed,
 * It doesn't cause the equivalent location to x86's $ah, which isn't
   aliased, to have a PHI placed.

Differential Revision: https://reviews.llvm.org/D112324
2021-10-25 17:31:09 +01:00
Philip Reames f82cf6187f [indvars] Fix pr52276 (missing one use check)
The recently added logic to canonicalize exit conditions to unsigned relies on facts which hold about the use (i.e. exit test).  Applying this blindly to the icmp is not legal, as there may be another use which never reaches the exit.  Restrict ourselves to case where we have a single use.
2021-10-25 09:26:55 -07:00
Craig Topper e2b7aabb57 [RISCV] Reduce the number of RISCV vector builtins by an order of magnitude.
All but 2 of the vector builtins are only used by clang_builtin_alias.
When using clang_builtin_alias, the type string of the builtin is never
checked. Only the types in the function definition used for the alias
are checked.

This patch takes advantage of this to share a single builtin for
many different types. We already used type overloads on the IR intrinsic
so the codegen for the builtins that are being merge were already
the same. This extends the type overloading to the builtins.

I had to make a few tweaks to make this work.
-Floating point vector-vector vmerge now uses the vmerge intrinsic
 instead of the vfmerge intrinsic. New isel patterns and tests are
 added to support this.
-The SemaChecking for the immediate of vset_v/vget_v has been removed.
 Determining the valid range is harder now. I've added masking to
 ManualCodegen to ensure valid IR for invalid input.

This reduces the number of builtins from ~25000 to ~1100.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D112102
2021-10-25 09:03:59 -07:00
Craig Topper 210b586a85 [RISCV] Add vcsr CSR name for V extension.
Reviewed By: frasercrmck, kito-cheng

Differential Revision: https://reviews.llvm.org/D112342
2021-10-25 08:56:25 -07:00
Danila Malyutin 7b102fcc91 [CodeGen] Fix dependence breaking for tied operands
Differential Revision: https://reviews.llvm.org/D107582
2021-10-25 18:52:27 +03:00
Jeremy Morse ee3eee71e4 [DebugInfo][InstrRef] Track values fused into stack spills
During register allocation, some instructions can have stack spills fused
into them. It means that when vregs are allocated on the stack we can
convert:

    SETCCr %0
    DBG_VALUE %0

to

    SETCCm %stack.0
    DBG_VALUE %stack.0

Unfortunately instruction referencing finds this harder: a store to the
stack doesn't have a specific operand number, therefore we don't substitute
the old operand for a new operand, and the location is dropped. This patch
implements a solution: just recognise the memory operand attached to an
instruction with a Special Number (TM), and record a substitution between
the old value and the new one.

This patch adds substitution code to InlineSpiller to record such fused
spills, and tracking in InstrRefBasedLDV to recognise such values, and
produce the value numbers for them. Everything to do with the movement of
stack-defined values is already handled in InstrRefBasedLDV.

Differential Revision: https://reviews.llvm.org/D111317
2021-10-25 15:14:53 +01:00
Danila Malyutin 2d9ee590b6 [AArch64] Handle ST1iN instructions in isAArch64FrameOffsetLegal
Before the code would crash with "unhandled opcode in
isAArch64FrameOffsetLegal" when there was a spill from extractelement.
Fixes pr52249

Differential Revision: https://reviews.llvm.org/D112311
2021-10-25 17:05:12 +03:00
Nikita Popov 0d20ebf686 [BasicAA] Use ranges for more than one index
D109746 made BasicAA use range information to determine the
minimum/maximum GEP offset. However, it was limited to the case of
a single variable index. This patch extends support to multiple
indices by adding all the ranges together.

Differential Revision: https://reviews.llvm.org/D112378
2021-10-25 15:30:50 +02:00
Alexey Bataev eb9b75dd4d [SLP]Change the order of the reduction/binops args pair vectorization attempts.
Need to change the order of the reduction/binops args pair vectorization
attempts. Need to try to find the reduction at first and postpone
vectorization of binops args. This may help to find more reduction
patterns and vectorize them.
Part of D111574.

Differential Revision: https://reviews.llvm.org/D112224
2021-10-25 06:27:14 -07:00
Jeremy Morse 2eb96e1711 [DebugInfo][NFC] Avoid a use-after-free
This patch swaps two lines -- the CurSucc reference can be invalidated
by the call to DFS.push_back, therefore that should happen last. The
usual hat-tip to asan for catching this.

This patch also swaps an ealier call to ToAdd.insert and DFS.push_back,
where a stable iterator (from successors()) is being used. This isn't
strictly necessary, but is good for consistency and avoiding readers
asking themselves why the two code portions have a different order.
2021-10-25 14:16:30 +01:00
Sanjay Patel 6e46b66e2a [DAGCombiner] make matching bit-hack form of usubsat more flexible
(i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128

As suggested in D112085, we can substitute 'xor' with 'add'
in this pattern, and it is logically equivalent:
https://alive2.llvm.org/ce/z/eJtWWC

We canonicalize to 'xor' in IR, but SDAG does not do that
(and it probably should not - https://llvm.org/PR52267 ), so
it is possible to see either pattern in codegen. Note that
'sub' is a another potential pattern, but that is
canonicalized to 'add' in DAGCombiner, so we don't need to
worry about that variation.

Differential Revision: https://reviews.llvm.org/D112377
2021-10-25 09:01:52 -04:00
Tim Northover f9089accba CodeGenPrep: remove all copies of GEP from list if there are duplicates.
Unfortunately ToT has changed enough from the revision where this actually
caused problems that the test no longer triggers an assertion failure.
2021-10-25 14:00:02 +01:00
Kerry McLaughlin 1f49b71fe5 [SVE][CodeGen] Enable reciprocal estimates for scalable fdiv/fsqrt
This patch enables the use of reciprocal estimates for SVE
when both the -Ofast and -mrecip flags are used.

Reviewed By: david-arm, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D111657
2021-10-25 11:30:44 +01:00
Max Kazantsev a9b0776a81 [SimplifyCFG] Sanity assert in iterativelySimplifyCFG
We observe a hang within iterativelySimplifyCFG due to infinite
loop execution. Currently, there is no limit to this loop, so
in case of bug it just works forever. This patch adds an assert
that will break it after 1000 iterations if it didn't converge.
2021-10-25 17:10:17 +07:00
Nikita Popov 75384ecdf8 [InstSimplify] Refactor invariant.group load folding
Currently strip.invariant/launder.invariant are handled by
constructing constant expressions with the intrinsics skipped.
This takes an alternative approach of accumulating the offset
using stripAndAccumulateConstantOffsets(), with a flag to look
through invariant.group intrinsics.

Differential Revision: https://reviews.llvm.org/D112382
2021-10-25 10:56:25 +02:00