Commit Graph

152407 Commits

Author SHA1 Message Date
Kazu Hirata cba40c4ede [llvm] Use MachineBasicBlock::{successors,predecessors} (NFC) 2021-11-09 07:11:14 -08:00
Sergei Larin a721ddbae9 Update MaxMinLatency even if dependencies have been already scheduled.
Covers an extremely rare corner case on internal book keeping.
2021-11-09 06:47:49 -08:00
Kerry McLaughlin 0d748b4d32 [LoopVectorize] Extract the last lane from a uniform store
Changes VPReplicateRecipe to extract the last lane from an unconditional,
uniform store instruction. collectLoopUniforms will also add stores to
the list of uniform instructions where Legal->isUniformMemOp is true.

setCostBasedWideningDecision now sets the widening decision for
all uniform memory ops to Scalarize, where previously GatherScatter
may have been chosen for scalable stores.

This fixes an assert ("Cannot yet scalarize uniform stores") in
setCostBasedWideningDecision when we have a loop containing a
uniform i1 store and a scalable VF, which we cannot create a scatter for.

Reviewed By: sdesmalen, david-arm, fhahn

Differential Revision: https://reviews.llvm.org/D112725
2021-11-09 14:43:16 +00:00
Sanjay Patel c36b7e21bd [InstCombine] enhance vector bitwise select matching
(Cond & C) | (~bitcast(Cond) & D) --> bitcast (select Cond, (bc C), (bc D))

This is part of fixing:
https://llvm.org/PR34047

That report shows a case where a bitcast is sitting between the select condition
candidate and its 'not' value due to current cast canonicalization rules.

There's a bitcast type restriction that might be violated in existing matching,
but I still need to investigate if that is possible -
Alive2 shows we can only do this transform safely when the bitcast is from
narrow to wide vector elements (otherwise poison could leak into elements
that were safe in the original code):
https://alive2.llvm.org/ce/z/Hf66qh

Differential Revision: https://reviews.llvm.org/D113035
2021-11-09 08:54:59 -05:00
Chris Jackson 116dc70cf3 [DebugInfo][LSR] Add more stringent checks on IV selection and salvage
attempts

Prevent the selection of IVs that have a SCEV containing an undef. Also
prevent salvaging attempts for values for which a SCEV could not be
created by ScalarEvolution and have only SCEVUknown.

Reviewed by: Orlando

Differential Revision: https://reviews.llvm.org/D111810
2021-11-09 13:09:37 +00:00
Florian Hahn 2ead34716a
[SimplifyCFG] Add early bailout if Use is not in same BB.
Without this patch, passingValueIsAlwaysUndefined will iterate over all
instructions from I to the end of the basic block, even if the use is
outside the block.

This patch adds an early bail out, if the use instruction is outside I's
BB. This can greatly reduce compile-time in cases where very large basic
blocks are involved, with a large number of PHI nodes and incoming
values.

Note that the refactoring makes the handling of the case where I is a
phi and Use is in PHI more explicit  as well: for phi nodes, we can also
directly bail out. In the existing code, we would iterate until we reach
the end and return false.

Based on an earlier patch by Matt Wala.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D113293
2021-11-09 12:57:03 +00:00
Andrew Savonichev b702276ad0 [AArch64] Add Machine InstCombiner patterns for FMUL indexed variant
This patch adds DUP+FMUL => FMUL_indexed pattern to InstCombiner.
FMUL_indexed is normally selected during instruction selection, but it
does not work in cases when VDUP and VMUL are in different basic
blocks.

Differential Revision: https://reviews.llvm.org/D99662
2021-11-09 15:30:19 +03:00
Simon Pilgrim 58c01ef270 [SelectionDAG] Merge FoldConstantVectorArithmetic into FoldConstantArithmetic (PR36544)
This patch merges FoldConstantVectorArithmetic back into FoldConstantArithmetic.

Like FoldConstantVectorArithmetic we now handle vector ops with any operand count, but we currently still only handle binops for scalar types - this can be improved in future patches - in particular some common unary/trinary ops still have poor constant folding.

There's one change in functionality causing test changes - FoldConstantVectorArithmetic bails early if the build/splat vector isn't all constant (with some undefs) elements, but FoldConstantArithmetic doesn't - it instead attempts to fold the scalar nodes and bails if they fail to regenerate a constant/undef result, allowing some additional identity/undef patterns to be handled.

Differential Revision: https://reviews.llvm.org/D113300
2021-11-09 11:31:01 +00:00
Alexey Lapshin c8ae08987d [llvm-dwarfdump] dump link to the immediate parent.
It is often useful to know which die is the parent of the current die.
This patch adds information about parent offset into the dump:

0x0000000b: DW_TAG_compile_unit
              DW_AT_producer    ("by_hand")

0x00000014:   DW_TAG_base_type (0x0000000b)  <<<<<<<<<<<<<<
                DW_AT_name      ("int")

Now it is easy to see which die is the parent of the current die.
This patch makes that behaviour to be default.
We can make it to be opt-in if neccessary.

This functionality differs from already existed "--show-parents"
in that sence that parent information is shown for all dies and
only link to the immediate parent is shown.

Differential Revision: https://reviews.llvm.org/D113406
2021-11-09 14:14:06 +03:00
Max Kazantsev cb728cb8a9 [NFC] Get rid of hardcoded magical constant and use Optionals instead
Refactor calculateIterationsToInvariance so that it doesn't need a magical
constant to signify unknown answer.
2021-11-09 18:13:19 +07:00
Roman Lebedev d484cc152b
[TTI] Adjust `getReplicationShuffleCost()` interface
It is trivial to produce DemandedSrcElts given DemandedReplicatedElts,
so don't pass the former. Also, it isn't really useful so far
to have the overload taking the Mask, so just inline it.
2021-11-09 14:07:59 +03:00
Simon Pilgrim 32a4a883f6 Revert rGe1eec7601b6988b35ae3cdc8d67cf3cf4e1361dd "[XCOFF][yaml2obj] support for the auxiliary file header."
This is failing on MSVC builds: https://lab.llvm.org/buildbot/#/builders/86/builds/23436
2021-11-09 11:02:13 +00:00
Dmitry Makogon 5ec2386332 Reapply db28934 "[IndVars] Pass TTI to replaceCongruentIVs"
This reapplies patch db289340c8.

The test failures on build with expensive checks caused by the patch happened due
to the fact that we sorted loop Phis in replaceCongruentIVs using llvm::sort,
which shuffles the given container if the expensive checks are enabled,
so equivalent Phis in the sorted vector had different mutual order from run
to run. replaceCongruentIVs tries to replace narrow Phis with truncations
of wide ones. In some test cases there were several Phis with the same
width, so if their order differs from run to run, the narrow Phis would
be replaced with a different Phi, depending on the shuffling result.

The patch ae14fae0ff fixed this issue by
replacing llvm::sort with llvm::stable_sort.
2021-11-09 17:42:29 +07:00
Florian Hahn acbefbf19f [VPlan] Guard code to dump instructions after d9361bfbe2.
This should fix build failures when built without assertions enabled,
e.g.
    https://lab.llvm.org/buildbot/#/builders/205/builds/172
2021-11-09 10:29:05 +00:00
Florian Hahn d9361bfbe2 [VPlan] Add initial inner-loop VPlan verification.
This patch adds a function to verify general properties of VPlans. The
first check makes sure that all phi-like recipes are at the beginning of
a block, with no other recipes in between.

Note that this currently may not hold for VPBlendRecipes at the moment,
as other recipes may be inserted before the VPBlendRecipe during mask
creation.

Note that this patch depends on D111300 and D111301, which fix code that
breaks the checked invariant.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D111302
2021-11-09 10:18:28 +00:00
Esme-Yi e1eec7601b [XCOFF][yaml2obj] support for the auxiliary file header.
Summary:
  This patch adds yaml2obj supporting for the auxiliary
  file header of XCOFF.

Reviewed By: DiggerLin, jhenderson

Differential Revision: https://reviews.llvm.org/D111487
2021-11-09 09:48:40 +00:00
Dmitry Makogon ae14fae0ff [SCEVExpander] Use stable_sort to sort loop Phis in SCEVExpander::replaceCongruentIVs
This is a fix for test failures on expensive checks build caused by db289340c8.

With LLVM_ENABLE_EXPENSIVE_CHECKS enabled the llvm::sort shuffles the given container.
However, the sort is only called when the TTI is passed to replaceCongruentIVs.
In the mentioned patch we pass it TTI, so the sort happens. But due to shuffling
equivalent Phis may appear in different order from run to run.
With the stable_sort instead of sort this is impossible - the order of sorted Phis
is preserved.
2021-11-09 16:29:57 +07:00
Jay Foad 5c3c7adf3a [CodeGen] Fix assertion failure in TwoAddressInstructionPass::rescheduleMIBelowKill
This fixes an assertion failure with -early-live-intervals when trying
to update the live intervals for a debug instruction, which don't even
have slot indexes.

Differential Revision: https://reviews.llvm.org/D113116
2021-11-09 09:24:21 +00:00
Shao-Ce SUN 1c81941f19 [NFC][RISCV] Fix wrong predicates of vfwredsum 2021-11-09 17:19:50 +08:00
Kazu Hirata c375cdc932 [Hexagon] Use MachineBasicBlock::{successors,predecessors} (NFC) 2021-11-09 00:26:06 -08:00
Carlos Galvez 7ecec3f0f5 [CUDA] Bump supported CUDA version to 11.5
Differential Revision: https://reviews.llvm.org/D113249
2021-11-09 08:20:53 +00:00
Akira Hatanaka 1fe8993ad8 [ObjC][ARC] Replace uses of ObjC intrinsics that are arguments of
operand bundle "clang.arc.attachedcall" with ObjC runtime functions

The existing code only handles the case where the intrinsic being
rewritten is used as the called function pointer of a call/invoke.
2021-11-08 21:19:07 -08:00
Liqiang Tao 6cad45d5c6 [llvm][Inline] Add a module level inliner
Add module level inliner, which is a minimum viable product at this point.
Also add some tests for it.

RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-August/152297.html

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D106448
2021-11-09 11:03:29 +08:00
Akira Hatanaka 8f8d9f743d [ObjC][ARC] Handle operand bundle "clang.arc.attachedcall" on targets
that don't use the inline asm marker

This patch makes the changes to the ARC middle-end passes that are
needed to handle operand bundle "clang.arc.attachedcall" on targets that
don't use the inline asm marker for the retainRV/autoreleaseRV
handshake (e.g., x86-64).

Note that anyone who wants to use the operand bundle on their target has
to teach their backend to handle the operand bundle. The x86-64 backend
already knows about the operand bundle (see
https://reviews.llvm.org/D94597).

Differential Revision: https://reviews.llvm.org/D111334
2021-11-08 18:38:39 -08:00
River Riddle 7480efd6f0 [Tablegen] Collect all global state into one managed static
Tablegen uses copious amounts of global state for uniquing various records.
This was fine under the original vision where tablegen was a tool, and not a
library, but there are various usages of tablegen that want to use it as a library.
One concrete example is that downstream we have a kythe indexer for tablegen
constructs that allows for IDEs to serve go-to-definition/references/and more.
We currently (kind of hackily) keep the tablegen parts in a shared library that
gets loaded/unloaded.

This revision starts to remedy this by globbing all of the static state into a
managed static so that they can at least be unloaded with llvm_shutdown.
A better solution would be to feed in a context variable (much like how
the IR in LLVM/MLIR do), but that is a more invasive change that can come later.

Differential Revision: https://reviews.llvm.org/D108934
2021-11-09 01:24:54 +00:00
Wouter van Oortmerssen 62eeb3e57e [WebAssembly] fix __stack_pointer being added to .debug_aranges
When emitting a reloc for the Wasm global __stack_pointer, it was inadvertedly added to the symbols used for generating aranges, which caused some aranges to use it as the end symbol in a symbol diff, which caused a reloc for it to be emitted, which then caused an assert in `wasm64` since we have no 64-bit relocs for Wasm globals.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=52376
Differential Revision: https://reviews.llvm.org/D113438
2021-11-08 16:30:31 -08:00
Wouter van Oortmerssen 4a0c89a6cf [WebAssembly] Fix fixBrTableIndex removing instruction without checking uses
Fixes: https://bugs.llvm.org/show_bug.cgi?id=52352
Differential Revision: https://reviews.llvm.org/D113230
2021-11-08 15:53:44 -08:00
Craig Topper 376233113e [RISCV] Use TargetConstant for CSR number for READ_CSR/WRITE_CSR.
This is consistent with what we do for other operands that are required
to be constants.

I don't think this results in any real changes. The pattern match
code for isel treats ConstantSDNode and TargetConstantSDNode the same.
2021-11-08 15:10:24 -08:00
Arthur Eubanks 28a06a1b87 [NFC][FuncAttrs] Keep track of modified functions
This is in preparation for only invalidating analyses on changed
functions.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D113303
2021-11-08 15:04:56 -08:00
Akira Hatanaka f2c7c3c7c7 [ObjC][ARC] Invalidate an entry of UnderlyingObjCPtrCache when the
instruction the key points to is deleted

Use weak value handles for both the key and the value. The entry is
invalid if either value handle is null.

This fixes an assertion failure in BasicAAResult::alias that is caused
by UnderlyingObjCPtrCache returning a wrong value.

I don't have a test case for this patch that fails reliably.

rdar://83984790
2021-11-08 14:41:06 -08:00
Ard Biesheuvel 2caf85ad7a [ARM] implement LOAD_STACK_GUARD for remaining targets
Currently, LOAD_STACK_GUARD on ARM is only implemented for Mach-O targets, and
other targets rely on the generic support which may result in spilling of the
stack canary value or address, or may cause it to be kept in a callee save
register across function calls, which means they essentially get spilled as
well, only by the callee when it wants to free up this register.

So let's implement LOAD_STACK GUARD for other targets as well. This ensures
that the load of the stack canary is rematerialized fully in the epilogue.

This code was split off from

  D112768: [ARM] implement support for TLS register based stack protector

for which it is a prerequisite.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D112811
2021-11-08 22:59:15 +01:00
Michael Liao bf225939bc [InferAddressSpaces] Support assumed addrspaces from addrspace predicates.
- CUDA cannot associate memory space with pointer types. Even though Clang could add extra attributes to specify the address space explicitly on a pointer type, it breaks the portability between Clang and NVCC.
- This change proposes to assume the address space from a pointer from the assumption built upon target-specific address space predicates, such as `__isGlobal` from CUDA. E.g.,

```
  foo(float *p) {
    __builtin_assume(__isGlobal(p));
    // From there, we could assume p is a global pointer instead of a
    // generic one.
  }
```

This makes the code portable without introducing the implementation-specific features.

Note that NVCC starts to support __builtin_assume from version 11.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D112041
2021-11-08 16:51:57 -05:00
Martin Storsjö 46ec93a457 [Support] [VirtualFileSystem] Detect the windows_slash path style
This fixes the following clang VFS tests, if `windows_slash` is the
default style:

  Clang :: VFS/implicit-include.c
  Clang :: VFS/relative-path.c
  Clang-Unit :: Frontend/./FrontendTests.exe/CompilerInstance.DefaultVFSOverlayFromInvocation

Also clarify a couple references to `Style::windows` into
`Style::windows_backslash`, to make it clearer that each of them are
opinionated in different directions (even if it doesn't matter for
calls to e.g. `is_absolute`).

Differential Revision: https://reviews.llvm.org/D113272
2021-11-08 22:21:29 +02:00
Nikita Popov 1376301c87 [InstCombine] Canonicalize range test idiom
InstCombine converts range tests of the form (X > C1 && X < C2) or
(X < C1 || X > C2) into checks of the form (X + C3 < C4) or
(X + C3 > C4). It is possible to express all range tests in either
of these forms (with different choices of constants), but currently
neither of them is considered canonical. We may have equivalent
range tests using either ult or ugt.

This proposes to canonicalize all range tests to use ult. An
alternative would be to canonicalize to either ult or ugt depending
on the specific constants involved -- e.g. in practice we currently
generate ult for && style ranges and ugt for || style ranges when
going through the insertRangeTest() helper. In fact, the "clamp like"
fold was relying on this, which is why I had to tweak it to not
assume whether inversion is needed based on just the predicate.

Proof: https://alive2.llvm.org/ce/z/_SP_rQ

Differential Revision: https://reviews.llvm.org/D113366
2021-11-08 21:15:46 +01:00
Adrian Prantl 8bd8dd16e2 Extend obj2yaml to optionally preserve raw __LINKEDIT/__DATA segments.
I am planning to upstream MachOObjectFile code to support Darwin
chained fixups. In order to test the new parser features we need a way
to produce correct (and incorrect) chained fixups. Right now the only
tool that can produce them is the Darwin linker. To avoid having to
check in binary files, this patch allows obj2yaml to print a hexdump
of the raw LINKEDIT and DATA segment, which both allows to
bootstrap the parser and enables us to easily create malformed inputs
to test error handling in the parser.

This patch adds two new options to obj2yaml:

  -raw-data-segment
  -raw-linkedit-segment

Differential Revision: https://reviews.llvm.org/D113234
2021-11-08 11:30:12 -08:00
Florian Hahn e3bfb6a146
[VPlan] Make sure recurrence splice is not inserted between phis.
All phi-like recipes should be at the beginning of a VPBasicBlock with
no other recipes in between. Ensure that the recurrence-splicing recipe
is not added between phi-like recipes, but after them.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D111301
2021-11-08 17:42:32 +00:00
Craig Topper 304edbb553 [RISCV] SMUL_LOHI/UMUL_LOHI should expand for RVV.
These and MULHS/MULHU both default to Legal. Targets need to set
the ones they don't support to Expand.

I think MULHS/MULHU likely has priority in most places so this
change probably isn't directly testable. I found it while looking
at disabling MULHS/MULHU for nxvXi64 as required for Zve64x.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D113325
2021-11-08 09:38:36 -08:00
Kazu Hirata 3c06920cd1 [llvm] Use make_early_inc_range (NFC) 2021-11-08 09:09:39 -08:00
Sander de Smalen 2829376bb2 [LV] Use VScaleForTuning to fine-tune the cost per lane.
When targeting a specific CPU with scalable vectorization, the knowledge
of that particular CPU's vscale value can be used to tune the cost-model
and make the cost per lane less pessimistic.

If the target implements 'TTI.getVScaleForTuning()', the cost-per-lane
is calculated as:

  Cost / (VScaleForTuning * VF.KnownMinLanes)

Otherwise, it assumes a value of 1 meaning that the behavior
is unchanged and calculated as:

  Cost / VF.KnownMinLanes

Reviewed By: kmclaughlin, david-arm

Differential Revision: https://reviews.llvm.org/D113209
2021-11-08 16:59:46 +00:00
Joe Nash 79f52af4cd [AMDGPU] Make getInstSizeInBytes more generic
NFC. This check mainly handles size affecting literals. Make it check all
explicit operands instead of a few by name. Also make the isLiteral
check handle the KIMM operands, see https://reviews.llvm.org/D111067

Change-Id: I1a362d55b2a10f5c74d445272e8b7829a8b77597

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D113318

Change-Id: Ie6c688f30a71e0335d1c6dd1ff65019bd7ce684e
2021-11-08 10:34:49 -05:00
Anton Afanasyev ce4fa93db8 [SCCP] Tune cast instruction handling for overdefined operand
Extended value is known to be inside range smaller than full one.
Prevent SCCP to mark such value as overdefined.

Fixes PR52253

Differential Revision: https://reviews.llvm.org/D112721
2021-11-08 18:34:30 +03:00
Simon Pilgrim 7f32edea23 [X86] combineMulToPMADDWD - use ComputeMinSignedBits(). NFCI.
Use ComputeMinSignedBits() to ensure the mul source operands at least sign-extend down from the bottom 16 bits.

This will make it easier if/when we try to support handling of source types larger than 32-bits.
2021-11-08 15:28:31 +00:00
David Sherwood c63b0f471b [NFC][LoopVectorize] Make the createStepForVF interface more caller-friendly
The common use case for calling createStepForVF is currently something
like:

  Value *Step = createStepForVF(Builder, ConstantInt::get(Ty, UF), VF);

and it makes more sense to reduce overall lines of code and change the
function to let it create the constant instead. With my patch this
becomes:

  Value *Step = createStepForVF(Builder, Ty, VF, UF);

and the ConstantInt is created instead createStepForVF. A side-effect of
this is that the code in createStepForVF is also becomes simpler.

As part of this patch I've also replaced some calls to getRuntimeVF
with calls to createStepForVF, i.e.

  getRuntimeVF(Builder, Count->getType(), VFactor * UFactor) ->
  createStepForVF(Builder, Count->getType(), VFactor, UFactor)

because this feels semantically better.

Differential Revision: https://reviews.llvm.org/D113122
2021-11-08 15:14:14 +00:00
Mindong Chen 495e258fd7 [AArch64][SVE] Add FP types to the supported SVE structure load/stores vector type list
This adds FP type support to the SVE Container type list as a supplement to D112303.

Reviewed By: peterwaller-arm, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D113333
2021-11-08 22:29:08 +08:00
Simon Pilgrim f059b04f7b [DAG] Add SelectionDAG::ComputeMinSignedBits helper
As suggested on D113371, this adds a wrapper to SelectionDAG::ComputeNumSignBits, similar to the llvm::ComputeMinSignedBits wrapper.

I've included some usage, its not exhaustive, just the more obvious cases where the intention is obvious.

Differential Revision: https://reviews.llvm.org/D113396
2021-11-08 14:12:45 +00:00
David Sherwood 8d38c24fb6 [SVE][CodeGen] Improve codegen for some FP insert_subvector cases
When inserting an unpacked FP subvector into a packed vector we
can simply cast the unpacked value into a packed value, since
both types are legal for SVE. We can then use this as the input
for the UZP instruction. This avoids us expanding the operation
by going through the stack.

Differential Revision: https://reviews.llvm.org/D113270
2021-11-08 13:45:55 +00:00
Anastasia Stulova a10a69fe9c [SPIR-V] Add SPIR-V triple and clang target info.
Add new triple and target info for ‘spirv32’ and ‘spirv64’ and,
thus, enabling clang (LLVM IR) code emission to SPIR-V target.

The target for SPIR-V is mostly reused from SPIR by derivation
from a common base class since IR output for SPIR-V is mostly
the same as SPIR. Some refactoring are made accordingly.

Added and updated tests for parts that are different between
SPIR and SPIR-V.

Patch by linjamaki (Henry Linjamäki)!

Differential Revision: https://reviews.llvm.org/D109144
2021-11-08 13:34:10 +00:00
Dmitry Makogon 8d4eba6c0d Revert "[IndVars] Pass TTI to replaceCongruentIVs"
This reverts commit db289340c8.

The patch caused 2 crashes with expensive checks enabled.
2021-11-08 19:35:14 +07:00
Matt 4a59694ba1 [AArch64][SVE] Combine FADD and FMUL aarch64 intrinsics to FMLA
This is a refinement to the work in
https://reviews.llvm.org/D111638

Fold (fadd p a (fmul p b c)) into (fma p a b c)

Differential Revision: https://reviews.llvm.org/D113095
2021-11-08 12:22:38 +00:00
Dmitry Makogon db289340c8 [IndVars] Pass TTI to replaceCongruentIVs
In IndVarSimplify after simplifying and extending loop IVs we call 'replaceCongruentIVs'.
This function optionally takes a TTI argument to be able to replace narrow IVs uses
with truncates of the widest one.
For some reason the TTI wasn't passed to the function, so it couldn't perform such
transform.
This patch fixes it.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D113024
2021-11-08 19:20:53 +07:00
Simon Pilgrim f60d3ec0c7 [DAG] Add BuildVectorSDNode::getConstantRawBits helper
We have several places where we need to extract the raw bits data from a BUILD_VECTOR node, so consolidate this to a single helper function that handles Undefs and Integer/FP constants, including implicit truncation.

This should make it easier to extend D113202 to handle more constant folding of bitcasted constant data.

Differential Revision: https://reviews.llvm.org/D113351
2021-11-08 12:07:38 +00:00
Simon Moll c2b91eef27 [VE] default to integrated asm in AsmInfo
VE integrated asm has been the default in Clang. Also use the default setting for integrated asm in the backend.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D113384
2021-11-08 11:58:29 +01:00
David Green a982940eb5 [AArch64] Combine fptoi.sat(fmul) to fixed point cvtf
We already have patterns for fptosi and fptoui plus fmul to fixed point
convert, this adds equivalent patterns for fptosi.sat and fptoui.sat,
which should apply equally well for the legal saturating variants.

Differential Revision: https://reviews.llvm.org/D113199
2021-11-08 10:07:34 +00:00
David Sherwood c42bb30b9e [LoopVectorize] Permit fixed-width epilogue loops for scalable vector bodies
At the moment in LoopVectorizationCostModel::selectEpilogueVectorizationFactor
we bail out if the main vector loop uses a scalable VF. This patch adds
support for generating epilogue vector loops using a fixed-width VF when the
main vector loop uses a scalable VF.

I've changed LoopVectorizationCostModel::selectEpilogueVectorizationFactor
so that we convert the scalable VF into a fixed-width VF and do profitability
checks on that instead. In addition, since the scalable and fixed-width VFs
live in different VPlans that means I had to change the calls to
LVP.hasPlanWithVFs so that we only pass in the fixed-width VF.

New tests added here:

  Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll

Differential Revision: https://reviews.llvm.org/D109432
2021-11-08 09:41:13 +00:00
Qiu Chaofan 9b5e2b5261 [PowerPC] Implement basic macro fusion in Power10
Including basic fusion types around arithmetic and logical instructions.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D111693
2021-11-08 17:23:56 +08:00
Andrew Wei bf3784b882 [AArch64] Canonicalize X*(Y+1) or X*(1-Y) to madd/msub
Performing the rearrangement for add/sub and mul instructions to match the madd/msub pattern

Reviewed By: dmgreen, sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D111862
2021-11-08 16:49:31 +08:00
skc7 a0633f5ccb [AMDGPU] Test Commit. NFC
Reviewed By: hsmhsm

Differential Revision: https://reviews.llvm.org/D113379
2021-11-08 07:09:09 +00:00
Ben Shi e32cf690df [RISCV] Optimize (add (mul r, c0), c1)
Optimize (add (mul x, c0), c1) ->
         (add (mul (add x, c1/c0+1), c0), c1%c0-c0),
if c1/c0+1 and c1%c0-c0 are simm12, while c1 is not.

Optimize (add (mul x, c0), c1) ->
         (add (mul (add x, c1/c0-1), c0), c1%c0+c0),
if c1/c0-1 and c1%c0+c0 are simm12, while c1 is not.

Reviewed By: craig.topper, asb

Differential Revision: https://reviews.llvm.org/D111141
2021-11-08 02:58:25 +00:00
Chen Zheng 7c6f5950f0 [PowerPC] comment for different input register classes; nfc
Add comments to explain why XXPERMDIs and XXPERMDI have different input register
classes, vsfrc for XXPERMDIs and vsrc for XXPERMDI.

This addresses the comments in abandoned patch D113178, we keep using `f0` instead
of using `vs0` for XXPERMDIs on purpose.
2021-11-08 02:21:30 +00:00
Zi Xuan Wu 4fb282fec5 [CSKY] Add CSKY 16-bit instruction format and encoding
CSKY is a ARCH which supports mixture of 16-bit and 32-bit instructions natively,
and there is not an indivual predictor or feature to enable/disable 16-bit instruction.
So I think it's better to add 16-bit instruction early, and naturally to use 16-bit and 32-bit instructions.

Differential Revision: https://reviews.llvm.org/D112919
2021-11-08 10:02:15 +08:00
Chen Zheng 50acbbe3cd [AsmPrinter][ORE] use correct opcode name
Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D113173
2021-11-08 01:51:24 +00:00
Kazu Hirata 0d182d9d1e [Transforms] Use make_early_inc_range (NFC) 2021-11-07 17:03:15 -08:00
Simon Pilgrim 55e4cd8485 [X86][AVX2] Recognise 256-bit truncation shuffles and mask 256-bit source
For v8i16 shuffle patterns that are lowered with AND+PACKUS, check to see if the sources are from a 256-bit vector and perform the masking using BLENDW at the 256-bit level.

With the test changes we can see more examples of duplicate XMM/YMM zero vectors (PR26018) :(
2021-11-07 21:24:55 +00:00
Nikita Popov 2060895c9c [ConstantRange] Add exact union/intersect (NFC)
For some optimizations on comparisons it's necessary that the
union/intersect is exact and not a superset. Add methods that
return Optional<ConstantRange> only if the result is exact.

For the sake of simplicity this is implemented by comparing
the subset and superset approximations for now, but it should be
possible to do this more directly, as unionWith() and intersectWith()
already distinguish the cases where the result is imprecise for the
preferred range type functionality.
2021-11-07 21:46:06 +01:00
Nikita Popov cf71a5ea8f [ConstantRange] Support zero size in isSizeLargerThan()
From an API perspective, it does not make a lot of sense that 0
is not a valid argument to this function. Add the exact check needed
to support it.
2021-11-07 21:22:45 +01:00
Nikita Popov a8c318b50e [BasicAA] Use index size instead of pointer size
When accumulating the GEP offset in BasicAA, we should use the
pointer index size rather than the pointer size.

Differential Revision: https://reviews.llvm.org/D112370
2021-11-07 18:56:11 +01:00
Kazu Hirata aee86f9b6c [AMDGPU] Remove unused declaration selectSMRD (NFC)
The function body proper was removed on Feb 20, 2019 in commit
79b5c3842b.
2021-11-07 09:53:18 -08:00
Kazu Hirata 41ef3187e0 [ARM, X86] Use MachineBasicBlock::{predecessors,successors} (NFC) 2021-11-07 09:53:16 -08:00
Simon Pilgrim f057756a1a [SLP] Fix Wdocumentation warning - remove \returns from void function. NFC. 2021-11-07 15:08:39 +00:00
Simon Pilgrim d391e4fe84 [X86] Update RET/LRET instruction to use the same naming convention as IRET (PR36876). NFC
Be more consistent in the naming convention for the various RET instructions to specify in terms of bitwidth.

Helps prevent future scheduler model mismatches like those that were only addressed in D44687.

Differential Revision: https://reviews.llvm.org/D113302
2021-11-07 15:06:54 +00:00
Benjamin Kramer 9b8b16457c Put implementation details into anonymous namespaces. NFCI. 2021-11-07 15:18:30 +01:00
Simon Pilgrim b5ef56f0bc [X86][AVX] Add missing X86ISD::VBROADCAST(v4f32 -> v8f32) isel pattern for AVX1 targets
D109434 addressed the v2f64 -> v4f64 case, an internal test has found an equivalent crash for the v4f32 -> v8f32 case.
2021-11-07 12:59:35 +00:00
Simon Pilgrim 0ff1edeeec [DAG] SimplifyVBinOp - replace FoldConstantVectorArithmetic with FoldConstantArithmetic
Currently FoldConstantArithmetic only handles binops, so replacing other uses of FoldConstantVectorArithmetic (in particular for SETCC nodes), still require more work.
2021-11-07 12:11:46 +00:00
Kazu Hirata 22e21da47d [WebAssembly] Remove unused declaration SelectExternRefAddr (NFC) 2021-11-06 19:31:22 -07:00
Kazu Hirata e4bab21848 [AMDGPU] Use MachineBasicBlock::{predecessors,successors} (NFC) 2021-11-06 19:31:20 -07:00
Kazu Hirata 843d1eda18 [llvm] Use llvm::reverse (NFC) 2021-11-06 19:31:18 -07:00
Fangrui Song d9e2c8f54d [yaml2obj][COFF] Make some PEHeader fields optional
This makes it easy to write tests where the irrelevant fields are not needed.
2021-11-06 16:39:59 -07:00
Nikita Popov 9f0194be45 [ConstantRange] Add getEquivalentICmp() variant with offset (NFCI)
Add a variant of getEquivalentICmp() that produces an optional
offset. This allows us to create an equivalent icmp for all ranges.

Use this in the with.overflow folding code, which was doing this
adjustment separately -- this clarifies that the fold will indeed
always apply.
2021-11-06 21:59:45 +01:00
Kazu Hirata cefc01fa65 [X86] Simplify a call to MachineBasicBlock::erase (NFC) 2021-11-06 13:08:25 -07:00
Kazu Hirata 815e8b5a20 [Hexagon] Remove an extraneous variable (NFC) 2021-11-06 13:08:23 -07:00
Kazu Hirata 14d656b3d8 [Target] Use llvm::reverse (NFC) 2021-11-06 13:08:21 -07:00
Nikita Popov e3cec17b2d [InstSimplify] Remove incorrect icmp of gep fold (PR52429)
As described in https://bugs.llvm.org/show_bug.cgi?id=52429 this
fold is incorrect, because inbounds only guarantees that the
pointers don't wrap in the unsigned space: It is possible that
the sign boundary is crossed by an object.

I'm dropping the fold entirely rather than adjusting it, because
computePointerICmp() fully subsumes it (just with correct predicate
handling).

Differential Revision: https://reviews.llvm.org/D113343
2021-11-06 21:03:21 +01:00
Nikita Popov f8627877a9 [SCEV] Make eraseValueFromMap() private (NFC)
The public API for this functionality is forgetValue(). There was
only one call from LoopVectorize, which was directly next to a
forgetValue() call and as such redundant.
2021-11-06 17:14:02 +01:00
Roman Lebedev a30ec4778a
[TTI][CostModel] `getUserCost()`: recognize replication shuffles and query their cost
This finally creates proper test coverage for replication shuffles,
that are used by LV for conditional loads, and will allow to add
proper costmodel at least for AVX512.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113324
2021-11-06 16:45:15 +03:00
Roman Lebedev f8efc5c0ac
[NFC][TTI] Add/extract `getReplicationShuffleCost()` method, deduplicate it's implementations
Hiding it in `getInterleavedMemoryOpCost()` is problematic for a number of reasons,
including testability and reuse, let's do better.

In a followup `getUserCost()` will be taught to use to to estimate the mask costs,
which will allow for better cost model tests for it.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113313
2021-11-06 16:45:15 +03:00
Sanjay Patel 39c4c7d391 [DAGCombiner] remove vselect fold that was accidentally added
This diff snuck into the unrelated:
025a2f73a3

It's a suggested follow-up for D113212, but I need to add test
coverage first.
2021-11-06 09:34:30 -04:00
Sanjay Patel 83c2fb9f66 [InstCombine] match usub.sat from umax intrinsic
umax(X, Op1) - Op1 --> usub.sat(X, Op1)

https://alive2.llvm.org/ce/z/HpcGiJ

This happens in 2 or more steps with an icmp-select idiom
instead of an intrinsic. This is another step towards
canonicalization of the min/max intrinsics. See:
D98152
2021-11-06 08:32:52 -04:00
Sanjay Patel 025a2f73a3 [InstCombine] add tests for umax with sub; NFC 2021-11-06 08:32:52 -04:00
David Blaikie 0a5c26f2ef DebugInfo: Simplified Template Names: drop unneeded space in arrays
Matching a recent clang change I've made, now 'int[3]' is formatted
without the space between the type and array bound. This commit updates
libDebugInfoDWARF/llvm-dwarfdump to match that formatting.
2021-11-05 22:50:57 -07:00
Bin Cheng 54d891a7d5 [RISCV]: Fix typo by abstracting VWholeLoad* classes
This patch abstracts VWholeLoad* classes into VWholeLoadN, simplifies
existing code as well as fixes a typo.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D109319
2021-11-06 10:48:03 +08:00
Bin Cheng d488f1fff2 [RISCV][NFC]: Refactor classes for load/store instructions of RVV
This patch refactors classes for load/store of V extension by:
- Introduce new class for VUnitStrideLoadFF and VUnitStrideSegmentLoadFF
  so that uses of L/SUMOP* are not spread around different places.
- Reorder classes for Unit-Stride load/store in line with table
  describing lumop/sumop in riscv-v-spec.pdf.

Reviewed By: HsiangKai, craig.topper

Differential Revision: https://reviews.llvm.org/D109318
2021-11-06 10:48:03 +08:00
Kazu Hirata 87e53a0ad8 [llvm] Use make_early_inc_range (NFC) 2021-11-05 19:39:07 -07:00
David Blaikie f57d0e2726 DWARF Simplified Template Names: Narrow down the handling for operator overloads
Actually we can, for now, remove the explicit "operator" handling
entirely - since clang currently won't try to flag any of these as
rebuildable. That seems like a reasonable state for now, but it could be
narrowed down to only apply to conversion operators, most likely - but
would need more nuance for op> and op>> since they would be incorrectly
flagged as already having their template arguments (due to the trailing
'>').
2021-11-05 15:41:56 -07:00
Philip Reames d24a0e8857 [SCEV] Use constant range of RHS to prove NUW on narrow IV in trip count logic
The basic idea here is that given a zero extended narrow IV, we can prove the inner IV to be NUW if we can prove there's a value the inner IV must take before overflow which must exit the loop.

Differential Revision: https://reviews.llvm.org/D109457
2021-11-05 15:36:47 -07:00
Scott Linder f82bdf0fcc [NFC][Verifier] Remove redundant Module parameters
These `M` parameters shadow the `M` member in `VerifierSupport`, and
both always refer to the same module. Eliminate the redundant parameters
and always use the member.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D106474
2021-11-05 21:30:02 +00:00
Jay Foad bdaa181007 [TwoAddressInstructionPass] Update existing physreg live intervals
In TwoAddressInstructionPass::processTiedPairs with
-early-live-intervals, update any preexisting physreg live intervals,
as well as virtreg live intervals. By default (without
-precompute-phys-liveness) physreg live intervals only exist for
registers that are live-in to some basic block.

Differential Revision: https://reviews.llvm.org/D113191
2021-11-05 21:20:30 +00:00
Roman Lebedev a5cd27880a
[IR] Improve member `ShuffleVectorInst::isReplicationMask()`
When we have an actual shuffle, we can impose the additional restriction
that the mask replicates the elements of the first operand, so we know
the replication factor as a ratio of output and op0 vector sizes.
2021-11-06 00:09:27 +03:00
Martin Storsjö 86c01b1bc6 [DebugInfo] [PDB] Force injected source paths to use backslashes
This fixes lld/COFF/pdb-natvis.test (which only is run on Windows)
when using paths with forward slashes on Windows.

Differential Revision: https://reviews.llvm.org/D113265
2021-11-05 21:50:42 +02:00
Sanjay Patel 7e30404c3b [DAGCombiner] add fold for vselect based on mask of signbit, part 2
This is the 'or' sibling for the fold added with:
D113212

https://alive2.llvm.org/ce/z/tgnp7K

Note that neither of these transforms is poison-safe,
but it does not seem to matter at this level. We have
had the scalar version of D113212 for a long time, so
this is just making optimizer behavior consistent.

We do not have the scalar version of *this* fold,
however, so that is another follow-up.
2021-11-05 15:02:12 -04:00
Yonghong Song 3466e00716 Reland "[Attr] support btf_type_tag attribute"
This is to revert commit f95bd18b5f (Revert "[Attr] support
btf_type_tag attribute") plus a bug fix.

Previous change failed to handle cases like below:
    $ cat reduced.c
    void a(*);
    void a() {}
    $ clang -c reduced.c -O2 -g

In such cases, during clang IR generation, for function a(),
CGCodeGen has numParams = 1 for FunctionType. But for
FunctionTypeLoc we have FuncTypeLoc.NumParams = 0. By using
FunctionType.numParams as the bound to access FuncTypeLoc
params, a random crash is triggered. The bug fix is to
check against FuncTypeLoc.NumParams before accessing
FuncTypeLoc.getParam(Idx).

Differential Revision: https://reviews.llvm.org/D111199
2021-11-05 11:25:17 -07:00
Florian Hahn f64580f8d2
[AArch64][GISel] Optimize 8 and 16 bit variants of uaddo.
Try simplify G_UADDO with 8 or 16 bit operands to wide G_ADD and TBNZ if
result is only used in the no-overflow case. It is restricted to cases
where we know that the high-bits of the operands are 0. If there's an
overflow, then the the 9th or 17th bit must be set, which can be checked
using TBNZ.

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D111888
2021-11-05 19:11:15 +01:00
Shao-Ce SUN 5c3d7184b4 [RISCV] Support Zfhmin extension
According to RISC-V Unprivileged ISA 15.6.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D111866
2021-11-06 01:41:02 +08:00
Kazu Hirata 2c4ba3e9d3 [Target] Use make_early_inc_range (NFC) 2021-11-05 09:14:32 -07:00
Whitney Tsang 93421108d2 Add NoOpLoopNestPass and LOOPNEST_PASS macro
Having a NoOpLoopNestPass can ensure that only outermost loop is invoked
for a LoopNestPass with a lit test.

There are some existing passes that are implemented as LoopNestPass, but
they are still using LOOP_PASS macro.
It would be easier to identify LoopNestPasses with a LOOPNEST_PASS
macro.

Differential Revision: https://reviews.llvm.org/D113185
2021-11-05 16:11:48 +00:00
Michael Liao af2ae2cf42 [BranchRelaxation] Fix warning on unused variable. NFC. 2021-11-05 11:18:27 -04:00
David Green 08056e1888 [InstCombine] Generalize sadd.sat combine to compute sign bits.
There is a combine in instcombine to transform a saturated add/sub into
a saddsat/ssubsat, currently handling inputs which are both sign
extended (https://alive2.llvm.org/ce/z/68qpTn). This can generalize to,
for example ashr of at least the bitwidth (https://alive2.llvm.org/ce/z/4TFyX-
and https://alive2.llvm.org/ce/z/qDWzFs for example). Which means it
generalizes further to "the number of sign bits", needing to be enough
to truncate to the size of the saturate. (An example using `or` for
instance: https://alive2.llvm.org/ce/z/EI_h_A).

So this patch makes use of ComputeNumSignBits (with the newly added
ComputeMinSignedBits) in matchSAddSubSat to generalize the fold to any
inputs with enough sign bits known, truncating the inputs to the new
size of the saturate.

Differential Revision: https://reviews.llvm.org/D112298
2021-11-05 15:05:09 +00:00
David Green 61225c0818 [ValueTracking][InstCombine] Introduce and use ComputeMinSignedBits
This introduces a new ComputeMinSignedBits method for ValueTracking that
returns the BitWidth - SignBits + 1 from ComputeSignBits, and represents
the minimum bit size for the value as a signed integer.  Similar to the
existing APInt::getMinSignedBits method, this can make some of the
reasoning around ComputeSignBits more natural.

See https://reviews.llvm.org/D112298
2021-11-05 14:41:37 +00:00
Simon Pilgrim 9e6506299a [DAG] FoldConstantVectorArithmetic - remove SDNodeFlags argument
Another minor step towards merging FoldConstantVectorArithmetic into FoldConstantArithmetic.

We don't use SDNodeFlags in any constant folding inside DAG, so passing the Flags argument is a waste of time - an alternative would be to wire up FoldConstantArithmetic to take SDNodeFlags just-in-case we someday start using it, but we don't have any way to test it and I'd prefer to avoid dead code.

Differential Revision: https://reviews.llvm.org/D113276
2021-11-05 14:36:17 +00:00
Roman Lebedev ad617183bb
[X86] `X86TTIImpl::getInterleavedMemoryOpCostAVX512()`: mask is i8 not i1
Even though AVX512's masked mem ops (unlike AVX1/2) have a mask
that is a `VF x i1`, replication of said masks happens after
promotion of it to `VF x i8`, so we should use `i8`, not `i1`,
when calculating the cost of mask replication.
2021-11-05 17:27:02 +03:00
Sanjay Patel 4fc1fc4005 [DAGCombiner] add fold for vselect based on mask of signbit
(X s< 0) ? Y : 0 --> (X s>> BW-1) & Y

We canonicalize to the icmp+select form in IR, and we already have this fold
for scalar select in SDAG, so I think it's an oversight that we don't have
the fold for vectors. It seems neutral for AArch64 and saves some instructions
on x86.

Whether we should also have the sibling folds for the inverse condition or
all-ones true value may depend on target-specific factors such as whether
there's an "and-not" instruction.

Differential Revision: https://reviews.llvm.org/D113212
2021-11-05 10:06:16 -04:00
Roman Lebedev 01d8759ac9
[IR][ShuffleVector] Introduce `isReplicationMask()` matcher
Avid readers of this saga may recall from previous installments,
that replication mask replicates (lol) each of the `VF` elements
in a vector `ReplicationFactor` times. For example, the mask for
`ReplicationFactor=3` and `VF=4` is: `<0,0,0,1,1,1,2,2,2,3,3,3>`.
More importantly, replication mask is used by LoopVectorizer
when using masked interleaved memory operations.

As discussed in previous installments, while it is used by LV,
and we **seem** to support masked interleaved memory operations on X86,
it's support in cost model leaves a lot to be desired:
until basically yesterday even for AVX512 we had no cost model for it.

As it has been witnessed in the recent
AVX2 `X86TTIImpl::getInterleavedMemoryOpCost()`
costmodel patches, while it is hard-enough to query the cost
of a particular assembly sequence [from llvm-mca],
afterwards the check lines LV costmodel tests must be updated manually.
This is, at the very least, boring.

Okay, now we have decent costmodel coverage for interleaving shuffles,
but now basically the same mind-killing sequence has to be performed
for replication mask. I think we can improve at least the second half
of the problem, by teaching
the `TargetTransformInfoImplCRTPBase::getUserCost()` to recognize
`Instruction::ShuffleVector` that are repetition masks,
adding exhaustive test coverage
using `-cost-model -analyze` + `utils/update_analyze_test_checks.py`

This way we can have good exhaustive coverage for cost model,
and only basic coverage for the LV costmodel.

This patch adds precise undef-aware `isReplicationMask()`,
with exhaustive test coverage.
* `InstructionsTest.ShuffleMaskIsReplicationMask` shows that
   it correctly detects all the known masks.
* `InstructionsTest.ShuffleMaskIsReplicationMask_undef`
  shows that replacing some mask elements in a known replication mask
  still allows us to recognize it as a replication mask.
  Note, with enough undef elts, we may detect a different tuple.
* `InstructionsTest.ShuffleMaskIsReplicationMask_Exhaustive_Correctness`
  shows that if we detected the replication mask with given params,
  then if we actually generate a true replication mask with said params,
  it matches element-wise ignoring undef mask elements.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D113214
2021-11-05 16:53:47 +03:00
David Sherwood 657a1dcd0d [AArch64] Add target DAG combine for UUNPKHI/LO
When created a UUNPKLO/HI node with an undef input then the
output should also be undef. I've added a target DAG combine
function to ensure we avoid creating an unnecessary uunpklo/hi
instruction.

Differential Revision: https://reviews.llvm.org/D113266
2021-11-05 13:50:59 +00:00
Quinn Pham c71fbdd87b [NFC] Inclusive language: Remove instances of master in URLs
[NFC] This patch fixes URLs containing "master". Old URLs were either broken or
redirecting to the new URL.

Reviewed By: #libc, ldionne, mehdi_amini

Differential Revision: https://reviews.llvm.org/D113186
2021-11-05 08:48:41 -05:00
Simon Pilgrim f2703c3c33 [DAG] FoldConstantArithmetic - rename NumOps -> NumElts. NFC.
NumOps represents the number of elements for vector constant folding, rename this NumElts so in future we can the consistently use NumOps to represent the number of operands of the opcode.

Minor cleanup before trying to begin generalizing FoldConstantArithmetic to support opcodes other than binops.
2021-11-05 13:32:34 +00:00
Jingu Kang a7b1872593 [AArch64] Fix a bug from a pattern for uaddv(uaddlp(x)) ==> uaddlv
A pattern has selected wrong uaddlv MI. It should be as below.

uaddv(uaddlp(v8i8)) ==> uaddlv(v8i8)

Differential Revision: https://reviews.llvm.org/D113263
2021-11-05 12:48:18 +00:00
Alfredo Dal'Ava Junior 1cb9f37a17 [FreeBSD] Do not mark __stack_chk_guard as dso_local
This symbol is defined in libc.so so it is definitely not DSO-Local.
Marking it as such causes problems on some platforms (such as PowerPC).

Differential revision: https://reviews.llvm.org/D109090
2021-11-05 07:29:50 -05:00
Simon Pilgrim c1e7911c3b [DAG] FoldConstantArithmetic - fold bitlogic(bitcast(x),bitcast(y)) -> bitcast(bitlogic(x,y))
To constant fold bitwise logic ops where we've legalized constant build vectors to a different type (e.g. v2i64 -> v4i32), this patch adds a basic ability to peek through the bitcasts and perform the constant fold on the inner operands.

The MVE predicate v2i64 regressions will be addressed by future support for basic v2i64 type support.

One of the yak shaving fixes for D113192....

Differential Revision: https://reviews.llvm.org/D113202
2021-11-05 12:00:59 +00:00
Simon Pilgrim 5e9ac7c0a5 [X86] Enable v32i16 rotate lowering on non-BWI targets
Fixes one of the regressions in D113192
2021-11-05 11:00:31 +00:00
Chen Zheng fed2889f07 [PowerPC] use correct selection for v16i8/v8i16 splat load
Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D113236
2021-11-05 10:04:03 +00:00
Jay Foad 0321bd64e6 Revert "[TwoAddressInstructionPass] Update existing physreg live intervals"
This reverts commit ec0e1e88d2.

It was pushed by mistake.
2021-11-05 09:54:26 +00:00
Jay Foad c93bf53a3e [AMDGPU] NFC formatting fixes in SIMemoryLegalizer 2021-11-05 09:10:24 +00:00
Jay Foad ec0e1e88d2 [TwoAddressInstructionPass] Update existing physreg live intervals
In TwoAddressInstructionPass::processTiedPairs with
-early-live-intervals, update any preexisting physreg live intervals,
as well as virtreg live intervals. By default (without
-precompute-phys-liveness) physreg live intervals only exist for
registers that are live-in to some basic block.

Differential Revision: https://reviews.llvm.org/D113191
2021-11-05 09:10:24 +00:00
Qiu Chaofan 5fd406e254 [PowerPC] Add intrinsic to convert between ppc_fp128 and fp128
ppc_fp128 and fp128 are both 128-bit floating point types. However, we
can't do conversion between them now, since trunc/ext are not allowed
for same-size fp types.

This patch adds two new intrinsics: llvm.ppc.convert.f128.to.ppcf128 and
llvm.convert.ppcf128.to.f128, to support such conversion.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D109421
2021-11-05 16:58:38 +08:00
Martin Storsjö df0ba47c36 [Support] Allow configuring the preferred type of slashes on Windows
Default to preferring forward slashes when built for MinGW, as
many usecases, when e.g. Clang is used as a drop-in replacement
for GCC, requires the compiler to output paths with forward slashes.

Not all tests pass yet, if configuring to prefer forward slashes though.

Differential Revision: https://reviews.llvm.org/D112787
2021-11-05 10:42:02 +02:00
Martin Storsjö f4d83c56c9 [Support] [Windows] Convert paths to the preferred form
This normalizes most paths (except ones input from the user as command
line arguments) into the preferred form, if `real_style()` evaluates to
`windows_forward`.

Differential Revision: https://reviews.llvm.org/D111880
2021-11-05 10:41:51 +02:00
Martin Storsjö a8b54834a1 [Support] Add a new path style for Windows with forward slashes
This behaves just like the regular Windows style, with both separator
forms accepted, but with get_separator() returning forward slashes.

Add a more descriptive name for the existing style, keeping the old
name around as an alias initially.

Add a new function `make_preferred()` (like the C++17
`std::filesystem::path` function with the same name), which converts
windows paths to the preferred separator form (while this one works on
any platform and takes a `path::Style` argument).

Contrary to `native()` (just like `make_preferred()` in `std::filesystem`),
this doesn't do anything at all on Posix, it doesn't try to reinterpret
backslashes into forward slashes there.

Differential Revision: https://reviews.llvm.org/D111879
2021-11-05 10:41:51 +02:00
Martin Storsjö f95bd18b5f Revert "[Attr] support btf_type_tag attribute"
This reverts commits 737e4216c5 and
ce7ac9e66a.

After those commits, the compiler can crash with a reduced
testcase like this:

$ cat reduced.c
void a(*);
void a() {}
$ clang -c reduced.c -O2 -g
2021-11-05 10:36:40 +02:00
Chen Zheng 9695027066 [PowerPC] address post-commit comments for D106555; NFC
Address namanjai post commit comments.
2021-11-05 05:30:53 +00:00
Vitaly Buka 1caabbef8e [OpaquePtr] Fix initialization-order-fiasco
Asan detects it after D112732.
2021-11-04 19:29:06 -07:00
Shengchen Kan be08e452f3 [X86][MS-InlineAsm] Add constraint *m for memory access w/ global var
Constraint `*m` should be used when the address of a variable is passed
as a value. And the constraint is missing for MS inline assembly when sth
is written to the address of the variable.

The missing would cause FE delete the definition of the static varible,
and then result in "undefined reference to xxx" issue.

Reviewed By: xiangzhangllvm

Differential Revision: https://reviews.llvm.org/D113096
2021-11-05 09:11:41 +08:00
Kirill Stoimenov 3f1aca58df [ASan] Added stack safety support in address sanitizer.
Added and implemented -asan-use-stack-safety flag, which control if ASan would use the Stack Safety results to emit less code for operations which are marked as 'safe' by the static analysis.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D112098
2021-11-04 17:22:31 -07:00
Arthur Eubanks 7175886a0f [NewPM] Make eager analysis invalidation per-adaptor
Follow-up change to D111575.
We don't need eager invalidation on every adaptor. Most notably,
adaptors running passes that use very few analyses, or passes that
purely invalidate specific analyses.

Also allow testing of this via a pipeline string
"function<eager-inv>()".

The compile time/memory impact of this is very comparable to D111575.
https://llvm-compile-time-tracker.com/compare.php?from=9a2eec512a29df45c90c2fcb741e9d5c693b1383&to=b9f20bcdea138060967d95a98eab87ce725b22bb&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D113196
2021-11-04 17:16:11 -07:00
Yonghong Song 41860e602a BPF: Support btf_type_tag attribute
A new kind BTF_KIND_TYPE_TAG is defined. The tags associated
with a pointer type are emitted in their IR order as modifiers.
For example, for the following declaration:
  int __tag1 * __tag1 __tag2 *g;
The BTF type chain will look like
  VAR(g) -> __tag1 --> __tag2 -> pointer -> __tag1 -> pointer -> int
In the above "->" means BTF CommonType.Type which indicates
the point-to type.

Differential Revision: https://reviews.llvm.org/D113222
2021-11-04 17:01:36 -07:00
Philip Reames dec15d9a0a [indvars] Use loop guards when canonicalizing exit conditions
This extends the logic in canonicalizeExitConditions to use loop guards to specialize the SCEV of the loop invariant term before quering it's range.
2021-11-04 15:23:34 -07:00
Arthur Eubanks 13317286f8 [NewPM] Use the default AA pipeline by default
We almost always want to use the default AA pipeline. It's very easy for
users of PassBuilder to forget to customize the AAManager to use the
default AA pipeline (for example, the NewPM C API forgets to do this).

If somebody wants a custom AA pipeline, similar to what is being done
now with the default AA pipeline registration, they can

  FAM.registerPass([&] { return std::move(MyAA); });

before calling

  PB.registerFunctionAnalyses(FAM);

For example, LTOBackend.cpp and NewPMDriver.cpp do this.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D113210
2021-11-04 15:10:34 -07:00
Ben Langmuir a2639dcbe6 [ORC] Add a utility for adding missing "self" relocations to a Symbol
If a tool wants to introduce new indirections via stubs at link-time in
ORC, it can cause fidelity issues around the address of the function if
some references to the function do not have relocations. This is known
to happen inside the body of the function itself on x86_64 for example,
where a PC-relative address is formed, but without a relocation.

```
_foo:
  leaq -7(%rip), %rax ## form pointer to '_foo' without relocation

_bar:
  leaq (%rip), %rax ##  uses X86_64_RELOC_SIGNED to '_foo'
```

The consequence of introducing a stub for such a function at link time
is that if it forms a pointer to itself without relocation, it will not
have the same value as a pointer from outside the function. If the
function pointer is used as a key, this can cause problems.

This utility provides best-effort support for adding such missing
relocations using MCDisassembler and MCInstrAnalysis to identify the
problematic instructions. Currently it is only implemented for x86_64.

Note: the related issue with call/jump instructions is not handled
here, only forming function pointers.

rdar://83514317

Differential revision: https://reviews.llvm.org/D113038
2021-11-04 15:01:05 -07:00
David Blaikie 7cdd262351 DebugInfo: Fix incorrect line table lookup when resolving decl_file from a split unit
Specifically in DWARFv5 the unit for the line table entry was correct
but the context was incorrect - leading to looking up .debug_line_str in
the dwp instead of the executable.

(perhaps we could/should remove the context pointer entirely, and rely
on the one in the unit... I might try that as a separate follow-up
commit)
2021-11-04 14:54:27 -07:00
Philip Reames c0d9bf2f6a [indvars] Allow rotation (narrowing) of exit test when discovering trip count
This relaxes the one-use requirement on the rotation transform specifically for the case where we know we're zexting an IV of the loop.  This allows us to discover trip count information in SCEV, which seems worth a single extra loop invariant truncate.  Honestly, I'd prefer if SCEV could just compute the trip count directly (e.g. D109457), but this unblocks practical benefit.
2021-11-04 14:49:24 -07:00
Yonghong Song 737e4216c5 [Attr] support btf_type_tag attribute
This patch added clang codegen and llvm support
for btf_type_tag support. Currently, btf_type_tag
attribute info is preserved in DebugInfo IR only for
pointer types associated with typedef, global variable
and function declaration. Eventually, such information
is emitted to dwarf.

The following is an example:
  $ cat test.c
  #define __tag __attribute__((btf_type_tag("tag")))
  int __tag *g;
  $ clang -O2 -g -c test.c
  $ llvm-dwarfdump --debug-info test.o
  ...
  0x0000001e:   DW_TAG_variable
                  DW_AT_name      ("g")
                  DW_AT_type      (0x00000033 "int *")
                  DW_AT_external  (true)
                  DW_AT_decl_file ("/home/yhs/test.c")
                  DW_AT_decl_line (2)
                  DW_AT_location  (DW_OP_addr 0x0)

  0x00000033:   DW_TAG_pointer_type
                  DW_AT_type      (0x00000042 "int")

  0x00000038:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_type_tag")
                    DW_AT_const_value     ("tag")

  0x00000041:     NULL

  0x00000042:   DW_TAG_base_type
                  DW_AT_name      ("int")
                  DW_AT_encoding  (DW_ATE_signed)
                  DW_AT_byte_size (0x04)

  0x00000049:   NULL

Basically, a DW_TAG_LLVM_annotation tag will be inserted
under DW_TAG_pointer_type tag if that pointer has a btf_type_tag
associated with it.

Differential Revision: https://reviews.llvm.org/D111199
2021-11-04 14:23:31 -07:00
Philip Reames 453fdebd48 [indvars] Extend canonicalizeExitConditions to inverted operands
As discussed in the original reviews, but done in a follow on.
2021-11-04 14:20:37 -07:00
Thomas Symalla 76cbe62262 [AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values.
This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unneccessary s_mov instructions for arguments the caller would otherwise save and restore in these registers.

Reviewed By: sebastian-ne

Differential Revision: https://reviews.llvm.org/D111637
2021-11-04 21:50:18 +01:00
Noah Shutty d788c44f5c [Support] Improve Caching conformance with Support library behavior
This diff makes several amendments to the local file caching mechanism
which was migrated from ThinLTO to Support in
rGe678c51177102845c93529d457b020f969125373 in response to follow-up
discussion on that commit.

Patch By: noajshu

Differential Revision: https://reviews.llvm.org/D113080
2021-11-04 13:00:44 -07:00
David Green 091244023a [ARM] Move VPTBlock pass after post-ra scheduling
Currently when tail predicating loops, vpt blocks need to be created
with the vctp predicate in case we need to revert to non-tail predicated
form. This has the unfortunate side effect of severely hampering post-ra
scheduling at times as the instructions are already stuck in vpt blocks,
not allowed to be independently ordered.

This patch addresses that by just moving the creation of VPT blocks
later in the pipeline, after post-ra scheduling has been performed. This
allows more optimal scheduling post-ra before the vpt blocks are
created, leading to more optimal tail predicated loops.

Differential Revision: https://reviews.llvm.org/D113094
2021-11-04 18:42:12 +00:00
Wouter van Oortmerssen a320f877ce [WebAssembly] Fix debug locations for ExplicitLocals pass
This is a reworked version of the reverted patch: https://reviews.llvm.org/D112487
Note that
a) it doesn't need the test changes anymore, and
b) I checked at least locally it passes other.test_pthread_lsan_leak

Differential Revision: https://reviews.llvm.org/D113208
2021-11-04 11:38:03 -07:00
Rahman Lavaee f533ec37eb Make the BBAddrMap struct binary-format-agnostic.
The only binary-format-related field in the BBAddrMap structure is the function address (`Addr`), which will use uint64_t in 64B format and uint32_t in 32B format. This patch changes it to use uint64_t in both formats.
This allows non-templated use of the struct, at the expense of a marginal additional size overhead for the 32-bit format. The size of the BB address map section does not change.

Differential Revision: https://reviews.llvm.org/D112679
2021-11-04 10:27:24 -07:00
Zakk Chen 0649dfebba [RISCV] Rename some assembler mnemonic and intrinsic functions for RVV 1.0.
Rename vpopc/vmandnot/vmornot to vcpop/vmandn/vmorn assembler mnemonic.

Reviewed By: frasercrmck, jrtc27, craig.topper

Differential Revision: https://reviews.llvm.org/D111062
2021-11-04 10:08:01 -07:00
Kazu Hirata 2887117d2c [Hexagon] Use make_early_inc_range (NFC) 2021-11-04 08:51:05 -07:00
Jamie Schmeiser 8720149d9b Remove unused function from print-changed=dot-cfg code
Summary:
Remove unused function from print-changed=dot-cfg code to silence a
gcc compiler warning.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: uabelho(Mikael Holmen)
Differential Revision: https://reviews.llvm.org/D113188
2021-11-04 10:40:50 -04:00
Sander de Smalen 1ea4296208 [NFC] Remove from UnivariateLinearPolyBase::getValue().
This interface should not have existed in the first place, let alone
be a public member.

It allows calling `ElementCount::get(..)->getValue()`, which is ambiguous.
The interfaces to be used are either getFixedValue() or getKnownMinValue().
2021-11-04 14:32:08 +00:00
Sjoerd Meijer 3fd1902ad8 [FuncSpec] Enable it only with -O3
Function specialisation was running at all optimisation levels (if enabled on
the command line, it is not on by default). That was an oversight and not
something we want to do. Function specialisation duplicates functions when it
triggers, so the backend is processing more functions/instructions resulting in
compile-time increases, which seems more appropriate with -O3 and inline with
GCC. Please note that since function specialisation is not enabled by default,
this didn't require updating any pass manager tests.

Differential Revision: https://reviews.llvm.org/D112129
2021-11-04 13:59:00 +00:00