Commit Graph

32929 Commits

Author SHA1 Message Date
Amara Emerson fb60e50c78 [GlobalISel] Fix a combine crash due to a negative G_INSERT_VECTOR_ELT idx.
These should really be folded away to undef but we shouldn't crash in any case.
2022-09-05 12:10:17 +01:00
Simon Pilgrim 4e6783f866 [DAG] getFreeze()/getNode() - account for operand depth when calling isGuaranteedNotToBeUndefOrPoison (PR57554)
Similar to #57402 - we were calling isGuaranteedNotToBeUndefOrPoison on the freeze operand (with Depth = 0), but wasn't accounting for the fact that a later isGuaranteedNotToBeUndefOrPoison assertion will call from the new node (with Depth = 0 as well) - which will then recursively call isGuaranteedNotToBeUndefOrPoison for its operands with Depth = 1

Fixes #57554
2022-09-05 11:46:46 +01:00
Craig Topper e529c0a2a0 [TargetLowering] Use ComputeMaxSignificantBits instead of ComputeNumSignBits in expandMUL_LOHI. NFC
The way ComputeNumSignBits was being used was only correct if
OuterBitSize is exactly 2x InnerBitSize. Which is always true,
but not obviously so. Comparing ComputeMaxSignificantBits to
InnerBitSize feels more correct.
2022-09-04 22:35:16 -07:00
Craig Topper 45e2809f71 [TargetLowering] Use getShiftAmountConstant. NFC 2022-09-04 20:22:32 -07:00
Kazu Hirata 32aa35b504 Drop empty string literals from static_assert (NFC)
Identified with modernize-unary-static-assert.
2022-09-03 11:17:47 -07:00
Kazu Hirata fedc59734a [llvm] Use range-based for loops (NFC) 2022-09-03 11:17:40 -07:00
Kazu Hirata 89f1433225 Use llvm::lower_bound (NFC) 2022-09-03 11:17:37 -07:00
Kazu Hirata bc96b36a41 [CodeGen] Use std::lcm (NFC) 2022-09-03 11:17:33 -07:00
Simon Pilgrim 62cdfdab4d [DAG] canCreateUndefOrPoison - add freeze(insert_subvector(x,y,c)) -> insert_subvector(freeze(x),freeze(y),c) support
We already have plenty of assertions in place to ensure that the insertion index is constant and inrange
2022-09-03 13:41:33 +01:00
Simon Pilgrim e2d140e9c3 [TTI] Add isExpensiveToSpeculativelyExecute wrapper
CGP uses a raw `getInstructionCost(I, TargetTransformInfo::TCK_SizeAndLatency) >= TCC_Expensive` check to see if its better to move an expensive instruction used in a select behind a branch instead.

This is causing issues with upcoming improvements to TCK_SizeAndLatency costs on X86 as we need to use TCK_SizeAndLatency as an uop count (so its compatible with various target-specific buffer sizes - see D132288), but we can have instructions that have a low TCK_SizeAndLatency value but should still be treated as 'expensive' (FDIV for example) - by adding a isExpensiveToSpeculativelyExecute wrapper we can keep the current behaviour but still add an x86 override in a future patch when the cost tables are updated to compensate.
2022-09-03 13:12:22 +01:00
Daniil Fukalov b4e1b0e00d [LiveIntervals] Split live intervals on any dead def
Each dead def of the same virtual register is required to be split into multiple
virtual registers with separate live intervals to avoid MachineVerifier error.

Partially fixes https://github.com/llvm/llvm-project/issues/56050 and
https://github.com/llvm/llvm-project/issues/56051

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D130477
2022-09-02 20:00:22 +03:00
David Green 5073499b69 [TypePromotionPass] Rename variable to avoid name conflict. NFC 2022-09-02 12:35:15 +01:00
Fangrui Song 8d95fd7e56 [MachineFunctionPass] Support -filter-passes for -print-changed
[MachineFunctionPass] Support -filter-passes for -print-changed

-filter-passes specifies a `PassID` (a lower-case dashed-separated pass name,
also used by -print-after, -stop-after, etc) instead of a CamelCasePass.

`-filter-passes=CamelCaseNewPMPass` seems like a workaround for new PM passes before
we can use lower-case dashed-separated pass names (as used by `-passes=`).

Example:
```
# getPassName() is "IRTranslator". PassID is "irtranslator"
llc -mtriple=aarch64 -print-changed -filter-passes=irtranslator < print-changed-machine.ll
```

Close https://github.com/llvm/llvm-project/issues/57453

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D133055
2022-09-01 11:06:06 -07:00
Nikita Popov 5134bd432f [DwarfEhPrepare] Assign dummy debug location for inserted _Unwind_Resume calls (PR57469)
DwarfEhPrepare inserts calls to _Unwind_Resume into landing pads.
If _Unwind_Resume happens to be defined in the same module and
debug info is used, then this leads to a verifier error:

  inlinable function call in a function with debug info must
    have a !dbg location
  call void @_Unwind_Resume(ptr %exn.obj) #0

Fix this by assigning a dummy location to the call. (As this
happens in the backend, inlining is not actually relevant here.)

Fixes https://github.com/llvm/llvm-project/issues/57469.

Differential Revision: https://reviews.llvm.org/D133095
2022-09-01 16:35:49 +02:00
Nikita Popov c635ea5c50 [CombinerHelper] Avoid deprecated method (NFC) 2022-09-01 16:09:05 +02:00
Stephen Tozer 211efaa1ce Reapply "[DebugInfo] Extend the InstrRef LDV to support DbgValues with many Ops"
Re-landing with an erroneous assert removed.

This reverts commit 58d104b352.
2022-09-01 14:20:24 +01:00
Amara Emerson 4cf3db41da [GlobalISel] Add sdiv exact (X, constant) -> mul combine.
This port of the SDAG optimization is only for exact sdiv case.

Differential Revision: https://reviews.llvm.org/D130517
2022-09-01 13:34:00 +01:00
Craig Topper 77dbc5200b [MachineCSE] Use TargetInstrInfo::isAsCheapAsAMove in isPRECandidate.
Some targets like RISC-V require operands to be inspected to
determine if an instruction is similar to a move.

Spotted while investigating code differences between using an ADDI
vs an ADDIW. RISC-V has the isAsCheapAsAMove flag for ADDI, but
the TII hook checks the immediate is 0 or the register is X0. ADDIW
is never generated with X0 or with an immediate of 0 so it doesn't
have the isAsCheapAsAMove flag.

I don't know enough about the PRE code to write a test for this yet.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132981
2022-08-31 15:39:41 -07:00
Sam Clegg c5c4ba37b1 [WebAssembly][MC] Avoid the need for .size directives for functions
Warn if `.size` is specified for a function symbol.  The size of a
function symbol is determined solely by its content.

I noticed this simplification was possible while debugging #57427, but
this change doesn't fix that specific issue.

Differential Revision: https://reviews.llvm.org/D132929
2022-08-31 14:28:56 -07:00
Nick Desaulniers d7474bef77 [llvm][TailDuplicator] don't taildup isInlineAsmBrIndirectTargets
This fixes a crash observed after
https://reviews.llvm.org/D129997.

Similar to D88823.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D130127
2022-08-31 13:07:10 -07:00
Simon Pilgrim eaede4b5b7 [DAG] extractShiftForRotate - replace assertion for shift opcode with an early-out
We feed the result from the first extractShiftForRotate call into the second, and that result might no longer be a shift op (usually due to constant folding).

NOTE: We REALLY need to stop creating nodes on the fly inside extractShiftForRotate!

Fixes Issue #57474
2022-08-31 15:50:48 +01:00
Simon Pilgrim 9d22800275 [DAG] visitFreeze - account for operand depth when calling isGuaranteedNotToBeUndefOrPoison (PR57402)
We were calling isGuaranteedNotToBeUndefOrPoison on operands (with Depth = 0), but wasn't accounting for the fact that a later isGuaranteedNotToBeUndefOrPoison assertion will call from the new node (with Depth = 0 as well) - which will then recursively call isGuaranteedNotToBeUndefOrPoison for its operands with Depth = 1

Fixes #57402
2022-08-31 12:20:30 +01:00
Kai Luo ad2f7fd286 [AtomicExpand] Make floating point conversion happens before fence insertion
IIUC, the conversion part is not part of atomic operations and fences should be put around converted atomic operations.
This also fixes atomic load of floating point values which requires fence on PowerPC.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D127609
2022-08-31 09:54:58 +08:00
Markus Böck 2fdf963daf [GlobalISel] Explicitly fail trying to translate `gc.statepoint` and related intrinsics
The provided testcase would previously fail with an assertion due to later down below trying to allocate registers for `token` return types and arguments. This is especially problematic as the process would then exit instead of falling back to using FastIsel.

This patch fixes that by simply explicitly failing translation if either of these intrinsics are encountered.

Fixes https://github.com/llvm/llvm-project/issues/57349

Differential Revision: https://reviews.llvm.org/D132974
2022-08-31 00:47:17 +02:00
David Penry 9aca7b0217 [ModuloScheduler] Fix missing LLVM_DEBUG
Guard a debug message with LLVM_DEBUG

Differential Revision: https://reviews.llvm.org/D132895
2022-08-30 09:20:37 -07:00
Tomas Matheson 9a390d6692 [AArch64][GISel] fix G_ADD*/G_SUB* legalization
widenScalarDst updates the insert point to after MI, so
widenScalarSrc must be called before widenScalarDst. Otherwise
The updated Src values will appear after MI and break SSA. e.g.:

  %14:_(s64), %15:_(s1) = G_UADDE %9:_, %11:_, %13:_

becomes

  %14:_(s64), %16:_(s32) = G_UADDE %9:_, %11:_, %17:_
  %15:_(s1) = G_TRUNC %16:_(s32)
  %17:_(s32) = G_ZEXT %13:_(s1)

Differential Revision: https://reviews.llvm.org/D132547

Change-Id: Ie3458747a6879433f4d5ab9939d2bd102dd0f2db
2022-08-30 10:59:32 +01:00
Xiang1 Zhang a808ac2e42 [NFC] Clang-format for CodeGenPrepare.cpp 2022-08-30 13:42:36 +08:00
wanglian e2bb9774b1 [LegalizeTypes] Support widen result for VECTOR_REVERSE.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D132359
2022-08-30 10:01:26 +08:00
Craig Topper 2f811a6c7f [VP][RISCV] Add vp.fabs intrinsic and RISC-V support.
Mostly just modeled after vp.fneg except there is a
"functional instruction" for fneg while fabs is always an
intrinsic.

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D132793
2022-08-29 09:32:06 -07:00
Kazu Hirata 267f21a21b Use std::gcd (NFC)
This patch replaces calls to greatestCommonDivisor with std::gcd where
two arguments are of the same type.  This means that
std::common_type_t of the argument type is the same as the argument
type.

We could drop calls to std::abs in some cases, but that's left for
another patch.
2022-08-28 10:41:51 -07:00
Kazu Hirata d1688e9ddf [llvm] Use std::gcd (NFC)
This patch replaces calls to greatestCommonDivisor with std::gcd where
both arguments are known to be of unsigned.  This means that
std::common_type_t of the two argument types should just be the wider
one of the two.
2022-08-27 23:54:29 -07:00
Kazu Hirata 9d6ab7230b [GlobalISel] Use std::lcm (NFC)
This patch replaces getLCMSize with std::lcm, a C++17 feature.

Note that all the arguments are of unsigned with no implicit type
conversion as they are passed to getLCMSize.
2022-08-27 09:53:16 -07:00
Kazu Hirata 21de2888a4 Use llvm::is_contained (NFC) 2022-08-27 09:53:11 -07:00
Matthias Gehre 3e39b27101 [llvm/CodeGen] Add ExpandLargeDivRem pass
Adds a pass ExpandLargeDivRem to expand div/rem instructions
with more than 128 bits into a loop computing that value.

As discussed on https://reviews.llvm.org/D120327, this approach has the advantage
that it is independent of the runtime library. This also helps the clang driver,
which otherwise would need to understand enough about the runtime library
to know whether to allow _BitInts with more than 128 bits.

Targets are still free to disable this pass and instead provide a faster
implementation in a runtime library.

Fixes https://github.com/llvm/llvm-project/issues/44994

Differential Revision: https://reviews.llvm.org/D126644
2022-08-26 11:55:15 +01:00
Simon Pilgrim 88c7b16bed [DAG] Strip poison generating flags in freeze(op()) -> op(freeze()) fold
This patch follows the InstCombine approach of stripping poison generating flags (nsw/nuw from add/sub etc.) to allow us to push a freeze() through the op. Unlike InstCombine it doesn't retain any flags, but we have plenty of DAG folds that do the same thing already. We assert that the newly generated op isGuaranteedNotToBeUndefOrPoison.

Similar to the ValueTracking approach, isGuaranteedNotToBeUndefOrPoison has been updated to confirm that if an op can't create undef/poison and its operands are guaranteed not to be undef/poison - then its not undef/poison. This is just for the generic opcodes - target specific opcodes will need to do this manually just in case they have some special cases.

Differential Revision: https://reviews.llvm.org/D132333
2022-08-26 11:47:51 +01:00
Matthias Gehre 6d13b80fcb Revert "[SelectionDAG] Emit calls to __divei4 and friends for division/remainder of large integers"
This reverts https://reviews.llvm.org/D120329.
I abandoned the PR [0] to add __divei4 functions to compiler-rt
in favor of adding a pass to transform div/rem [1].

This removes the backend code that was supposed to emit calls to the __divei4 functions.

[0] https://reviews.llvm.org/D120327
[1] https://reviews.llvm.org/D130076

Differential Revision: https://reviews.llvm.org/D130079
2022-08-26 10:52:56 +01:00
Alex Richardson 0483b00875 Mark the $local function begin symbol as a function
While this does not matter for most targets, when building for Arm Morello,
we have to mark the symbol as a function and add size information, so that
LLD can correctly evaluate relocations against the local symbol.
Since Morello is an out-of-tree target, I tried to reproduce this with
in-tree backends and with the previous reviews applied this results in
a noticeable difference when targeting Thumb.

Background: Morello uses a method similar Thumb where the encoding mode is
specified in the LSB of the symbol. If we don't mark the target as a
function, the relocation will not have the LSB set and calls will end up
using the wrong encoding mode (which will almost certainly crash).

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D131429
2022-08-26 09:34:04 +00:00
wanglian 2887d7786f [DAGCombiner] Use FoldConstantArithmetic instead of dyn_cast in visitFP_ROUND.
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D132439
2022-08-25 11:29:05 +08:00
Matthias Braun 5364f49407 Fix CSR update check
D132080 introduced a bug leading to `RegisterClassInfo` caches not
getting invalidated when there was exactly one more CSR register added.

Differential Revision: https://reviews.llvm.org/D132606
2022-08-24 18:09:49 -07:00
Sami Tolvanen cff5bef948 KCFI sanitizer
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a
forward-edge control flow integrity scheme for indirect calls. It
uses a !kcfi_type metadata node to attach a type identifier for each
function and injects verification code before indirect calls.

Unlike the current CFI schemes implemented in LLVM, KCFI does not
require LTO, does not alter function references to point to a jump
table, and never breaks function address equality. KCFI is intended
to be used in low-level code, such as operating system kernels,
where the existing schemes can cause undue complications because
of the aforementioned properties. However, unlike the existing
schemes, KCFI is limited to validating only function pointers and is
not compatible with executable-only memory.

KCFI does not provide runtime support, but always traps when a
type mismatch is encountered. Users of the scheme are expected
to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi`
operand bundle to indirect calls, and LLVM lowers this to a
known architecture-specific sequence of instructions for each
callsite to make runtime patching easier for users who require this
functionality.

A KCFI type identifier is a 32-bit constant produced by taking the
lower half of xxHash64 from a C++ mangled typename. If a program
contains indirect calls to assembly functions, they must be
manually annotated with the expected type identifiers to prevent
errors. To make this easier, Clang generates a weak SHN_ABS
`__kcfi_typeid_<function>` symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one C translation unit linked into the program
takes the function address. For example on AArch64, we might have
the following code:

```
.c:
  int f(void);
  int (*p)(void) = f;
  p();

.s:
  .4byte __kcfi_typeid_f
  .global f
  f:
    ...
```

Note that X86 uses a different preamble format for compatibility
with Linux kernel tooling. See the comments in
`X86AsmPrinter::emitKCFITypeId` for details.

As users of KCFI may need to locate trap locations for binary
validation and error handling, LLVM can additionally emit the
locations of traps to a `.kcfi_traps` section.

Similarly to other sanitizers, KCFI checking can be disabled for a
function with a `no_sanitize("kcfi")` function attribute.

Relands 67504c9549 with a fix for
32-bit builds.

Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay

Differential Revision: https://reviews.llvm.org/D119296
2022-08-24 22:41:38 +00:00
Sami Tolvanen a79060e275 Revert "KCFI sanitizer"
This reverts commit 67504c9549 as using
PointerEmbeddedInt to store 32 bits breaks 32-bit arm builds.
2022-08-24 19:30:13 +00:00
Sami Tolvanen 67504c9549 KCFI sanitizer
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a
forward-edge control flow integrity scheme for indirect calls. It
uses a !kcfi_type metadata node to attach a type identifier for each
function and injects verification code before indirect calls.

Unlike the current CFI schemes implemented in LLVM, KCFI does not
require LTO, does not alter function references to point to a jump
table, and never breaks function address equality. KCFI is intended
to be used in low-level code, such as operating system kernels,
where the existing schemes can cause undue complications because
of the aforementioned properties. However, unlike the existing
schemes, KCFI is limited to validating only function pointers and is
not compatible with executable-only memory.

KCFI does not provide runtime support, but always traps when a
type mismatch is encountered. Users of the scheme are expected
to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi`
operand bundle to indirect calls, and LLVM lowers this to a
known architecture-specific sequence of instructions for each
callsite to make runtime patching easier for users who require this
functionality.

A KCFI type identifier is a 32-bit constant produced by taking the
lower half of xxHash64 from a C++ mangled typename. If a program
contains indirect calls to assembly functions, they must be
manually annotated with the expected type identifiers to prevent
errors. To make this easier, Clang generates a weak SHN_ABS
`__kcfi_typeid_<function>` symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one C translation unit linked into the program
takes the function address. For example on AArch64, we might have
the following code:

```
.c:
  int f(void);
  int (*p)(void) = f;
  p();

.s:
  .4byte __kcfi_typeid_f
  .global f
  f:
    ...
```

Note that X86 uses a different preamble format for compatibility
with Linux kernel tooling. See the comments in
`X86AsmPrinter::emitKCFITypeId` for details.

As users of KCFI may need to locate trap locations for binary
validation and error handling, LLVM can additionally emit the
locations of traps to a `.kcfi_traps` section.

Similarly to other sanitizers, KCFI checking can be disabled for a
function with a `no_sanitize("kcfi")` function attribute.

Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay

Differential Revision: https://reviews.llvm.org/D119296
2022-08-24 18:52:42 +00:00
spupyrev 8d5b694da1 extending code layout alg
The diff modifies ext-tsp code layout algorithm in the following ways:
(i) fixes merging of cold block chains (this is a port of D129397);
(ii) adjusts the cost model utilized for optimization;
(iii) adjusts some APIs so that the implementation can be used in BOLT; this is
a prerequisite for D129895.

The only non-trivial change is (ii). Here we introduce different weights for
conditional and unconditional branches in the cost model. Based on the new model
it is slightly more important to increase the number of "fall-through
unconditional" jumps, which makes sense, as placing two blocks with an
unconditional jump next to each other reduces the number of jump instructions in
the generated code. Experimentally, this makes a mild impact on the performance;
I've seen up to 0.2%-0.3% perf win on some benchmarks.

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D129893
2022-08-24 09:40:25 -07:00
Simon Pilgrim f9de13232f [X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.

For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling.

Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU.

Differential Revision: https://reviews.llvm.org/D132520
2022-08-24 17:28:18 +01:00
Stephen Tozer 58d104b352 Revert "[DebugInfo] Extend the InstrRef LDV to support DbgValues with many Ops"
Reverting due to reported errors when running Linux kernel builds with
KMSAN -gdwarf-4.

This reverts commit 2cb9e1ac42.
2022-08-24 15:24:32 +01:00
Simon Pilgrim 5377abcde2 [DAG] matchRotateHalf - constify SelectionDAG arg. NFC.
Based off Issue #57283 - we need to try harder to ensure we're not creating nodes on-the-fly - so make sure we're just using SelectionDAG for analysis where possible
2022-08-24 10:57:38 +01:00
Simon Pilgrim e624f8a3bb [DAG] MatchRotate - bail if we fail to match a shl/srl pair
extractShiftForRotate may fail to return canonicalized shifts due to constant folding or other simplification that can occur in getNode()

Fixes Issue #57283
2022-08-24 03:05:07 +01:00
Sanjay Patel f8dfbea324 [SDAG] expand more is-power-of-2 patterns that use popcount
(ctpop x) == 1 --> (x != 0) && ((x & x-1) == 0)

Adjust the legality check to avoid the poor codegen on AArch64.
We probably only want to use popcount on this pattern when it
is a single instruction.

fixes #57225

Differential Revision: https://reviews.llvm.org/D132237
2022-08-23 17:53:53 -04:00
Stephen Tozer 2cb9e1ac42 [DebugInfo] Extend the InstrRef LDV to support DbgValues with many Ops
This patch builds on prior support patches to enable support for
variadic debug values in InstrRefLDV, allowing DBG_VALUE_LISTs to
have their ranges extended.

Differential Revision: https://reviews.llvm.org/D128212
2022-08-23 20:17:09 +01:00
Arthur Eubanks d6cc7a5b46 [FastISel] Respect musttail over "disable-tail-calls"
musttail should be honored even in the presence of attributes like "disable-tail-calls". SelectionDAG properly handles this.

Update LangRef to explicitly mention that this is the semantics of musttail.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D132193
2022-08-23 08:55:40 -07:00
Jakub Kuderski 6fa87ec10f [ADT] Deprecate is_splat and replace all uses with all_equal
See the discussion thread for more details:
https://discourse.llvm.org/t/adt-is-splat-and-empty-ranges/64692

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D132335
2022-08-23 11:36:27 -04:00
Stephen Tozer 89d0cc99ec [DebugInfo][InstrRef] Handle transfers of variadic debug values in LDV
This patch adds the last of the changes required to enable
DBG_VALUE_LIST handling in InstrRefLDV, handling variadic debug values
during the transfer tracking step. Most of the changes are fairly
straightforward, and based around tracking multiple locations per
variable in TransferTracker::VLocTracker.

Differential Revision: https://reviews.llvm.org/D128211
2022-08-23 15:01:28 +01:00
Thomas Symalla 562accddaa
[NFC] Fix typo in dbg message in RegisterCoalescer.
funcion => function
2022-08-23 15:14:45 +02:00
Stephen Tozer b12e5c884f [DebugInfo][InstrRef][NFC] Emit variadic debug values from InstrRefLDV
In preparation for supporting DBG_VALUE_LIST in InstrRefLDV, this patch
adds the logic for emitting DBG_VALUE_LIST instructions from
InstrRefLDV. The logical changes here are fairly simple, with the main
change being that instead of directly prepending offsets to the DIExpr,
we use appendOpsToArg to modify the expression for individual debug
operands in the expression. The function emitLoc is also changed to take
a list of debug ops, with an empty list meaning an undef value.

Differential Revision: https://reviews.llvm.org/D128209
2022-08-23 13:22:56 +01:00
Denis Antrushin d1fd791e72 [TwoAddressInstruction] Handle pointer compare sunk past statepoint.
CodeGenPrepare pass can sink pointer comparison across statepoint
to the point of use (see comment in IR/SafepointIRVerifier.cpp)
Due to specifics of statepoints, it is still legal to have tied
def and use rewritten to the same register in TwoAddress pass.
However, properly updating LiveIntervals and LiveVariables becomes
complicated. For simplicity, let's fall back to generic handling of
tied registers when we detect such case.
TODO: This fixes functional (assertion) failure. Ideally we should
try to recompute new live range/liveness in place.

Reviewed By: skatkov

Differential Revision: https://reviews.llvm.org/D132255
2022-08-23 12:34:11 +03:00
Stephen Tozer 53fd5af689 [DebugInfo] Let InstrRefBasedLDV handle joins for lists of debug ops
In preparation for adding support for DBG_VALUE_LIST instructions in
InstrRefLDV, this patch updates the logic for joining variables at block
joins to support joining variables that use multiple debug operands.
This is one of the more meaty "logical" changes, although the line count
isn't too high - this changes pickVPHILoc to find a valid joined
location for every operand, with part of the function being split off
into pickValuePHILoc which finds a location for a single operand.

Differential Revision: https://reviews.llvm.org/D128180
2022-08-22 20:22:22 +01:00
David Penry ced705c440 [ModuloSchedule] Add interface call to accept/reject SMS schedules
This interface allows a target to reject a proposed
SMS schedule.  For Hexagon/PowerPC, all schedules
are accepted, leaving behavior unchanged.  For ARM,
schedules which exceed register pressure limits are
rejected.

Also, two RegisterPressureTracker methods now need to be public so
that register pressure can be computed by more callers.

Reapplication of D128941/(reversion:D132037) with small fix.

Differential Revision: https://reviews.llvm.org/D132170
2022-08-22 12:10:13 -07:00
Philip Reames 274f86e7a6 [TTI] Remove OperandValueKind/Properties from getArithmeticInstrCost interface [nfc]
This completes the client side transition to the OperandValueInfo version of this routine.  Backend TTI implementations still use the prior versions for now.
2022-08-22 11:06:32 -07:00
Kazu Hirata 36ec4deca5 [LiveDebugValues] Fix a warning
This patch fixes:

  llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.h:330:5: error:
  anonymous types declared in an anonymous union are an extension
  [-Werror,-Wnested-anon-types]
2022-08-22 10:57:16 -07:00
Stephen Tozer b5ba5d2aab [DebugInfo][NFC] Represent DbgValues with multiple ops in IRefLDV
In preparation for allowing InstrRefBasedLDV to handle DBG_VALUE_LIST,
this patch updates the internal representation that it uses to represent
debug values to store a list of values. This is one of the more
significant changes in terms of line count, but is fairly simple and
should not affect the output of this pass.

Differential Revision: https://reviews.llvm.org/D128177
2022-08-22 18:04:38 +01:00
Matthias Braun b2542c40b9 RegisterClassInfo: Fix CSR cache invalidation
`RegisterClassInfo` caches information like allocation orders and reuses
it for multiple machine functions where possible. However the `MCPhysReg
*CalleeSavedRegs` field used to test whether the set of callee saved
registers changed did not work: After D28566
`MachineRegisterInfo::getCalleeSavedRegs()` can return dynamically
computed CSR sets that are only valid while the `MachineRegisterInfo`
object of the current function exists.

This changes the code to make a copy of the CSR list instead of keeping
a possibly invalid pointer around.

Differential Revision: https://reviews.llvm.org/D132080
2022-08-22 09:28:26 -07:00
Stephen Tozer 11ce014a12 [DebugInfo][NFC] Update LDV to use generic DBG_VALUE* MI interface
Currently, InstrRefLDV only handles DBG_VALUE instructions, not
DBG_VALUE_LIST, and as a result of this it handles these instructions
using functions that only work for that type of debug value, i.e. using
getOperand(0) to get the debug operand. This patch changes this to use
the generic debug value functions, such as getDebugOperand and
isDebugOffsetImm, as well as adding an IsVariadic field to the
DbgValueProperties class and a few other minor changes to acknowledge
DBG_VALUE_LISTs. Note that this patch does not add support for
DBG_VALUE_LIST here, but is a precursor to other patches that do add
that support.

Differential Revision: https://reviews.llvm.org/D128174
2022-08-22 16:28:12 +01:00
Stephen Tozer 53125e7d91 [DebugInfo] Handle joins PHI+Def values in InstrRef LiveDebugValues
In the InstrRefBasedImpl for LiveDebugValues, we attempt to propagate
debug values through basic blocks in part by checking to see whether all
a variable's incoming debug values to a BB "agree", i.e. whether their
properties match and they refer to the same underlying value.

Prior to this patch, the check for agreement between incoming values
relied on exact equality, which meant that a VPHI and a Def DbgValue
that referred to the same underlying value would be seen as disagreeing.
This patch changes this behaviour to treat them as referring to the same
value, allowing the shared value to propagate into the BB.

Differential Revision: https://reviews.llvm.org/D125953
2022-08-22 14:51:27 +01:00
Kazu Hirata ec5eab7e87 Use range-based for loops (NFC) 2022-08-20 21:18:32 -07:00
Kazu Hirata 258531b7ac Remove redundant initialization of Optional (NFC) 2022-08-20 21:18:28 -07:00
Luo, Yuanke 5159be3c9b (Reland) [fastalloc] Support allocating specific register class in fastalloc
This reverts commit 853bb192c4.
2022-08-20 13:25:34 +08:00
Lorenzo Albano 98117fe208 [VP] Add splitting for VP_STRIDED_STORE and VP_STRIDED_LOAD
Following the comment's thread of D117235, I added checks for the widening + splitting case, which also causes a split with one of the resulting vectors to be empty. Due to the same issues described in that same thread, the `fixed-vectors-strided-store.ll` test is missing the widening + splitting case, while the same case in the `strided-vpload.ll` test requires to manually split the loaded vector.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D121784
2022-08-19 18:15:56 -07:00
Eli Friedman 8f826fe723 Fix reverse-iteration buildbot.
A couple of instances of iterating over maps snuck in while the bot was
down; fix them to use maps with deterministic iteration.
2022-08-19 14:21:05 -07:00
Nick Desaulniers e412bac912 [MachineVerifier] add checks for INLINEASM_BR
Test for a case we observed after the initial implementation of D129997
landed, in which case we observed a crash while building the ppc64le
Linux kernel. In that case, we had one block with two exits, both to the
same successor. Removing one of the exits corrupted the
successor/predecessor lists.

So when we have an INLINEASM_BR, check a few things for each indirect
target:
1. that it exists.
2. that it is listed in our successors.
3. that its predecessor list contains the parent MBB of INLINEASM_BR.

This would have caught the regression discovered after D129997 landed,
after the pass that was problematic (early-tailduplication) rather than
getting a stack trace in a later pass (regalloc) that doesn't understand
the anomaly and crashes.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D130290
2022-08-19 12:52:26 -07:00
Bill Wendling ac6a0cdc2e [X86][AArch64][NFC] Simplify querying used argument registers
Registers used for arguments are listed as "live-ins" into the starting
basic block. This means we don't have to go through a potentially
expensive search through all possible argument registers when we only
care about used argument registers.

Differential Revision: https://reviews.llvm.org/D132181
2022-08-19 11:39:05 -07:00
wanglian fc2b4dfef2 [DAGCombiner] Add use check for VSCALE in visitSUB.
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D132115
2022-08-19 09:46:18 +08:00
Eric Wang ad8eb85545 [NFC][MLGO] ML Regalloc Priority Advisor
This patch introduces the priority analysis and the priority advisor,
the default implementation, and the scaffolding for introducing the
other implementations of the advisor.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D131220
2022-08-18 15:33:48 -07:00
Paul Walker 96c8d615d6 [SVE] Extend findMoreOptimalIndexType so BUILD_VECTORs do not force 64bit indices.
Extends findMoreOptimalIndexType to allow ISD::BUILD_VECTOR based
indices to be truncated when such truncation is lossless. This can
enable the use of 32bit gather/scatter indices thus making it less
likely to have to split a gather/scatter in two.

Depends on D125194

Differential Revision: https://reviews.llvm.org/D130533
2022-08-18 18:00:53 +01:00
Simon Pilgrim fdec50182d [CostModel] Replace getUserCost with getInstructionCost
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.

Original Patch by @samparker (Sam Parker)

Differential Revision: https://reviews.llvm.org/D79483
2022-08-18 11:55:23 +01:00
wanglian 989ebc1783 [DAGCombiner][NFC] Tidy up unnecessary brackets in visitADD.
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D132107
2022-08-18 15:48:22 +08:00
wanglian 230e277dfe [DAGCombiner][NFC] Merge two if statement into one.
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D131941
2022-08-18 10:12:35 +08:00
Daniil Fukalov 7ed3d81333 [NFCI] Move cost estimation from TargetLowering to TargetTransformInfo.
TragetLowering had two last InstructionCost related `getTypeLegalizationCost()`
and `getScalingFactorCost()` members, but all other costs are processed in TTI.

E.g. it is not comfortable to use other TTI members in these two functions
overrided in a target.

Minor refactoring: `getTypeLegalizationCost()` now doesn't need DataLayout
parameter - it was always passed from TTI.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D117723
2022-08-18 00:38:55 +03:00
Sanjay Patel 7f72a0f5bb [SDAG] avoid generating libcall to function with same name
This is a potentially better alternative to D131452 that also
should avoid the infinite loop bug from:
issue #56403

This is again a minimal fix to reduce merging pain for the
release. But if this makes sense, then we might want to guard
all of the RTLIB generation (and other libcalls?) with a
similar name check.

Differential Revision: https://reviews.llvm.org/D131521
2022-08-17 16:19:34 -04:00
Matthias Braun 19ce5e515f RAGreedyStats: Ignore identity COPYs; count COPYs from/to physregs
Improve copy statistics:

- Count copies from or to physical registers: They are used to model function parameters and calling conventions and the register allocator optimizes for them.
- Check physical registers assigned to virtual registers and stop counting "identity" `COPY`s where source and destination is the same physical registers; they will be removed in the `virtregmap` pass anyway.

Differential Revision: https://reviews.llvm.org/D131932
2022-08-17 12:53:29 -07:00
Archit Saxena e170d955fe Split EH code by default
The current machine function splitter is reliant on profile data to do profile summary analysis to split blocks into cold section. This may sometimes limit the usage of machine function splitter especially in cases where we could do some form of static analysis to split out cold blocks if profile data is absent or profile data which may be faulty (Consider Sample PGO).

Of all code that could statically be marked cold Exception handling blocks are one of them (In fact BFI framework also tends to mark them as cold), and the most in size contribution. In my experiments I found out Exception handling pads and all code reachable from there account for up to 6-8% of the .text section on modern production binaries. This patch introduces a flag to split out all Exception handling blocks and blocks only reachable from Exceptional Handling pad to cold section. This flag has shown to give a performance win of up to 0.1% in terms of average cycles and instructions executed on internal facebook search service.

Reviewed By: snehasish

Differential Revision: https://reviews.llvm.org/D131824
2022-08-17 12:40:31 -07:00
Nick Desaulniers 6b0e2fa6f0 [SelectionDAG] make INLINEASM_BR use MachineBasicBlocks instead of BlockAddresses
As part of re-architecting callbr to no longer use blockaddresses
(https://reviews.llvm.org/D129288), we don't really need them in MIR.
They make comparing MachineBasicBlocks of indirect targets during
MachineVerifier a PITA.

Suggested by @efriedma from the discussion:
https://reviews.llvm.org/D130290#3669531

Reviewed By: efriedma, void

Differential Revision: https://reviews.llvm.org/D130316
2022-08-17 09:34:31 -07:00
David Penry 1c9f0408bc Revert "[ModuloSchedule] Add interface call to accept/reject SMS schedules"
This reverts commit 8c4aea438c.

Needed because buildbot failures (warnings) gave a clue that there was
a functional bug in the ARM rejection logic.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D132037
2022-08-17 09:32:43 -07:00
David Penry 8c4aea438c [ModuloSchedule] Add interface call to accept/reject SMS schedules
This interface allows a target to reject a proposed
SMS schedule.  For Hexagon/PowerPC, all schedules
are accepted, leaving behavior unchanged.  For ARM,
schedules which exceed register pressure limits are
rejected.

Also, two RegisterPressureTracker methods now need to be public so
that register pressure can be computed by more callers.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D128941
2022-08-17 08:13:26 -07:00
Andre Vieira 49223e0a2d [TypePromotion] Don't promote PHI + ZExt if wider than RegisterBitWidth
Differential Revision: https://reviews.llvm.org/D131966
2022-08-17 09:54:15 +01:00
Eli Friedman cfd2c5ce58 Untangle the mess which is MachineBasicBlock::hasAddressTaken().
There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code.  Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.

Mixing these together causes a problem: if target-specific passes are
marking random blocks "address-taken", if we have a BlockAddress, we
can't actually tell which MachineBasicBlock corresponds to the
BlockAddress.

So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.

Discovered while trying to sort out related stuff on D102817.

Differential Revision: https://reviews.llvm.org/D124697
2022-08-16 16:15:44 -07:00
Nicolas Miller ccfabfbb1f Fix subrange liveness checking at rematerialization
This patch fixes an issue where an instruction reading a whole register would be moved during register allocation into a spot where one of the subregisters was dead.

The code to check whether an instruction can be rematerialized at a given point or not was already checking for subranges to ensure that subregisters are live, but only when the instruction being moved was using a subregister, this patch changes that so the subranges are checked even when the moved instruction uses the full register.

This patch also adds a case to the original test for the subrange checking that trigger the issue described above.

The original subrange checking code was introduced in this revision: https://reviews.llvm.org/D115278

And I've encountered this issue on AMDGPUs while working with DPC++: https://github.com/intel/llvm/issues/6209

Essentially the greedy register allocator attempts to move the following instruction:

```
%3961:vreg_64 = V_LSHLREV_B64_e64 3, %3078:vreg_64, implicit $exec
```

From `@3440` into the body of a loop `@16312`, but `%3078` has the following live ranges:

```
%3078 [2224r,2240r:0)[2240r,3488B:1)[16192B,38336B:1) 0@2224r 1@2240r  L0000000000000003 [2224r,3440r:0) 0@2224r  L000000000000000C [2240r,3488B:0)[16192B,38336B:0) 0@2240r
```

So `@16312e` `%3078.sub1` is alive but `%3078.sub0` is dead, so this instruction being moved there leads to invalid memory accesses as `3078.sub0` ends up being trashed and the result of this instruction is used as part of an address calculation for a load.

On the original ticket this issue showed up on gfx906 and gfx90a but not on gfx908, this turned out to be because on gfx908 instead of moving the shift instruction into the loop, its value is spilled into an ACC register, gfx906 doesn't have ACC registers and for gfx90a ACC registers are used like regular vector registers and so aren't used for spilling.

With this patch the original application from the DPC++ ticket works properly on gfx906, and the result of the shift instruction is correctly spilled instead of moving the instruction in the loop.

Original Author: npmiller

Reviewed by: rampitec

Submitted by: rampitec

Differential Revision: https://reviews.llvm.org/D131884
2022-08-16 10:50:09 -07:00
Arthur Eubanks 9181ce623f [Windows] Put init_seg(compiler/lib) in llvm.global_ctors
Currently we treat initializers with init_seg(compiler/lib) as similar
to any other init_seg, they simply have a global variable in the proper
section (".CRT$XCC" for compiler/".CRT$XCL" for lib) and are added to
llvm.used. However, this doesn't match with how LLVM sees normal (or
init_seg(user)) initializers via llvm.global_ctors. This
causes issues like incorrect init_seg(compiler) vs init_seg(user)
ordering due to GlobalOpt evaluating constructors, and the
ability to remove init_seg(compiler/lib) initializers at all.

Currently we use 'A' for priorities less than 200. Use 200 for
init_seg(compiler) (".CRT$XCC") and 400 for init_seg(lib) (".CRT$XCL"),
which do not append the priority to the section name. Priorities
between 200 and 400 use ".CRT$XCC${Priority}". This allows for
some wiggle room for people/future extensions that want to add
initializers between compiler and lib.

Fixes #56922

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D131910
2022-08-16 08:16:18 -07:00
Steve Merritt ec60fca752 [CodeView] Use non-qualified names for static local variables
Static variables declared within a routine or lexical block should
be emitted with a non-qualified name.  This allows the variables to
be visible to the Visual Studio watch window.

Differential Revision: https://reviews.llvm.org/D131400
2022-08-16 10:33:43 -04:00
Andre Vieira c6b5a13b7a [TypePromotion] Only search for PHI + ZExt promotion of Integers
Differential Revision: https://reviews.llvm.org/D131948
2022-08-16 10:15:32 +01:00
wanglian fbc4c26e9a [SelectionDAG][NFC] Fix return type when used isConstantIntBuildVectorOrConstantInt
and isConstantFPBuildVectorOrConstantFP

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D131870
2022-08-16 10:07:24 +08:00
Rahman Lavaee df2213f345 [EHStreamer] Omit @LPStart when function has no landing pads
When no landing pads exist for a function, `@LPStart` is undefined and must be omitted.

EH table is generally not emitted for functions without landing pads, except when the personality function is uknown (`!isNoOpWithoutInvoke(classifyEHPersonality(Per))`). In that case, we must omit `@LPStart` even when machine function splitting is enabled.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D131626
2022-08-15 17:09:46 -07:00
Aiden Grossman 24cdf97d63 [mlgo] Add ability to create feature-gated development features in regalloc advisor
Currently there is no way to add in development features to the ML
regalloc evict advisor which is useful to have when working on feature
engineering/improving the current model. This patch adds in the ability
to add in development features to the ML regalloc evict advisor which
are gated by a runtime flag and not added in at all if not compiled in
LLVM development mode. This sets the stage for future work where we are
planning on upstreaming some of the newer features that we are currently
experimenting with.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D131209
2022-08-15 16:01:37 -07:00
David Green dfc95bab07 [DAG] Ensure more Legal BUILD_VECTOR elements types in shuffle->And combine
This is a followup to D131350, which caused another problem for i64
types being split into i32 on i32 targets. This patch tries to make sure
that either Illegal types are OK, or that the element types of a
buildvector are legal and bigger than or equal to the size of the
original elements.

Differential Revision: https://reviews.llvm.org/D131883
2022-08-15 14:41:45 +01:00
Luo, Yuanke 853bb192c4 Revert "(Reland) [fastalloc] Support allocating specific register class in fastalloc"
This reverts commit 30f9e6ebd3.
2022-08-15 20:33:15 +08:00
Ayke van Laethem de48717fcf
[AVR] Support unaligned store
This patch really just extends D39946 towards stores as well as loads.
While the patch is in SelectionDAGBuilder, it only applies to AVR (the
only target that supports unaligned atomic operations).

Differential Revision: https://reviews.llvm.org/D128483
2022-08-15 14:29:37 +02:00
Simon Pilgrim 3a73133217 [DAG] canCreateUndefOrPoison - add freeze(sign_extend_inreg(x,vt)) -> sign_extend_inreg(freeze(x),vt) support
Guaranteed not to create undef/poison
2022-08-15 12:18:59 +01:00
Peter Waller 6e85db7293 [DAGCombine] Combine signext_inreg of extract-extend
The outer signext_inreg is redundant in the following:

  Fold (signext_inreg (extract_subvector (zext|anyext|sext iN_value to _) _) from iN)
       -> (extract_subvector (signext iN_value to iM))

Tests are precommitted and clone those by analogy from the AND case in
the same file. Add a negative test to check extension width is handled
correctly.

This patch supersedes D130700.

Differential Revision: https://reviews.llvm.org/D131503
2022-08-15 10:58:07 +00:00
Simon Pilgrim 7e294e676e [DAG] canCreateUndefOrPoison - add freeze(assertsext/zext(x,bt)) -> assertsext/zext(freeze(x),vt) support
These are guaranteed not to create undef/poison (although they may pass through) - the associated ISD::VALUETYPE node is also guaranteed never to generate poison
2022-08-15 11:13:43 +01:00
Kazu Hirata f5a68feab3 Use llvm::none_of (NFC) 2022-08-14 16:25:39 -07:00
Simon Pilgrim e2d13fd096 [DAG] canCreateUndefOrPoison - add freeze(shl(x,y)) -> shl(freeze(x),y) support
These are guaranteed not to create undef/poison if the shift amount is known to be in range
2022-08-14 14:38:10 +01:00