Commit Graph

32682 Commits

Author SHA1 Message Date
Craig Topper 003b95acf2 [LegalizeTypes] Remove double map lookup in DAGTypeLegalizer::PerformExpensiveChecks. NFC
Remove repeated checks for ResId being 0.
2022-05-21 00:06:59 -07:00
Craig Topper 66875dbcc0 [LegalizeTypes] Use SmallDenseMap::count instead of SmallDenseMap::find. NFC
It's more readable and more efficient.
2022-05-21 00:06:55 -07:00
Shilei Tian ff60a0a364 [LLVM] Add a check if should cast atomic operations to integer type
Currently for atomic load, store, and rmw instructions, as long as the
operand is floating-point value, they are casted to integer. Nowadays many
targets can actually support part of atomic operations with floating-point
operands. For example, NVPTX supports atomic load and store of floating-point
values. This patch adds a series interface functions `shouldCastAtomicXXXInIR`,
and the default implementations are same as what we currently do. Later for
targets can have their specialization.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D125652
2022-05-20 17:23:53 -04:00
Zequan Wu 9886046289 [CodeView] Combine variable def ranges that are continuous.
It saves about 1.13% size for chrome.dll.pdb on chrome official build.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D125721
2022-05-20 12:12:14 -07:00
Craig Topper 8d3894f67e [TypePromotion] Fix another case for sext vs zext in promoted constant.
If the SafeWrap operation is a subtract, we negated the constant
to treat the subtract as an addition. The sext was based on the
operation being addition. So we really need to do (neg (sext (neg C)))
when promoting the constant. This is equivalent to (sext C) for
every value of C except the min signed value. For min signed value
we need to do (zext C) instead.

Fixes PR55490.

Differential Revision: https://reviews.llvm.org/D125653
2022-05-20 09:30:07 -07:00
Ivan Kosarev 86803008ea [MIR] Provide location of extra instruction operand when diagnosing it.
Also resolves misspelled FileCheck directives caught with D125604.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D125965
2022-05-20 05:56:25 +01:00
Sotiris Apostolakis ca7c307d18 [SelectOpti][1/5] Setup new select-optimize pass
This is the first commit for the cmov-vs-branch optimization pass.
The goal is to develop a new profile-guided and target-independent cost/benefit analysis
for selecting conditional moves over branches when optimizing for performance.

Initially, this new pass is expected to be enabled only for instrumentation-based PGO.

RFC: https://discourse.llvm.org/t/rfc-cmov-vs-branch-optimization/6040

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D120230
2022-05-19 16:31:10 +00:00
Jay Foad 6bec3e9303 [APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.

The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.

Differential Revision: https://reviews.llvm.org/D125557
2022-05-19 11:23:13 +01:00
Lian Wang 530bab1f93 [RISCV][SelectionDAG] Support VECREDUCE_ADD mask operation
Re-landed D125206

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125206
2022-05-19 09:53:33 +00:00
Lian Wang f035068bb3 [LegalizeVectorTypes][VP] Add widen and split support for VP_SETCC
Reviewed By: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D125446
2022-05-19 07:42:39 +00:00
Lian Wang bbc6834e26 [LegalizeTypes][VP] Add integer promotions support for VP_TRUNCATE
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125739
2022-05-19 07:36:10 +00:00
Lian Wang 993070d11f [LegalizeTypes][VP][NFC] Use an if and two returns instead of ?: operator
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D125858
2022-05-19 07:18:24 +00:00
Jon Roelofs d699e54ca2 Fix an or+and miscompile w/ GlobalISel
Fixes #55284
2022-05-18 19:09:47 -07:00
Matthias Braun 8d03c49f49 Extend switch condition in optimizeSwitchPhiConst when free
In a case like:

    switch((i32)x) { case 42: phi((i64)42, ...); }

replace `(i64)42` with `zext(x)` when we can do so for free.

This fixes a part of https://github.com/llvm/llvm-project/issues/55153

Differential Revision: https://reviews.llvm.org/D124897
2022-05-18 16:23:53 -07:00
Mitch Phillips 7aa1fa0a0a Reland "[dwarf] Emit a DIGlobalVariable for constant strings."
An upcoming patch will extend llvm-symbolizer to provide the source line
information for global variables. The goal is to move AddressSanitizer
off of internal debug info for symbolization onto the DWARF standard
(and doing a clean-up in the process). Currently, ASan reports the line
information for constant strings if a memory safety bug happens around
them. We want to keep this behaviour, so we need to emit debuginfo for
these variables as well.

Reviewed By: dblaikie, rnk, aprantl

Differential Revision: https://reviews.llvm.org/D123534
2022-05-18 13:56:45 -07:00
Michael Kitzan 29bebb0237 [GISel] Add new combines for G_FMINNUM/MAXNUM and G_FMINIMUM/MAXIMUM
I noticed https://reviews.llvm.org/D87415 added SDAG combines to fold
FMIN/MAX instrs with NaNs.

The patch implements the same NaN combines for GISel GMIR FMIN/MAX opcodes:
G_FMINNUM(X, NaN) -> X
G_FMAXNUM(X, NaN) -> X
G_FMINIMUM(X, NaN) -> NaN
G_FMAXIMUM(X, NaN) -> NaN

The patch adds AArch64 tests for these combines as well.

Reviewed by: arsenm

Differential revision: https://reviews.llvm.org/D125819
2022-05-18 12:08:53 -07:00
Yusra Syeda 5ac411aea8 [SystemZ][z/OS] Add the PPA1 to SystemZAsmPrinter
Differential Revision: https://reviews.llvm.org/D125725
2022-05-18 14:13:17 -04:00
Craig Topper 46eef76876 [DAGCombiner] Fix bug in MatchBSwapHWordLow.
This function tries to match (a >> 8) | (a << 8) as (bswap a) >> 16.

If the SRL isn't masked and the high bits aren't demanded, we still
need to ensure that bits 23:16 are zero. After the right shift they
will be in bits 15:8 which is where the important bits from the SHL
end up. It's only a bswap if the OR on bits 15:8 only takes the bits
from the SHL.

Fixes PR55484.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D125641
2022-05-18 09:23:18 -07:00
Yeting Kuo 00999fb6e1 [SelectionDAGBuilder] Pass fast math flags to most of VP SDNodes.
The patch does not pass math flags to float VPCmpIntrinsics because LLParser
could not identify float VPCmpIntrinsics as FPMathOperators.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125600
2022-05-18 16:15:47 +08:00
Simon Pilgrim d40b7f0d5a [DAG] Fold (shl (srl x, c), c) -> and(x, m) even if srl has other uses
If we're using shift pairs to mask, then relax the one use limit if the shift amounts are equal - we'll only be generating a single AND node.

AArch64 has a couple of regressions due to this, so I've enforced the existing one use limit inside a AArch64TargetLowering::shouldFoldConstantShiftPairToMask callback.

Part of the work to fix the regressions in D77804

Differential Revision: https://reviews.llvm.org/D125607
2022-05-17 13:40:11 +01:00
Jay Foad 77480556c4 [RegAllocGreedy] New hook regClassPriorityTrumpsGlobalness
Add a new TargetRegisterInfo hook to allow targets to tweak the
priority of live ranges, so that AllocationPriority of the register
class will be treated as more important than whether the range is local
to a basic block or global. This is determined per-MachineFunction.

Differential Revision: https://reviews.llvm.org/D125102
2022-05-17 12:35:21 +01:00
jacquesguan 26593e7314 [SelectionDAG] Support more VP reduction mask operation.
This patch uses VP_REDUCE_AND and VP_REDUCE_OR to replace VP_REDUCE_SMAX,VP_REDUCE_SMIN,VP_REDUCE_UMAX and VP_REDUCE_UMIN for mask vector type.

Differential Revision: https://reviews.llvm.org/D125002
2022-05-17 09:14:21 +00:00
Fraser Cormack 599ff247de [StackColoring] Don't merge slots with differing StackIDs
The documentation for this specifically mentions that this should not
happen. We could think about adding target hooks to permit it (and how
to merge IDs) in the future if that is desirable.

This specific test case was merging a scalable-vector slot into a
non-scalable one and dropping the notion of scalability, meaning we
failed to allocate enough stack space for the object.

Reviewed By: arsenm, MaskRay, sdesmalen

Differential Revision: https://reviews.llvm.org/D125699
2022-05-17 08:28:49 +01:00
Mitch Phillips ed2c3218f5 Revert "[dwarf] Emit a DIGlobalVariable for constant strings."
This reverts commit 4680982b36.

Broke a fuchsia windows bot. More details in the review:
https://reviews.llvm.org/D123534
2022-05-16 19:07:38 -07:00
Mitch Phillips 4680982b36 [dwarf] Emit a DIGlobalVariable for constant strings.
An upcoming patch will extend llvm-symbolizer to provide the source line
information for global variables. The goal is to move AddressSanitizer
off of internal debug info for symbolization onto the DWARF standard
(and doing a clean-up in the process). Currently, ASan reports the line
information for constant strings if a memory safety bug happens around
them. We want to keep this behaviour, so we need to emit debuginfo for
these variables as well.

Reviewed By: dblaikie, rnk, aprantl

Differential Revision: https://reviews.llvm.org/D123534
2022-05-16 16:52:16 -07:00
Philip Reames 7dbf2e7b57 Teach PeepholeOpt to eliminate redundant copy from constant physreg (e.g VLENB on RISCV)
The existing redundant copy elimination required a virtual register source, but the same logic works for any physreg where we don't have to worry about clobbers.  On RISCV, this helps eliminate redundant CSR reads from VLENB.

Differential Revision: https://reviews.llvm.org/D125564
2022-05-16 16:38:30 -07:00
Paul Walker 7dd05ba9ed [SelectionDAG] Remove duplicate "is scaled" information from gather/scatter SDNodes.
During early gather/scatter enablement two different approaches
were taken to represent scaled indices:

* A Scale operand whereby byte_offsets = Index * Scale
* An IndexType whereby byte_offsets = Index * sizeof(MemVT.ElementType)

Having multiple representations is bad as shown by this patch which
fixes instances where the two are out of sync. The dedicated scale
operand is more flexible and pervasive so this patch removes the
UNSCALED values from IndexType. This means all indices are scaled
but the scale can be one, hence unscaled. SDNodes now use the scale
operand to answer the "isScaledIndex" question.

I toyed with the idea of keeping the UNSCALED enums and helper
functions but because they will have no uses and force SDNodes to
validate the set of supported values I figured it's best to remove
them. We can re-add them if there's a real need. For similar
reasons I've kept the IndexType enum when a bool could be used as I
think being explicitly looks better.

Depends On D123347

Differential Revision: https://reviews.llvm.org/D123381
2022-05-16 20:47:52 +01:00
Craig Topper 1c4880a2d3 [TargetLowering] Expand the last stage of i16 popcnt using shift+add+and instead of mul+shift.
If we use multiply it would be with 0x0101 which is 1 more than a power
of 2. On some targets we would expand this to shl+add. By avoiding the
multiply earlier, we can generate better code.

Note, PowerPC doesn't do the shl+add expansion of multiply so one of
the tests increased in instruction count.

Limiting to scalars because it almost always increased the number of
instructions in vector tests.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D125638
2022-05-16 09:27:44 -07:00
Craig Topper e6fc8454be [DAGCombiner] Fix incorrect indentation. NFC 2022-05-16 09:27:15 -07:00
Philip Reames 55e2df7285 [LiveIntervals] Add range accessors for value numbers [nfc] 2022-05-16 08:23:12 -07:00
Bradley Smith 7ff5148d64 [DAGCombine] Support splat_vector nodes in (and (extload)) dagcombine
Differential Revision: https://reviews.llvm.org/D125367
2022-05-16 11:25:20 +00:00
Abinav Puthan Purayil 485dd0b752 [GlobalISel] Handle constant splat in funnel shift combine
This change adds the constant splat versions of m_ICst() (by using
getBuildVectorConstantSplat()) and uses it in
matchOrShiftToFunnelShift(). The getBuildVectorConstantSplat() name is
shortened to getIConstantSplatVal() so that the *SExtVal() version would
have a more compact name.

Differential Revision: https://reviews.llvm.org/D125516
2022-05-16 16:03:30 +05:30
Yeting Kuo 26a61ab678 [SelectionDAG] Make getNode which uses single element SDVTList pass SDNodeFlags.
The patch make users not need to know getNode with SDNodeFlags argument may not
pass its flags.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125659
2022-05-16 18:19:46 +08:00
Denis Antrushin 8903dbef8f [StatepointLowering] Properly handle local and non-local relocates of the same value.
FunctionLoweringInfo::StatepointRelocationMaps map is used to pass GC pointer
lowering information from statepoint to gc.relocate  which may appear ini
different block.
D124444 introduced different lowering for local and non-local relocates.
Local relocates use SDValue and non-local relocates use value exported to VReg.
But I overlooked the fact that StatepointRelocationMap is indexed not by
GCRelocate instruction, but by derived pointer. This works incorrectly when
we have two relocates (one local and another non-local) of the same value,
because they need different relocation records.

This patch fixes the problem by recording relocation information per relocate
instruction, not per derived pointer. This way, each gc.relocate can be lowered
differently.

Reviewed By: skatkov

Differential Revision: https://reviews.llvm.org/D125538
2022-05-16 17:02:34 +07:00
Nikita Popov 05c3fe075d [FastISel] Fix load folding for registers with fixups
FastISel tries to fold loads into the single using instruction.
However, if the register has fixups, then there may be additional
uses through an alias of the register.

In particular, this fixes the problem reported at
https://reviews.llvm.org/D119432#3507087. The load register is
(at the time of load folding) only used in a single call instruction.
However, selection of the bitcast has added a fixup between the
load register and the cross-BB register of the bitcast result.
After fixups are applied, there would now be two uses of the load
register, so load folding is not legal.

Differential Revision: https://reviews.llvm.org/D125459
2022-05-16 10:25:25 +02:00
Craig Topper b4ad450953 [TargetLowering] expandCTPOP don't create an used constant mask for i8 ctpop. NFC
Use early out for the i8 case.

I'm looking at avoiding MUL on targets that use libcalls for MUL.
So doing a little pre-refactoring.
2022-05-14 20:35:38 -07:00
Simon Pilgrim f4eac6e5f6 [DAG] visitOR - merge isa/cast<ShuffleVectorSDNode> into dyn_cast<ShuffleVectorSDNode>. NFC.
Also, initialize entire mask to -1 to simplify undefined cases.
2022-05-14 20:49:26 +01:00
Simon Pilgrim 95cdd63b87 [DAG] visitADDLike - use SelectionDAG::FoldConstantArithmetic directly to match constant operands
SelectionDAG::FoldConstantArithmetic determines if operands are foldable constants, so we don't need to bother with isConstantOrConstantVector / Opaque tests before calling it directly.
2022-05-14 18:39:41 +01:00
Simon Pilgrim 8db72d9d04 [DAG] visitMUL - pull out repeated SDLoc() calls. NFC. 2022-05-14 14:28:39 +01:00
Simon Pilgrim 8d4d4988e4 [DAG] Use SelectionDAG::FoldConstantArithmetic directly to match constant operands
SelectionDAG::FoldConstantArithmetic determines if operands are foldable constants, so we don't need to bother with isConstantOrConstantVector / Opaque tests before calling it directly.
2022-05-14 14:19:12 +01:00
Simon Pilgrim 1ecc3d86ae [DAG] Enable ISD::SHL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits
Pulled out of D77804 as its going to be easier to address the regressions individually.

This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.

The lost RISCV gorc2 fold shouldn't be a problem - instcombine would have already destroyed that pattern - see https://github.com/llvm/llvm-project/issues/50553

Differential Revision: https://reviews.llvm.org/D124839
2022-05-14 09:50:01 +01:00
Eli Friedman 96c2a0c9ff [GlobalIsel] Fix fallback if stack protector isn't supported.
When GlobalISel fails, we need to report the error, and we need to set
the FailedISel property.  We skipped those steps if stack protector
insertion failed, which led to a very strange miscompile.

Differential Revision: https://reviews.llvm.org/D125584
2022-05-13 14:17:27 -07:00
Simon Pilgrim 3fc33ced10 DAGCombiner.cpp - break if-else chains that always return (style) 2022-05-13 18:31:39 +01:00
Sanjay Patel e52e1dab2a [SDAG] freeze operand when expanging urem
This is a potential miscompile as discussed in issue #55291.

The related IR transform was patched with:
d428f09b2c
2022-05-13 10:55:14 -04:00
Nikita Popov ed1cb01baf [IRBuilder] Add IsInBounds parameter to CreateGEP()
We commonly want to create either an inbounds or non-inbounds GEP
based on a boolean value, e.g. when preserving inbounds from
existing GEPs. Directly accept such a boolean in the API, rather
than requiring a ternary between CreateGEP and CreateInBoundsGEP.

This change is not entirely NFC, because we now preserve an
inbounds flag in a constant expression edge-case in InstCombine.
2022-05-13 14:30:55 +02:00
Sam Parker 6d53d35efd [TypePromotion] Avoid some unnecessary truncs
Recommit.

Check for legal zext 'sinks' before inserting a trunc.

Differential Revision: https://reviews.llvm.org/D115451
2022-05-13 09:45:20 +01:00
Jay Foad 26e1ebd3ea [GlobalISel] Change ConstantFoldVectorBinop to return vector of APInt
Previously it built MIR for the results and returned a Register.

This avoids building constants for earlier elements of the vector if
later elements will fail to fold, and allows CSEMIRBuilder::buildInstr
to avoid unconditionally building a copy from the result.

Use a new helper function MachineIRBuilder::buildBuildVectorConstant
to build a G_BUILD_VECTOR of G_CONSTANTs.

Differential Revision: https://reviews.llvm.org/D117758
2022-05-13 09:33:07 +01:00
Lian Wang 693758b282 [LegalizeTypes][VP] Add integer promotion support for vp.setcc
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D125453
2022-05-13 06:25:13 +00:00
Lian Wang 8050ba6678 [LegalizeTypes][VP] Add integer promotion support for vp.merge
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D125452
2022-05-13 03:28:29 +00:00
Craig Topper cec249c60d [TypePromotion] Promote undef by converting to 0.
If we're promoting an undef I think that means that we expect the
upper bits are zero. undef doesn't guarantee that.

This patch replaces undef with 0 to ensure this. This matches how
a zext or sext of undef would be folded by InstCombine/InstSimplify.

I haven't found a failure from this was just thinking through the code.

Differential Revision: https://reviews.llvm.org/D123174
2022-05-12 09:09:24 -07:00
Fraser Cormack 1106bc208c [CodeGen][NFC] Move some comments from the end of lines to above them
This avoids wrapping the line itself awkwardly when it exceeds 80 chars.
It also better matches our style most other places.
2022-05-12 15:45:04 +01:00
Jeremy Morse a975472fa6 [DebugInfo][InstrRef] Describe value sizes when spilt to stack
This is a re-apply of D123599, which was reverted in 4fe2ab5279, now
with a more appropriate assertion. Original commit message follow:

InstrRefBasedLDV can track and describe variable values that are spilt to
the stack -- however it does not current describe the size of the value on
the stack. This can cause uninitialized bytes to be read from the stack if
a small register is spilt for a larger variable, or theoretically on
big-endian machines if a large value on the stack is used for a small
variable.

Fix this by using DW_OP_deref_size to specify the amount of data to load
from the stack, if there's any possibility for ambiguity. There are a few
scenarios where this can be omitted (such as when using DW_OP_piece and a
non-DW_OP_stack_value location), see deref-spills-with-size.mir for an
explicit table of inputs flavours and output expressions.

Differential Revision: https://reviews.llvm.org/D123599
2022-05-12 15:52:55 +01:00
Nikita Popov 50f846d634 [FastISel] Add some debug output (NFC)
Print a debug message when aborting isel (next to the ORE report)
and when folding a load.
2022-05-12 12:25:20 +02:00
Lian Wang 9176096c86 [LegalizeVectorTypes] Enable WidenVecRes_SETCC work for scalable vector.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D125359
2022-05-12 02:52:43 +00:00
Craig Topper edbf390d10 [CodeGenPrepare] Use const reference to avoid unnecessary APInt copy. NFC
Spotted while looking at Matthias' patches.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D124985
2022-05-11 12:06:45 -07:00
Matthias Braun de9ad98d2d Fix endless loop in optimizePhiConst with integer constant switch condition
Avoid endless loop in degenerate case with an integer constant as switch
condition as reported in https://reviews.llvm.org/D124552
2022-05-11 08:49:01 -07:00
David Green 5feeceddb2 [TypePromotion] Fix sext vs zext in promoted constant
As pointed out in #55342, given non-canonical IR with multiple
constants, we check the second operand in isSafeWrap, but can promote
both with sext. Fix that as suggested by @craig.topper by ensuring we
only extend the second constant if multiple are present.

Fixes #55342

Differential Revision: https://reviews.llvm.org/D125294
2022-05-11 10:47:44 +01:00
David Green 764a7f4864 [TypePromotion] Format Type Promotion. NFC
This clang-formats the TypePromotion code, with the only meaningful
change being the removal of a verifyFunction call inside a LLVM_DEBUG,
and the printing of the entire function which can be better handled
via -print-after-all.
2022-05-11 08:18:58 +01:00
Xiang1 Zhang 2ea8f203cd [CodeGen] Fix ConvertNodeToLibcall for STRICT_FPOWI
Reviewed By: PengfeiWang

Differential Revision: https://reviews.llvm.org/D125159
2022-05-11 08:58:06 +08:00
Matthias Braun f0ea9c9cec CodeGenPrepare: Replace constant PHI arguments with switch condition value
We often see code like the following after running SCCP:

    switch (x) { case 42: phi(42, ...); }

This tends to produce bad code as we currently materialize the constant
phi-argument in the switch-block. This increases register pressure and
if the pattern repeats for `n` case statements, we end up generating `n`
constant values.

This changes CodeGenPrepare to catch this pattern and revert it back to:

    switch (x) { case 42: phi(x, ...); }

Differential Revision: https://reviews.llvm.org/D124552
2022-05-10 10:00:10 -07:00
Matthias Braun cd19af74c0 Avoid 8 and 16bit switch conditions on x86
This adds a `TargetLoweringBase::getSwitchConditionType` callback to
give targets a chance to control the type used in
`CodeGenPrepare::optimizeSwitchInst`.

Implement callback for X86 to avoid i8 and i16 types where possible as
they often incur extra zero-extensions.

This is NFC for non-X86 targets.

Differential Revision: https://reviews.llvm.org/D124894
2022-05-10 10:00:10 -07:00
Lian Wang f14a1f26ad Revert "[RISCV][SelectionDAG] Support VECREDUCE_ADD mask operation"
This patch make CodeGen/test/AArch64/vecreduce-add-legalization.ll fail.

This reverts commit 17a8a1bb71.
2022-05-10 09:25:25 +00:00
Lian Wang 17a8a1bb71 [RISCV][SelectionDAG] Support VECREDUCE_ADD mask operation
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125206
2022-05-10 08:52:48 +00:00
Mircea Trofin c35ad9ee4f [mlgo] Support exposing more features than those supported by models
This allows the compiler to support more features than those supported by a
model. The only requirement (development mode only) is that the new
features must be appended at the end of the list of features requested
from the model. The support is transparent to compiler code: for
unsupported features, we provide a valid buffer to copy their values;
it's just that this buffer is disconnected from the model, so insofar
as the model is concerned (AOT or development mode), these features don't
exist. The buffers are allocated at setup - meaning, at steady state,
there is no extra allocation (maintaining the current invariant). These
buffers has 2 roles: one, keep the compiler code simple. Second, allow
logging their values in development mode. The latter allows retraining
a model supporting the larger feature set starting from traces produced
with the old model.

For release mode (AOT-ed models), this decouples compiler evolution from
model evolution, which we want in scenarios where the toolchain is
frequently rebuilt and redeployed: we can first deploy the new features,
and continue working with the older model, until a new model is made
available, which can then be picked up the next time the compiler is built.

Differential Revision: https://reviews.llvm.org/D124565
2022-05-09 18:01:21 -07:00
David Green 2cfb243bcd [DAG] Use isAnyConstantBuildVector. NFC
As suggested from 02f8519502, this uses the
isAnyConstantBuildVector method in lieu of separate
isBuildVectorOfConstantSDNodes calls. It should
otherwise be an NFC.
2022-05-09 14:13:03 +01:00
David Green 02f8519502 [DAG] Prevent infinite loop combining bitcast shuffle
This prevents an infinite loop from D123801, where code trying to reduce
the total number of bitcasts, but also handling constants, could create
the opposite transform. Prevent the transform in these case to let the
bitcast of a constant transform naturally.

Fixes #55345
2022-05-09 09:36:22 +01:00
Simon Pilgrim 800d36cf32 [DAG] Only perform the fold (A-B)+(C-D) --> (A+C)-(B+D) when both inner subs have one use
Fixes #51381
2022-05-08 13:51:58 +01:00
Craig Topper b81bf7bb2f [LegalizeTypes] Make use of SelectionDAG::getShiftAmountConstant. NFC
Instead of calling getShiftAmountTy and getConstant separately.
2022-05-07 12:16:53 -07:00
Craig Topper 00bfaba997 [LegalizeTypes] Don't assume fshl/fshr shift amount type matches the other operands.
Like other shifts, the type isn't required to match. We shouldn't
assume we can call ZExtPromotedInteger.

I tested the PromoteIntOp_FunnelShift locally by removing the promotion
of the shift amount from PromoteIntRes_FunnelShift. But with the final
version of this patch it is never executed on any tests.

Differential Revision: https://reviews.llvm.org/D125106
2022-05-07 11:44:07 -07:00
Amaury Séchet 06fad8bc05 [DAGCombine] Add node in the worklist in topological order in CombineTo
This is part of an ongoing effort toward making DAGCombine process the nodes in topological order.

This is able to discover a couple of new optimizations, but also causes a couple of regression. I nevertheless chose to submit this patch for review as to start the discussion with people working on the backend so we can find a good way forward.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D124743
2022-05-07 16:24:31 +00:00
Paul Walker 702c4ade22 [ISD::IndexType] Helper functions for common queries.
Add helper functions to query the signed and scaled properties
of ISD::IndexType along with functions to change them.

Remove setIndexType from MaskedGatherSDNode because it only has
one usage and typically should only be changed alongside its
index operand.

Minimise the direct use of the enum values to lay the groundwork
for more refactoring.

Differential Revision: https://reviews.llvm.org/D123347
2022-05-07 11:23:42 +01:00
David Green 5930691ee1 Revert "[DAGCombine] Make combineShuffleOfBitcast LittleEndian specific"
This reverts commit 891c3cf99e as it turns
out that the error was not caused by this commit, the error caming
from D124526 instead.
2022-05-06 21:03:22 +01:00
David Green 891c3cf99e [DAGCombine] Make combineShuffleOfBitcast LittleEndian specific
Something is going wrong with the BigEndian PowerPC bot. It is hard to
tell what is wrong from here, but attempt to fix it by disabling the
combineShuffleOfBitcast combine for bigendian.
2022-05-06 18:42:44 +01:00
Craig Topper 76f90a9d71 [SelectionDAG] Clear promoted bits before UREM on shift amount in PromoteIntRes_FunnelShift.
Otherwise we have garbage in the upper bits that can affect the
results of the UREM.

Fixes PR55296.

Differential Revision: https://reviews.llvm.org/D125076
2022-05-06 09:26:30 -07:00
Simon Pilgrim c0bebc12f0 [DAG] visitREM - merge buildOptimizedSREM into if(). NFCI. 2022-05-06 15:39:17 +01:00
David Green 115c188807 [DAG][PowerPC] Combine shuffle(bitcast(X), Mask) to bitcast(shuffle(X, Mask'))
If the mask is made up of elements that form a mask in the higher type
we can convert shuffle(bitcast into the bitcast type, simplifying the
instruction sequence. A v4i32 2,3,0,1 for example can be treated as a
1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load
combines, along with helping simplify a number of other tests.

The PowerPC combine for v16i8 splat vector loads needed some fixes to
keep it working for v16i8 vectors. This improves the handling of v2i64
shuffles to match too, hopefully improving them in general.

Differential Revision: https://reviews.llvm.org/D123801
2022-05-06 10:50:31 +01:00
Lian Wang fb0d636f28 [RISCV][SelectionDAG] Support VP_REDUCE_ADD mask operation.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D124986
2022-05-06 01:49:21 +00:00
Craig Topper 5140e0d219 [SelectionDAGISel] Add back a comment to MergeInputChains handling. NFC
This comment used to exist, but was lost in a refactor over 10 years
ago, but still seems relevant and improves readability.
2022-05-05 12:59:21 -07:00
Craig Topper 084f967370 [SelectionDAG] Constant fold (sext_inreg undef, VT) to 0 instead of undef.
The result of sign_extend_inreg needs to have as many sign bits
as requested by the VT argument. The easiest way to guarantee this
is to fold it to 0.

SystemZ test was modified to avoid using undef.

Fixes https://github.com/llvm/llvm-project/issues/55178

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D124696
2022-05-05 09:45:35 -07:00
Craig Topper 4e2d1a6c18 [DAGCombiner] Fold (sext/zext undef) -> 0 and aext(undef) -> undef.
Differential Revision: https://reviews.llvm.org/D124988
2022-05-05 09:34:18 -07:00
Craig Topper fd13192aa5 [DAGCombiner] Fold (max/min X, X) -> X.
Differential Revision: https://reviews.llvm.org/D124951
2022-05-05 09:34:17 -07:00
Brian Tracy 87a55137e2 Fix "the the" typo in documentation and user facing strings
There are many more instances of this pattern, but I chose to limit this change to .rst files (docs), anything in libcxx/include, and string literals. These have the highest chance of being seen by end users.

Reviewed By: #libc, Mordante, martong, ldionne

Differential Revision: https://reviews.llvm.org/D124708
2022-05-05 17:52:08 +02:00
Thomas Preud'homme 68dee83923 [MachinePipeliner] Fix unscheduled instruction
Prior to ordering instructions to be scheduled, the machine pipeliner
update recurrence node sets in groupRemainingNodes() by adding in a
given node set any node on the dependency path from a node set with
higher priority to the given node set. The function computePath() that
determine what constitutes a path follows artificial dependencies.

However, when ordering the nodes in the resulting node sets,
computeNodeOrder() calls ignoreDependence when looking at dependencies
which ignores artificial dependencies. This can cause a node not to be
scheduled which then causes wrong code generation and in the case of a
debug build will lead to an assert failure in generatePhis() in
ModuloScheduler.cpp.

This commit adds calls to ignoreDependence() in computePath() to not add
any node in groupRemainingNodes() that would not be ordered by
computeNodeOrder().

Reviewed By: sgundapa

Differential Revision: https://reviews.llvm.org/D124267
2022-05-05 16:01:41 +01:00
Xing Xue e5926906eb [XCOFF][AIX] Use unique section names for LSDA and EH info sections with -ffunction-sections
Summary:
When -ffunction-sections is on, this patch makes the compiler to generate unique LSDA and EH info sections for functions on AIX by appending the function name to the section name as a suffix. This will allow the AIX linker to garbage-collect unused function.

Reviewed by: MaskRay, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D124855
2022-05-05 09:01:36 -04:00
Jay Foad 9ebbe25034 RegAllocGreedy: Common up part of the priority calculation. NFC. 2022-05-05 10:35:33 +01:00
Nikita Popov 9678936f18 [DAGCombine] Fold (X & ~Y) | Y with truncated not
This extends the (X & ~Y) | Y to X | Y fold to also work if ~Y is
a truncated not (when taking into account the mask X). This is
done by exporting the infrastructure added in D124856 and reusing
it here.

I've retained the old value of AllowUndefs=false, though probably
this can be switched to true with extra test coverage.

Differential Revision: https://reviews.llvm.org/D124930
2022-05-05 11:10:11 +02:00
Craig Topper 572dfef1db [SelectionDAG] Use llvm::any_of to simplify a loop. NFC 2022-05-04 19:09:06 -07:00
Nikita Popov 451bc723ae [SDAG] Handle truncated not in haveNoCommonBitsSet()
Demanded bits analysis may replace a full-width not with a
any_extend (not (truncate X)) pattern. This patch looks through
this kind of pattern in haveNoCommonBitsSet(). Of course, we can
only do this if we only need negated bits in the non-extended part,
as the other bits may now be arbitrary. For example, if we have
haveNoCommonBitsSet(~X & Y, X) then ~X only needs to actually
negate bits set in Y.

This is only a partial solution to the problem in that it allows
add -> or conversion, but the resulting or doesn't get folded yet.
(I guess that will involve exposing getBitwiseNotOperand() as a
more general helper and using that in the relevant transform.)

Differential Revision: https://reviews.llvm.org/D124856
2022-05-04 15:30:44 +02:00
serge-sans-paille 7030654296 [iwyu] Handle regressions in libLLVM header include
Running iwyu-diff on LLVM codebase since fa5a4e1b95 detected a few
regressions, fixing them.

Differential Revision: https://reviews.llvm.org/D124847
2022-05-04 08:32:38 +02:00
Luo, Yuanke 764676b737 [fastregalloc] Fix bug when undef value is tied to def.
If the tied use is undef value, fastregalloc should free the def
register. There is no reload needed for the undef value.

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D124834
2022-05-04 12:12:55 +08:00
Jon Roelofs e1c808b36e Fix zero-width bitfield extracts to emit 0
Fixes #55129
2022-05-03 14:46:42 -07:00
Simon Pilgrim faa35fc873 [DAG] Fix issue with rot(rot(x,c1),c2) -> rot(x,c1+c2) fold with unnormalized rotation amounts
Don't assume the rotation amounts have been correctly normalized - do it as part of the constant folding.

Also, the normalization should be performed with UREM not SREM.
2022-05-03 17:16:26 +01:00
Nikita Popov 2171a896ed [SDAG] Handle A and B&~A in haveNoCommonBitsSet()
This is the DAG variant of D124763. The code already handles the
general pattern, but not this degenerate case.

This allows folding A + (B&~A) to A | (B&~A) which further holds
to A | B.

Handling on the SDAG level is needed because in the motivating
case the add is actually a getelementptr, which only gets converted
into an add on the SDAG level. However, this patch is not quite
sufficient to handle the getelementptr case yet, because of an
interfering demanded bits simplification.

Differential Revision: https://reviews.llvm.org/D124772
2022-05-03 15:47:02 +02:00
Nikita Popov e0892614b1 [SDAG] Extract commutative helper from haveNoCommonBitsSet() (NFC)
To make it easier to add additional patterns, which will generally
want to handle commuted top-level operands.
2022-05-03 12:28:35 +02:00
Jeremy Morse 1d712c3818 [DebugInfo][InstrRef] Don't generate redundant DBG_PHIs
In SelectionDAG, DBG_PHI instructions are created to "read" physreg values
and give them an instruction number, when they can't be traced back to a
defining instruction. The most common scenario if arguments to a function.
Unfortunately, if you have 100 inlined methods, each of which has the same
"this" pointer, then the 100 dbg.value instructions become 100
DBG_INSTR_REFs plus 100 DBG_PHIs, where only one DBG_PHI would suffice.

This patch adds a vreg cache for MachienFunction::salvageCopySSA, if we've
already traced a value back to the start of a block and created a DBG_PHI
then it allows us to re-use the DBG_PHI, as well as reducing work.

Differential Revision: https://reviews.llvm.org/D124517
2022-05-03 09:56:12 +01:00
David Green 6f81903e89 [LV][SLP] Mark fptosi_sat as vectorizable
This adds fptosi_sat and fptoui_sat to the list of trivially
vectorizable functions, mainly so that the loop vectorizer can vectorize
the instruction. Marking them as trivially vectorizable also allows them
to be SLP vectorized, and Scalarized.

The signature of a fptosi_sat requires two type overrides
(@llvm.fptosi.sat.v2i32.v2f32), unlike other intrinsics that often only
take a single. This patch alters hasVectorInstrinsicOverloadedScalarOpd
to isVectorIntrinsicWithOverloadTypeAtArg, so that it can mark the first
operand of the intrinsic as a overloaded (but not scalar) operand.

Differential Revision: https://reviews.llvm.org/D124358
2022-05-03 09:32:34 +01:00
Hsiangkai Wang eaaa31ff2c [RISCV][TargetLowering] Special case overflow expansion for (uaddo X, C).
Follow-up to D122933.

Differential Revision: https://reviews.llvm.org/D124374
2022-05-03 03:51:36 +00:00
Craig Topper 5f057eaa0d [DAGCombiner] reassociationCanBreakAddressingModePattern should check uses of the outer add.
When looking for memory uses,
reassociationCanBreakAddressingModePattern should check uses of
the outer ADD rather than the inner ADD. We want to know if the
two ops we're reassociating are used by a load/store.

In practice, the existing check usually works because CodeGenPrepare
will make one of the load/stores have an offset of 0 relative to
split GEP. That will make the inner add have a memory use.

To test this, I've manually split the GEPs so there is no 0 offset
store.

This issue was recently discussed in the original review D60294.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D124644
2022-05-02 16:38:53 -07:00
Sanjay Patel 747c6a0c73 [SDAG] fix miscompile when casting int->FP->int
This is the codegen equivalent of D124692.

As shown in https://github.com/llvm/llvm-project/issues/55150 -
the existing fold may be wrong when converting to a signed value.
This is a quick fix to avoid the miscompile.
https://alive2.llvm.org/ce/z/KtaDmd

Differential Revision: https://reviews.llvm.org/D124771
2022-05-02 14:57:27 -04:00
Simon Pilgrim ae8b10e543 [DAG] (style) Break apart if-else chain as they all return 2022-05-01 17:56:59 +01:00
Paul Walker f10a8f6752 [LegalizeDAG] Fix TypeSize conversion error when expanding SIGN_EXTEND_INREG
SIGN_EXTEND_INREG expansion can trigger a TypeSize error because
"VT.getSizeInBits() == 1" is used to detect for a boolean without
first verifying VT is a scalar.
2022-04-30 19:21:48 +01:00
Craig Topper 6affe87bda [DAGCombiner] When matching a disguised rotate by constant don't forget to apply LHSMask/RHSMask.
We try to match as a disguised rotate by constant of these forms
(shl (X | Y), C1) | (srl X, C2) --> (rotl X, C1) | (shl Y, C1)
(shl X, C1) | (srl (X | Y), C2) --> (rotl X, C1) | (srl Y, C2)

We may have also looked through an AND to find the shift. If we
did, we need to apply a mask to the result.

I'll add an AArch64 test and pre-commit it and the RISC-V test
tomorrow.

Fixes PR55201.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D124711
2022-04-30 11:02:30 -07:00
David Penry dcb77643e3 Reapply [CodeGen][ARM] Enable Swing Module Scheduling for ARM
Fixed "private field is not used" warning when compiled
with clang.

original commit: 28d09bbbc3
reverted in: fa49021c68

------

This patch permits Swing Modulo Scheduling for ARM targets
turns it on by default for the Cortex-M7.  The t2Bcc
instruction is recognized as a loop-ending branch.

MachinePipeliner is extended by adding support for
"unpipelineable" instructions.  These instructions are
those which contribute to the loop exit test; in the SMS
papers they are removed before creating the dependence graph
and then inserted into the final schedule of the kernel and
prologues. Support for these instructions was not previously
necessary because current targets supporting SMS have only
supported it for hardware loop branches, which have no
loop-exit-contributing instructions in the loop body.

The current structure of the MachinePipeliner makes it difficult
to remove/exclude these instructions from the dependence graph.
Therefore, this patch leaves them in the graph, but adds a
"normalization" method which moves them in the schedule to
stage 0, which causes them to appear properly in kernel and
prologues.

It was also necessary to be more careful about boundary nodes
when iterating across successors in the dependence graph because
the loop exit branch is now a non-artificial successor to
instructions in the graph. In additional, schedules with physical
use/def pairs in the same cycle should be treated as creating an
invalid schedule because the scheduling logic doesn't respect
physical register dependence once scheduled to the same cycle.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D122672
2022-04-29 10:54:39 -07:00
Paul Walker 23c509754d [DAGCombiner] Stop invalid sign conversion in refineIndexType.
When looking through extends of gather/scatter indices it's safe
to convert a known positive signed index to unsigned, but unsigned
indices must remain unsigned.

Depends On D123318

Differential Revision: https://reviews.llvm.org/D123326
2022-04-29 14:20:13 +01:00
Nikita Popov 027c728f29 [SelectionDAGBuilder] Don't create MGATHER/MSCATTER with Scale != ElemSize
This is an alternative to D124530. In getUniformBase() only create
scales that match the gather/scatter element size. If targets also
support other scales, then they can produce those scales in target
DAG combines. This is what X86 already does (as long as the
resulting scale would be 1, 2, 4 or 8).

This essentially restores the pre-opaque-pointer state of things.

Fixes https://github.com/llvm/llvm-project/issues/55021.

Differential Revision: https://reviews.llvm.org/D124605
2022-04-29 14:57:53 +02:00
Paul Walker 7a0b897e86 [DAGCombiner][SVE] Ensure MGATHER/MSCATTER addressing mode combines preserve index scaling
refineUniformBase and selectGatherScatterAddrMode both attempt the
transformation:

  base(0) + index(A+splat(B)) => base(B) + index(A)

However, this is only safe when index is not implicitly scaled.

Differential Revision: https://reviews.llvm.org/D123222
2022-04-29 12:35:16 +01:00
Serge Pavlov 9fc58f1820 [PowerPC] Support of ppc_fp128 in lowering of llvm.is_fpclass
PowerPC supports `ppc_fp128`, which is not an IEEE floating point
type. The generic lowering of llvm.is_fpclass could not handle it
properly. This change extends the generic lowering code to
support `ppc_fp128`.

The change was tested on emulator using runtime tests from
https://reviews.llvm.org/D112933 and the patch for clang
https://reviews.llvm.org/D112932.

Differential Revision: https://reviews.llvm.org/D113908
2022-04-29 11:10:47 +07:00
Zequan Wu 4fe2ab5279 Revert "[DebugInfo][InstrRef] Describe value sizes when spilt to stack"
This reverts commit a15b66e76d.

This causes linker to crash at assertion: `Assertion failed: !Expr->isComplex(), file C:\b\s\w\ir\cache\builder\src\third_party\llvm\llvm\lib\CodeGen\LiveDebugValues\InstrRefBasedImpl.cpp, line 907`.
2022-04-28 16:18:16 -07:00
David Penry fa49021c68 Revert "[CodeGen][ARM] Enable Swing Module Scheduling for ARM"
This reverts commit 28d09bbbc3
while I investigate a buildbot failure.
2022-04-28 13:29:27 -07:00
David Penry 28d09bbbc3 [CodeGen][ARM] Enable Swing Module Scheduling for ARM
This patch permits Swing Modulo Scheduling for ARM targets
turns it on by default for the Cortex-M7.  The t2Bcc
instruction is recognized as a loop-ending branch.

MachinePipeliner is extended by adding support for
"unpipelineable" instructions.  These instructions are
those which contribute to the loop exit test; in the SMS
papers they are removed before creating the dependence graph
and then inserted into the final schedule of the kernel and
prologues. Support for these instructions was not previously
necessary because current targets supporting SMS have only
supported it for hardware loop branches, which have no
loop-exit-contributing instructions in the loop body.

The current structure of the MachinePipeliner makes it difficult
to remove/exclude these instructions from the dependence graph.
Therefore, this patch leaves them in the graph, but adds a
"normalization" method which moves them in the schedule to
stage 0, which causes them to appear properly in kernel and
prologues.

It was also necessary to be more careful about boundary nodes
when iterating across successors in the dependence graph because
the loop exit branch is now a non-artificial successor to
instructions in the graph. In additional, schedules with physical
use/def pairs in the same cycle should be treated as creating an
invalid schedule because the scheduling logic doesn't respect
physical register dependence once scheduled to the same cycle.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D122672
2022-04-28 13:01:18 -07:00
Alexey Bataev 75e1cf4a6a [COST]Improve cost model for shuffles in SLP.
Introduced masks where they are not added and improved target dependent
cost models to avoid returning of the incorrect cost results after
adding masks.

Differential Revision: https://reviews.llvm.org/D100486
2022-04-28 10:04:41 -07:00
Bjorn Pettersson 3a39bb96ca [SelectionDAG] Use correct boolean representation in FoldConstantArithmetic
The description of SETCC says
  /// SetCC operator - This evaluates to a true value iff the condition is
  /// true.  If the result value type is not i1 then the high bits conform
  /// to getBooleanContents.

Without this patch, we sign extended the i1 to the used larger type
regardless of getBooleanContents. This resulted in miscompiles, as
shown in the attached testcase that ended up returning -1 instead of
1 when using -mattr=+v.

Fixes https://github.com/llvm/llvm-project/issues/55168

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D124618
2022-04-28 18:42:16 +02:00
Alexey Bataev 9861ca0c23 Revert "[COST]Improve cost model for shuffles in SLP."
This reverts commit 29a470e380 to fix
a crash reported in https://reviews.llvm.org/D100486#3479989.
2022-04-28 08:11:56 -07:00
Matt Arsenault 7762a3ce18 Revert "BranchFolder: Assert on SSA functions"
This reverts commit 6ff91d17d6.
2022-04-27 19:02:15 -04:00
Matt Arsenault 6ff91d17d6 BranchFolder: Assert on SSA functions
We probably should have the opposite of getRequiredProperties for this
2022-04-27 18:51:37 -04:00
Bill Wendling 8f2ec974d1 [X86] Move target-generic code into CodeGen [NFC]
This code is the same for all platforms.

Differential Revision: https://reviews.llvm.org/D124566
2022-04-27 15:37:28 -07:00
Matt Arsenault 7c2db66632 llvm-reduce: Support multiple MachineFunctions
The current testcase I'm trying to reduce only reproduces with IPRA
enabled and requires handling multiple functions.

The only real difference vs. the IR is the extra indirect to look for
the underlying MachineFunction, so treat the ReduceWorkItem as the
module instead of the function.

The ugliest piece of this is really the ugliness of
MachineModuleInfo. It not only tracks actual module state, but has a
number of transient fields used for isel and/or the asm printer. These
shouldn't do any harm for the use here, though they should be
separated out.
2022-04-27 18:11:59 -04:00
Alexey Bataev 29a470e380 [COST]Improve cost model for shuffles in SLP.
Introduced masks where they are not added and improved target dependent
cost models to avoid returning of the incorrect cost results after
adding masks.

Differential Revision: https://reviews.llvm.org/D100486
2022-04-27 10:56:26 -07:00
Denis Antrushin 4059770af5 [StatepointLowering] Only export STATEPOINT results if used in nonlocal blocks.
Cuurently we always export STATEPOINT results (GC pointers lowered via VRegs)
to virtual registers. When processing gc.relocate instructions we have to
generate CopyFromRegs node and then export it to VReg again if gc.relocate
is used in other basic blocks. This results in generation of extra COPY MIR
instruction if statepoint and its gc.relocate are in the same BB, but gc.relocate
result is used in other blocks.

This patch changes this behavior to export statepoint results only if used
in other basic blocks. For local uses StatepointLoweringState.(get|set)Location()
API is used to communicate appropriate statepoint result from `LowerStatepoint()`
to `visitGCRelocate()`

This is NFC and is purely compile time optimization. On big methids it can improve
codegen compile time up to 10%.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D124444
2022-04-27 15:53:24 +03:00
Jeremy Morse a15b66e76d [DebugInfo][InstrRef] Describe value sizes when spilt to stack
InstrRefBasedLDV can track and describe variable values that are spilt to
the stack -- however it does not current describe the size of the value on
the stack. This can cause uninitialized bytes to be read from the stack if
a small register is spilt for a larger variable, or theoretically on
big-endian machines if a large value on the stack is used for a small
variable.

Fix this by using DW_OP_deref_size to specify the amount of data to load
from the stack, if there's any possibility for ambiguity. There are a few
scenarios where this can be omitted (such as when using DW_OP_piece and a
non-DW_OP_stack_value location), see deref-spills-with-size.mir for an
explicit table of inputs flavours and output expressions.

Differential Revision: https://reviews.llvm.org/D123599
2022-04-27 09:54:50 +01:00
Andrew Savonichev 0a27622a1d [NVPTX] Disable DWARF .file directory for PTX
Default behavior for .file directory was changed in D105856, but
ptxas (CUDA 11.5 release) refuses to parse it:

    $ llc -march=nvptx64 llvm/test/DebugInfo/NVPTX/debug-file-loc.ll
    $ ptxas debug-file-loc.s
    ptxas debug-file-loc.s, line 42; fatal   : Parsing error near
    '"foo.h"': syntax error

Added a new field to MCAsmInfo to control default value of
UseDwarfDirectory. This value is used if -dwarf-directory command line
option is not specified.

Differential Revision: https://reviews.llvm.org/D121299
2022-04-26 21:40:36 +03:00
Jeremy Morse 65d5beca13 Reapply D124184, [DebugInfo][InstrRef] Add a size operand to DBG_PHI
This was reverted twice, in 987cd7c3ed and 13815e8cbf. The latter
stemed from not accounting for rare register classes in a pre-allocated
array, and the former from an array not being completely initialized,
leading to asan complaining.
2022-04-26 15:49:22 +01:00
Serge Pavlov 170a903144 Intrinsic for checking floating point class
This change introduces a new intrinsic, `llvm.is.fpclass`, which checks
if the provided floating-point number belongs to any of the the specified
value classes. The intrinsic implements the checks made by C standard
library functions `isnan`, `isinf`, `isfinite`, `isnormal`, `issubnormal`,
`issignaling` and corresponding IEEE-754 operations.

The primary motivation for this intrinsic is the support of strict FP
mode. In this mode using compare instructions or other FP operations is
not possible, because if the value is a signaling NaN, floating-point
exception `Invalid` is raised, but the aforementioned functions must
never raise exceptions.

Currently there are two solutions for this problem, both are
implemented partially. One of them is using integer operations to
implement the check. It was implemented in https://reviews.llvm.org/D95948
for `isnan`. It solves the problem of exceptions, but offers one
solution for all targets, although some can do the check in more
efficient way.

The other, implemented in https://reviews.llvm.org/D96568, introduced a
hook 'clang::TargetCodeGenInfo::testFPKind', which injects a target
specific code into IR to implement `isnan` and some other functions. It is
convenient for targets that have dedicated instruction to determine FP data
class. However using target-specific intrinsic complicates analysis and can
prevent some optimizations.

A special intrinsic for value class checks allows representing data class
tests with enough flexibility. During IR transformations it represents the
check in target-independent way and saves it from undesired transformations.
In the instruction selector it allows efficient lowering depending on the
used target and mode.

This implementation is an extended variant of `llvm.isnan` introduced
in https://reviews.llvm.org/D104854. It is limited to minimal intrinsic
support. Target-specific treatment will be implemented in separate
patches.

Differential Revision: https://reviews.llvm.org/D112025
2022-04-26 13:09:16 +07:00
Lian Wang 9980148305 [RISCV][SelectionDAG] Support VP_ADD/VP_MUL/VP_SUB mask operations
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D124144
2022-04-26 02:30:22 +00:00
Jeremy Morse 987cd7c3ed Revert "Reapply D124184, [DebugInfo][InstrRef] Add a size operand to DBG_PHI"
This reverts commit 5db9250231.

Further to the early revert, the sanitizers have found something wrong with
this.
2022-04-25 23:30:15 +01:00
Matt Arsenault 7714e03175 RegAllocGreedy: Allow last chance recolor to retry overlapping tuples
Last chance recoloring didn't try recoloring a done register with the
same class since it believed there was no point. This doesn't
necessarily apply if the members in that class overlap. Allow the
recoloring to proceed if the assigned interfering physical register
overlaps with the candidate register.

This avoids an allocation failure with overlapping tuples. This
testcase could be handled better, and I don't believe should reach
last chance recoloring. The failure only manifests with the mutually
unsatisfiable register hints to overlapping tuples. The earlier
assignment decisions probably should have figured out that using these
hints was a bad idea.
2022-04-25 17:07:17 -04:00
David Green 9727c77d58 [NFC] Rename Instrinsic to Intrinsic 2022-04-25 18:13:23 +01:00
Jeremy Morse 5db9250231 Reapply D124184, [DebugInfo][InstrRef] Add a size operand to DBG_PHI
This was applied in fda4305e53, reverted in 13815e8cbf, the problem
was that fp80 X86 registers that were spilt to the stack aren't expected by
LiveDebugValues. It pre-allocates a position number for all register sizes
that can be spilt, and 80 bits isn't exactly common.

The solution is to scan the register classes to find any unrecognised
register sizes, adn pre-allocate those position numbers, avoiding a later
assertion.
2022-04-25 15:50:15 +01:00
Jeremy Morse 13815e8cbf Revert "[DebugInfo][InstrRef] Add a size operand to DBG_PHI"
This reverts commit fda4305e53.

Green dragon has spotted a problem -- it's understood, but might be fiddly
to fix, reverting in the meantime.
2022-04-25 14:06:12 +01:00
Jeremy Morse fda4305e53 [DebugInfo][InstrRef] Add a size operand to DBG_PHI
DBG_PHI instructions can refer to stack slots, to indicate that multiple
values merge together on control flow joins in that slot. This is fine --
however the slot might be merged at a later date with a slot of a different
size. In doing so, we lose information about the size the eliminated PHI.
Later analysis passes have to guess.

Improve this by attaching an optional "bit size" operand to DBG_PHI, which
only gets added for stack slots, to let us know how large a size the value
on the stack is.

Differential Revision: https://reviews.llvm.org/D124184
2022-04-25 13:41:34 +01:00
Matt Arsenault 6fa1d12b3c ProcessImplicitDefs: Use required properties instead of isSSA assert 2022-04-22 18:28:45 -04:00
Simon Pilgrim 34e7243464 [DAG] Fold freeze(bitcast(x)) -> bitcast(freeze(x))
This is a very specific fold to fix an upstream poor codegen issue.

InstCombine has the much more flexible pushFreezeToPreventPoisonFromPropagating but I don't think we're quite there with DAG/TLI handling for canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison value tracking yet.

Fixes #54911

Differential Revision: https://reviews.llvm.org/D124185
2022-04-22 16:39:25 +01:00
Matt Arsenault 9c122537cd MIR: Serialize FunctionContextIdx in MachineFrameInfo 2022-04-22 11:07:41 -04:00
Matt Arsenault 40bc9112c0 GlobalISel: Relax handling of G_ASSERT_* with source register classes
The most common situation where G_ASSERT_ZEXT appears for AMDGPU is a
copy from a physical register, which happens to use set the actual
register class on the virtual register. After copy coalescing, the
assert's source operand had a vreg with a set class. The verifier was
strictly rejecting cases where the set class/bank weren't an exact
match. Additionally, RegBankSelect was also expecting a register bank
to be set on the register, not a class.

This is much stricter than regular copies so relax this behavior. This
now allows these 2 cases:

1. Source register has either class or bank, and the result does not
2. Source register has a register class, and the result is a register
with a matching bank.

This should avoid needing some kind of special handling to avoid
violating this constraint when folding copies.
2022-04-22 10:49:50 -04:00
Vitaly Buka 9be90748f1 Revert "[asan] Emit .size directive for global object size before redzone"
Revert "[docs] Fix underline"

Breaks a lot of asan tests in google.

This reverts commit 365c3e85bc.
This reverts commit 78a784bea4.
2022-04-21 16:21:17 -07:00
Alex Brachet 78a784bea4 [asan] Emit .size directive for global object size before redzone
This emits an `st_size` that represents the actual useable size of an object before the redzone is added.

Reviewed By: vitalybuka, MaskRay, hctim

Differential Revision: https://reviews.llvm.org/D123010
2022-04-21 20:46:38 +00:00
Paul Kirth 61e36e87df [safestack] Support safestack in stack size diagnostics
Current stack size diagnostics ignore the size of the unsafe stack.
This patch attaches the size of the static portion of the unsafe stack
to the function as metadata, which can be used by the backend to emit
diagnostics regarding stack usage.

Reviewed By: phosek, mcgrathr

Differential Revision: https://reviews.llvm.org/D119996
2022-04-20 18:29:40 +00:00
Alexey Bataev 2cca53c815 [DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
We can process the long shuffles (working across several actual
vector registers) in the best way if we take the actual register
represantion into account. We can build more correct representation of
register shuffles, improve number of recognised buildvector sequences.
Also, same function can be used to improve the cost model for the
shuffles. in future patches.

Part of D100486

Differential Revision: https://reviews.llvm.org/D115653
2022-04-20 09:37:16 -07:00
Matt Arsenault 9209a51918 MachineModuleInfo: Move AddrLabelSymbols to AsmPrinter
This was tracking global state only used by the AsmPrinter, which can
store its own module global state.
2022-04-20 11:21:40 -04:00
Matt Arsenault 3659780d58 MachineModuleInfo: Remove UsesMorestackAddr
This is x86 specific, and adds statefulness to
MachineModuleInfo. Instead of explicitly tracking this, infer if we
need to declare the symbol based on the reference previously inserted.

This produces a small change in the output due to the move from
AsmPrinter::doFinalization to X86's emitEndOfAsmFile. This will now be
moved relative to other end of file fields, which I'm assuming doesn't
matter (e.g. the __morestack_addr declaration is now after the
.note.GNU-split-stack part)

This also produces another small change in code if the module happened
to define/declare __morestack_addr, but I assume that's invalid and
doesn't really matter.
2022-04-20 11:10:20 -04:00
Matt Arsenault d7938b1a81 MachineModuleInfo: Move HasSplitStack handling to AsmPrinter
This is used to emit one field in doFinalization for the module. We
can accumulate this when emitting all individual functions directly in
the AsmPrinter, rather than accumulating additional state in
MachineModuleInfo.

Move the special case behavior predicate into MachineFrameInfo to
share it. This now promotes it to generic behavior. I'm assuming this
is fine because no other target implements adjustForSegmentedStacks,
or has tests using the split-stack attribute.
2022-04-20 10:54:29 -04:00
Alexey Bataev 5f7ac15912 Revert "[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer."
This reverts commit 2f49163b33 to fix
a buildbot failure. Reported in https://lab.llvm.org/buildbot#builders/105/builds/24284
2022-04-20 06:35:55 -07:00
Matt Arsenault 26d575eb08 LocalStackSlotAllocation: Combine debug printing statements 2022-04-20 09:31:14 -04:00
Matt Arsenault 4575f35ea1 LocalStackSlotAllocation: Stop creating unused virtual register 2022-04-20 09:31:14 -04:00
Alexey Bataev 2f49163b33 [DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
We can process the long shuffles (working across several actual
vector registers) in the best way if we take the actual register
represantion into account. We can build more correct representation of
register shuffles, improve number of recognised buildvector sequences.
Also, same function can be used to improve the cost model for the
shuffles. in future patches.

Part of D100486

Differential Revision: https://reviews.llvm.org/D115653
2022-04-20 05:32:56 -07:00
Matt Arsenault 9592e88f59 MachineModuleInfo: Don't allow dynamically setting DbgInfoAvailable
This can be set up front, and used only as a cache. This avoids a
field that looks like it requires MIR serialization.

I believe this fixes 2 bugs for CodeView. First, this addresses a
FIXME that the flag -diable-debug-info-print only works with
DWARF. Second, it fixes emitting debug info with emissionKind NoDebug.
2022-04-19 21:08:37 -04:00
Matt Arsenault 8591328e15 Intrinsics: Mark llvm.eh.sjlj.callsite argument as immarg
The assert in SelectionDAG implies that it is
2022-04-19 21:04:33 -04:00
Matt Arsenault 507259820a GlobalISel: Add LegalizeMutations to help use More/FewerElements 2022-04-19 21:04:32 -04:00
Vitaly Buka 33c5d8f939 [msan] Disable assert with msan
The assert uses data from just destroyed BasicBlock.
2022-04-19 16:42:05 -07:00
chenglin.bi 222adf338a [Arch64][SelectionDAG] Add target-specific implementation of srem
1. X%C to the equivalent of X-X/C*C is not always fastest path if there is no SDIV pair exist. So check target have faster for srem only first.
2. Add AArch64 faster path for SREM only pow2 case.

Fix https://github.com/llvm/llvm-project/issues/54649

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D122968
2022-04-19 02:49:42 +08:00
chenglin.bi acfc025a72 Revert "[Arch64][SelectionDAG] Add target-specific implementation of srem"
This reverts commit 9d9eddd3dd.
2022-04-18 10:35:09 +08:00
chenglin.bi 9d9eddd3dd [Arch64][SelectionDAG] Add target-specific implementation of srem
X%C to the equivalent of X-X/C*C is not always fastest path if there is no SDIV pair exist. So check target have faster for srem only first. Add AArch64 faster path for SREM only pow2 case.

Fix https://github.com/llvm/llvm-project/issues/54649

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D122968
2022-04-16 12:29:11 +08:00
Matt Arsenault b8033de063 MIR: Serialize a few bool function fields 2022-04-15 20:31:07 -04:00
Craig Topper c6dc229a6d [DAGCombiner] Move call to hasOneUse after opcode checks. NFC
Checking the opcode is cheap, counting the number of uses is not.
2022-04-15 17:02:16 -07:00
Craig Topper a7b9d75e7a [DAGCombiner] Move or/xor/and opcode check in ReduceLoadOpStoreWidth before hasOneUse check.
hasOneUse is not cheap on nodes with chain results that might have
many uses. By checking the opcode first, we can avoid a costly walk
of the use list on nodes we aren't interested in.

Found by investigating calls to hasNUsesOfValue from the example
provided in D123857.
2022-04-15 16:38:27 -07:00
Chih-Ping Chen eab6e94f91 [DebugInfo] Add a TargetFuncName field in DISubprogram for
specifying DW_AT_trampoline as a string. Also update the signature
of DIBuilder::createFunction to reflect this addition.

Differential Revision: https://reviews.llvm.org/D123697
2022-04-15 16:38:23 -04:00
Johannes Doerfert 3f7a6ce0de [DWARF][FIX] Handle the use of multiple registers gracefully
Certain applications crashed for us with the AMDGPU backend. While this
is not a proper fix it allows us to compile the code for now. I left a
TODO for someone that understands DWARF.

Differential Revision: https://reviews.llvm.org/D123717
2022-04-15 13:43:50 -05:00
Clement Courbet 46a13a0ef8 [ExpandMemCmp] Properly expand `bcmp` to an equality pattern.
Before that change, constant-size `bcmp` would miss an opportunity to generate
a more efficient equality pattern and would generate a -1/0-1 pattern
instead.

Differential Revision: https://reviews.llvm.org/D123849
2022-04-15 11:26:24 +02:00
Matt Arsenault 9196f5dab7 MachineCSE: Report this requires SSA 2022-04-14 20:21:21 -04:00
John Brawn 12c1022679 [AArch64] Lowering and legalization of strict FP16
For strict FP16 to work correctly needs some changes in lowering and
legalization:
 * SelectionDAGLegalize::PromoteNode was missing handling for some
   strict fp opcodes.
 * Some of the custom lowering of strict fp operations needed to be
   adjusted to work with FP16.
 * Custom lowering needed to be added for round-to-int operations.

With this, and the previous patches for the rest of the strict fp
isel, we can set IsStrictFPEnabled = true.

Differential Revision: https://reviews.llvm.org/D115620
2022-04-14 16:51:22 +01:00
Joseph Huber 11f47b791f [OpenMP] Make offloading sections have the SHF_EXCLUDE flag
Offloading sections can be embedded in the host during codegen via a
section. This section was originally marked as metadata to prevent it
from being loaded, but these sections are completely unused at runtime
so the linker should automatically drop them from the final executable
or shard library. This flag adds support for the SHF_EXCLUDE flag in
target lowering and uses it.

Reviewed By: JonChesterfield, MaskRay

Differential Revision: https://reviews.llvm.org/D122987
2022-04-14 10:50:49 -04:00
Paul Walker 0c44115e51 [SVE] Add support for non-element-type sized scaling when lowering MGATHER/MSCATTER.
The lowering code did not use the scale operand of MGATHER/MSCATTER
nodes, but instead assumed scaled indices were always scaled based
on the element type of the memory type. This patch adds the missing
support by rewritting the nodes as unscaled variants.

Differential Revision: https://reviews.llvm.org/D123670
2022-04-14 11:54:46 +01:00
Matt Arsenault 1732242bee RegAlloc: Fix remaining virtual registers after allocation failure
This testcase fails register allocation, but at the failure point
there were also new split virtual registers. Previously this was
assigning the failing register and not enqueueing the newly created
split virtual registers. These would then never be allocated and
assert in VirtRegRewriter.
2022-04-13 16:25:30 -04:00
Matt Arsenault 681b9466c9 RegAllocGreedy: Remove redundant check for virtual registers
The set of interfering virtual registers obviously only includes
virtual registers.
2022-04-13 15:00:18 -04:00
serge-sans-paille fa5a4e1b95 [iwyu] Handle regressions in libLLVM header include
Running iwyu-diff on LLVM codebase since a96638e50e detected a few
regressions, fixing them.
2022-04-13 20:53:19 +02:00
Simon Pilgrim fef221bf1f [DAG] Enable SimplifyVBinOp folds on add/sub sat intrinsics 2022-04-13 12:53:23 +01:00
Jonas Paulsson 46f83caebc [InlineAsm] Add support for address operands ("p").
This patch adds support for inline assembly address operands using the "p"
constraint on X86 and SystemZ.

This was in fact broken on X86 (see example at
https://reviews.llvm.org/D110267, Nov 23).

These operands should probably be treated the same as memory operands by
CodeGenPrepare, which have been commented with "TODO" there.

Review: Xiang Zhang and Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D122220
2022-04-13 12:50:21 +02:00
Simon Pilgrim cfb3ee2185 [DAG] Add non-uniform vector support to (shl (srl x, c1), c2) -> (and (shift x, c3))
Another part of D77804 yak shaving

Differential Revision: https://reviews.llvm.org/D123523
2022-04-13 11:37:33 +01:00
Matt Arsenault d4b1be20f6 RegAllocGreedy: Fix illegal eviction assert for urgent evictions
The condition in canEvictInterferenceBasedOnCost is slightly different
from the assertion in evictInteference.
canEvictInterferenceBasedOnCost uses a <= check for the cascade number
for legality, but the assert was checking for <. For equal cascade
numbers for an urgent eviction, canEvictInterferenceBasedOnCost could
return success. The actual eviction would then hit this assert. Avoid
ever returning true for equivalent cascade numbers.

The resulting failed allocation seems a bit off to me. e.g. in
illegal-eviction-assert.mir, I wuold assume %0 gets allocated starting
at $vgpr0. That was its initial allocation choice, but was later
evicted. In this example no evictions can help improve anything.
2022-04-12 19:16:56 -04:00
Matt Arsenault eefed1dbf0 RegAllocGreedy: Roll back successful recolorings on failure
This is a replacement for the original fix attempted in
c46aab01c0.

This fixes "overlapping insert" assertion failures when trying to
unwind an unsuccessful recoloring attempt.

The problem would occur when there are multiple recoloring candidates
which recursively required recoloring. If one recoloring candidate was
successfully recolored at one level, and the next recoloring candidate
was unsuccessful, we would not roll back the first candidates
successful recoloring. The forgotten successful recoloring may have
been assigned to something that conflicts with a register that needs
to be restored in a parent recoloring attempt.

See the testcase added in issue48473 for a more concrete example with
explanation.
2022-04-12 19:02:48 -04:00
Matt Arsenault 3754f60112 GlobalISel: Implement MoreElements for select of vector conditions 2022-04-12 16:54:04 -04:00
Matt Arsenault 3f2cc7cc2b GlobalISel: Fix lowerSelect handling of boolean high bits
This was making several invalid assumptions about the incoming
select. First, it was assuming the incoming condition was either s1 or
already sign extended, not accounting for different boolean high bits
behavior between scalar and vector conditions. We only had a vector
boolean due to the intermediate step vector select, which is now
avoided.

Second, it was assuming it can use the result vector type as a boolean
mask. These types don't have anything to do with other, and only makes
sense in the context of the expansion to bit operations. Since these
logically are part of the same lowering, do the complete expansion in
a single step.

The added select_v4s1_s1 test does fail to legalize, since it seems
AArch64's vector legalization support is pretty incomplete.
2022-04-12 16:54:03 -04:00
Matt Arsenault 0e489926be GlobalISel: Handle widening addo/subo booleans
This will be tested in a future patch
2022-04-12 16:54:03 -04:00
Matt Arsenault 95c2bcbf8b GlobalISel: Handle widening umulo/smulo condition outputs 2022-04-12 16:54:03 -04:00
Matt Arsenault abe171df06 GlobalISel: Update mutationIsSane assert for scalable vectors 2022-04-12 16:54:03 -04:00
Shao-Ce SUN e90110e696 [NFC][CodeGen] Use ArrayRef in TargetLowering functions
This patch is similar to D122557, adding an `ArrayRef` version for `setOperationAction`, `setLoadExtAction`, `setCondCodeAction`, `setLibcallName`.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D123467
2022-04-13 00:46:05 +08:00
Simon Pilgrim bc32a1dd76 [DAG] Add non-uniform vector support to (shl (sr[la] exact X, C1), C2) folds 2022-04-12 12:57:56 +01:00
Craig Topper 35be4a7af3 [SelectionDAG] Remove unecessary null check after call to getNode. NFC
As far as I know getNode will never return a null SDValue.

I'm guessing this was modeled after the FoldConstantArithmetic
call earlier.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D123550
2022-04-11 18:03:44 -07:00
Matt Arsenault 5a5034d508 GlobalISel: Verify atomic load/store ordering restriction
Reject acquire stores and release loads. This matches the restriction
imposed by the LLParser and IR verifier.
2022-04-11 20:12:22 -04:00
Matt Arsenault d1f97a3419 GlobalISel: Add memSizeNotByteSizePow2 legality helper
This is really a replacement for memSizeInBytesNotPow2 that actually
does what most every target wants. In particular, since s1 rounds to 1
byte, it wasn't lowered by this predicate. This results in targets
needing to think harder and add more matchers to catch all the
degenerate cases.

Also small bug fix that prevented the correct insertion of
G_ASSERT_ZEXT in the AArch64 use case.
2022-04-11 19:43:37 -04:00
Matt Arsenault 1416744f84 GlobalISel: Implement computeKnownBits for overflow bool results 2022-04-11 19:43:37 -04:00
Craig Topper 2ce2562876 [RISCV][SelectionDAG] Add a hook to sign extend i32 ConstantInt operands of phis on RV64.
Materializing constants on RISCV is simpler if the constant is sign
extended from i32. By default i32 constant operands of phis are
zero extended.

This patch adds a hook to allow RISCV to override this for i32. We
have an existing isSExtCheaperThanZExt, but it operates on EVT which
we don't have at these places in the code.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D122951
2022-04-11 14:38:39 -07:00
Craig Topper 28cb508195 [TargetLowering][RISCV] Allow truncation when checking if the arguments of a setcc are splats.
We're just trying to canonicalize here and won't be using the constant
value returned.

The attached test changes are because we were previously commuting
a seteq X, (splat_vector 0) because we also have (sub 0, X). The
0 is larger than the element type so we don't detect it as a splat
without the AllowTruncation flag. By preventing the commute we are
able to match it to the vmseq.vx instruction during isel. We only
look for constants on the RHS in isel.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D123256
2022-04-11 09:49:36 -07:00
Momchil Velikov b4ad28da19 [CodeGen] Async unwind - add a pass to fix CFI information
This pass inserts the necessary CFI instructions to compensate for the
inconsistency of the call-frame information caused by linear (non-CGA
aware) nature of the unwind tables.

Unlike the `CFIInstrInserer` pass, this one almost always emits only
`.cfi_remember_state`/`.cfi_restore_state`, which results in smaller
unwind tables and also transparently handles custom unwind info
extensions like CFA offset adjustement and save locations of SVE
registers.

This pass takes advantage of the constraints taht LLVM imposes on the
placement of save/restore points (cf. `ShrinkWrap.cpp`):

  * there is a single basic block, containing the function prologue

  * possibly multiple epilogue blocks, where each epilogue block is
    complete and self-contained, i.e. CSR restore instructions (and the
    corresponding CFI instructions are not split across two or more
    blocks.

  * prologue and epilogue blocks are outside of any loops

Thus, during execution, at the beginning and at the end of each basic
block the function can be in one of two states:

  - "has a call frame", if the function has executed the prologue, or
     has not executed any epilogue

  - "does not have a call frame", if the function has not executed the
    prologue, or has executed an epilogue

These properties can be computed for each basic block by a single RPO
traversal.

From the point of view of the unwind tables, the "has/does not have
call frame" state at beginning of each block is determined by the
state at the end of the previous block, in layout order.

Where these states differ, we insert compensating CFI instructions,
which come in two flavours:

- CFI instructions, which reset the unwind table state to the
    initial one.  This is done by a target specific hook and is
    expected to be trivial to implement, for example it could be:
```
     .cfi_def_cfa <sp>, 0
     .cfi_same_value <rN>
     .cfi_same_value <rN-1>
     ...
```
where `<rN>` are the callee-saved registers.

- CFI instructions, which reset the unwind table state to the one
    created by the function prologue. These are the sequence:
```
       .cfi_restore_state
       .cfi_remember_state
```
In this case we also insert a `.cfi_remember_state` after the
last CFI instruction in the function prologue.

Reviewed By: MaskRay, danielkiss, chill

Differential Revision: https://reviews.llvm.org/D114545
2022-04-11 13:27:26 +01:00
Sanjay Patel 2ed15984b4 [SDAG] try to reduce compare of funnel shift equal 0
fshl (or X, Y), X, C ==/!= 0 --> or (shl Y, C), X ==/!= 0
fshl X, (or X, Y), C ==/!= 0 --> or (srl Y, BW-C), X ==/!= 0

This is similar to an existing setcc-of-rotate fold, but the
matching requires more checks for the more general funnel op:
https://alive2.llvm.org/ce/z/Ab2jDd

We are effectively decomposing the funnel shift into logical
shifts, reassociating, and removing a shift.

This should get us the final improvements for x86-64 that were
originally shown in D111530
( https://github.com/llvm/llvm-project/issues/49541 );
x86-32 still shows some SHLD/SHRD, so the pattern is not
matching there yet.

Differential Revision: https://reviews.llvm.org/D122919
2022-04-11 07:44:58 -04:00
Tim Northover 6c85668d28 Tail calls: look through AssertZExt to find register copy.
arm64_32 guarantees the high 32 bits of pointer parameters are passed as 0, and
this is modelled in the IR by inserting an AssertZExt after the CopyFromReg.
The function deciding whether registers that need to be preserved actually are
wasn't expecting this so it banned perfectly legitimate tail calls.
2022-04-11 12:24:47 +01:00
Matt Arsenault 9fdd25848a Transforms: Fix code duplication between LowerAtomic and AtomicExpand 2022-04-08 19:06:36 -04:00
Fraser Cormack 18106b99f0 [VP] Explicitly map from VP intrinsic to ISD opcode
This patch aims to overcome an issue in these mappings where, when an ISD
node was registered with BEGIN_REGISTER_VP_SDNODE but outwidth the scope
of a pair of BEGIN_REGISTER_VP_INTRINSIC/END_REGISTER_VP_INTRINSIC
macros, the switch cases fell apart. This in particular happened with
VP_SETCC, where we'd end up with something along the lines of:

  case Intrinsic::vp_fcmp:
    break;
  case Intrinsic::vp_icmp:
    break;
    ResOpc = ISD::VP_SETCC;
  case Intrinsic::vp_store:
    ...

To remedy this, we introduce a special-purpose mapping macro which can
map any number of VP intrinsic opcodes to an ISD opcode.

As a result, we no longer need to special-case the mapping from vp.icmp
and vp.fcmp to VP_SETCC, as the new helper macro does it for us.

Thanks to @craig.topper for noticing this and to @rogfer01 for the idea.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D123324
2022-04-08 12:30:22 +01:00
Nikita Popov a5a272a491 [SafeStack] Don't create SCEV min between pointer and integer (PR54784)
Rather than rewriting the alloca pointer to zero, use
removePointerBase() to drop the base pointer. This will simply bail
if the base pointer is not the alloca. We could try doing something
more fancy here (like dropping the sources not based on the alloca
on the premise that they aren't SafeStack-relevant), but I don't
think that's worthwhile.

Fixes https://github.com/llvm/llvm-project/issues/54784.

Differential Revision: https://reviews.llvm.org/D123309
2022-04-08 09:44:00 +02:00
Chih-Ping Chen c226a5c4d7 [DebugInfo] Use DW_ATE_signed encoding when creating a Fortran
array index type.
2022-04-07 07:00:56 -04:00
Fraser Cormack 8216255c9f [RISCV][VP] Add basic RVV codegen for vp.fcmp
This patch adds the necessary infrastructure to lower vp.fcmp via
ISD::VP_SETCC to RVV instructions.

Most notably this patch adds cond-code legalization for VP_SETCC,
reusing the existing TargetLowering::LegalizeSetCCCondCode by passing in
additional SDValue parameters for the Mask and EVL. This method then
uses VP operations to legalize the condcode.

There is still a general lack of canonicalization on VP_SETCC as opposed
to SETCC which results in worse code than is theoretically possible.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D123051
2022-04-07 09:16:07 +01:00
Matt Arsenault 7f14a1d46b AtomicExpand: Add NotAtomic lowering strategy
Currently LowerAtomics exists as a separate pass which blindly
replaces all atomics. Add a new lowering strategy option to eliminate
the atomics which the target can control on a per-instruction level.
2022-04-06 22:34:35 -04:00
Matt Arsenault c4ea925f50 AtomicExpand: Change return type for shouldExpandAtomicStoreInIR
Use the same enum as the other atomic instructions for consistency, in
preparation for addition of another strategy.

Introduce a new "Expand" option, since the store expansion does not
use cmpxchg. Alternatively, the existing CmpXChg strategy could be
renamed to Expand.
2022-04-06 22:34:04 -04:00
Craig Topper bdb1ab9804 [LegalizeTypes][VP] Use LoVT/HiVT when splitting VP operations in SplitVecRes_UnaryOp.
The VP path was using the split source VTs instead of the split
destination VTs. This may not be a problem today because the VP
nodes going through this have the same source and dest VTs.
It will be a problem when we start using this function for legalizing
VP cast operations.
2022-04-06 10:51:49 -07:00
Daniil Kovalev 62a983ebc5 Revert "[CodeGen] Place SDNode debug ID declaration under appropriate #if"
This reverts commit 83a798d4b0.

As discussed in D120714 with @thakis, the patch added unneeded complexity
without noticeable benefits.
2022-04-06 20:32:53 +03:00
Craig Topper 8fc19185e3 [LegalizeTypes] Move SplitVecRes_VECTOR_REVERSE/VECTOR_SPLICE near other SplitVecRes methods. NFC
This file is divided into sections for different legalization actions.
We should keep similar methods together.
2022-04-06 10:29:32 -07:00
Craig Topper 1ad36487e9 [LegalizeDAG] Use SelectionDAG::getBoolConstant to simplify some code. NFC 2022-04-06 10:08:11 -07:00
Craig Topper 5b5f59428c [DAGCombiner] Replace call getSExtOrTrunc with a truncate. NFC
The extend case should never occur. The sign extend would be an
arbitrary choice, remove it to avoid confusion.
2022-04-06 09:59:45 -07:00
Paul Walker 7d3af9ef0f [DAGCombine] insert_subvector undef, (splat X), N2 -> splat X
Differential Revision: https://reviews.llvm.org/D120328
2022-04-06 17:15:38 +01:00
Fraser Cormack 6be5e875be [RISCV][VP] Add basic RVV codegen for vp.icmp
This patch adds the minimum required to successfully lower vp.icmp via
the new ISD::VP_SETCC node to RVV instructions.

Regular ISD::SETCC goes through a lot of canonicalization which targets
may rely on which has not hereto been ported to VP_SETCC. It also
supports expansion of individual condition codes and a non-boolean
return type. Support for all of that will follow in later patches.

In the case of RVV this largely isn't a problem as the vector integer
comparison instructions are plentiful enough that it can lower all
VP_SETCC nodes on legal integer vectors except for boolean vectors,
which regular SETCC folds away immediately into logical operations.

Floating-point VP_SETCC operations aren't as well supported in RVV and
the backend relies on condition code expansion, so support for those
operations will come in later patches.

Portions of this code were taken from the VP reference patches.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D122743
2022-04-06 16:51:22 +01:00
Paul Walker 1c307b9794 [NFC] Remove redundant IndexType canonicalisation from DAGTypeLegalizer::PromoteIntOp_MSCATTER
Promotion does not affect the base element type and so the original
index type will remain unchanged.  This reflects the behaviour of
DAGTypeLegalizer::PromoteIntOp_MGATHER with no tests affected.
2022-04-06 15:30:29 +01:00
zhongyunde 19e5235147 [AArch64][InstCombine] Fold MLOAD and zero extensions into MLOAD
Accord the discussion in D122281, we missing an ISD::AND combine for MLOAD
because it relies on BuildVectorSDNode is fails for scalable vectors.
This patch is intend to handle that, so we can circle back the type MVT::nxv2i32

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D122703
2022-04-06 20:50:42 +08:00
Roman Lebedev 34ce9fd864
[TLI] `TargetLowering::SimplifyDemandedVectorElts()`: narrowing bitcast: fill known zero elts from known src bits
E.g. in
```
%i0 = zext <2 x i8> to <2 x i16>
%i1 = bitcast <2 x i16> to <4 x i8>
```
the `%i0`'s zero bits are known to be `0xFF00` (upper half of every element is known zero),
but no elements are known to be zero, and for `%i1`, we don't know anything about zero bits,
but the elements under `0b1010` mask are known to be zero (i.e. the odd elements).

But, we didn't perform such a propagation.

Noticed while investigating more aggressive `vpmaddwd` formation.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D123163
2022-04-06 14:19:31 +03:00
Daniil Kovalev 83a798d4b0 [CodeGen] Place SDNode debug ID declaration under appropriate #if
Place PersistentId declaration under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to
reduce memory usage when it is not needed.

Differential Revision: https://reviews.llvm.org/D120714
2022-04-06 14:09:32 +03:00
Jeremy Morse fb6596f1ec [DebugInfo][InstrRef] Avoid a crash from mixed variable location modes
Variable locations now come in two modes, instruction referencing and
DBG_VALUE. At -O0 we pick DBG_VALUE to allow fast construction of variable
information. Unfortunately, SelectionDAG edits the optimisation level in
the presence of opt-bisect-limit, meaning different passes have different
views of what variable location mode we should use. That causes assertions
when they're mixed.

This patch plumbs through a boolean in SelectionDAG from start to
instruction emission, so that we don't rely on the current optimisation
level for correctness.

Differential Revision: https://reviews.llvm.org/D123033
2022-04-06 11:55:38 +01:00
Simon Pilgrim 3369e474bb [DAG] Allow XOR(X,MIN_SIGNED_VALUE) to perform AddLike folds
As raised on PR52267, XOR(X,MIN_SIGNED_VALUE) can be treated as ADD(X,MIN_SIGNED_VALUE), so let these cases use the 'AddLike' folds, similar to how we perform no-common-bits OR(X,Y) cases.

define i8 @src(i8 %x) {
  %r = xor i8 %x, 128
  ret i8 %r
}
=>
define i8 @tgt(i8 %x) {
  %r = add i8 %x, 128
  ret i8 %r
}
Transformation seems to be correct!

https://alive2.llvm.org/ce/z/qV46E2

Differential Revision: https://reviews.llvm.org/D122754
2022-04-06 10:37:11 +01:00
Simon Pilgrim 9e97b2a477 [DAG] SimplifySetCC - relax fold (X^C1) == C2 --> X == C1^C2
https://alive2.llvm.org/ce/z/A_auBq

Remove limitation that wouldn't perform the fold if all the inverted bits are known zero

The thumb2 changes look to be benign, although it does show that the TEQ/TST isel patterns could probably be improved.

Fixes movmsk regression in D122754

Differential Revision: https://reviews.llvm.org/D123023
2022-04-06 09:18:08 +01:00
Martin Storsjö 46776f7556 Fix warnings about variables that are set but only used in debug mode
Add void casts to mark the variables used, next to the places where
they are used in assert or `LLVM_DEBUG()` expressions.

Differential Revision: https://reviews.llvm.org/D123117
2022-04-06 10:01:46 +03:00
Argyrios Kyrtzidis 330268ba34 [Support/Hash functions] Change the `final()` and `result()` of the hashing functions to return an array of bytes
Returning `std::array<uint8_t, N>` is better ergonomics for the hashing functions usage, instead of a `StringRef`:

* When returning `StringRef`, client code is "jumping through hoops" to do string manipulations instead of dealing with fixed array of bytes directly, which is more natural
* Returning `std::array<uint8_t, N>` avoids the need for the hasher classes to keep a field just for the purpose of wrapping it and returning it as a `StringRef`

As part of this patch also:

* Introduce `TruncatedBLAKE3` which is useful for using BLAKE3 as the hasher type for `HashBuilder` with non-default hash sizes.
* Make `MD5Result` inherit from `std::array<uint8_t, 16>` which improves & simplifies its API.

Differential Revision: https://reviews.llvm.org/D123100
2022-04-05 21:38:06 -07:00
Matt Arsenault 634bf829a8 MachineVerifier: Diagnose undef set on full register defs
An undef def of a full register would assert in LiveIntervalCalc.
2022-04-05 22:19:17 -04:00
Matt Arsenault ced1250b0f MIRParser: Fix asserting with invalid flags on machine operands
Constructing an operand with kills on defs and deads on uses asserts
in the constructor, so diagnose these.
2022-04-05 21:46:26 -04:00
Daniel Sanders 93977f37e6 Check if register class was changed in constrainOperandRegClass()
NFC
When no actual change happens there's no need to notify the
observers about the fact the register class is being constrained.
So we should avoid notifying observers when no change has
happened, because this can dramatically affect compile
time for particular test cases.

Reviewed By: dsanders, arsenm

Differential Revision: https://reviews.llvm.org/D122615
2022-04-05 11:55:07 -07:00
Muhammad Omair Javaid 0320115c16 Revert "[CodeGen] Async unwind - add a pass to fix CFI information"
This reverts commit 980c3e6dd2.

This commit had failing tests with clang crashing across various
AArch64/Linux buildots.

https://lab.llvm.org/buildbot/#/builders/179/builds/3346

Differential Revision: https://reviews.llvm.org/D114545
2022-04-05 13:12:30 +05:00
Max Kazantsev 9a2798c7a3 [CodeGen][NFC] Hoist budget check out of loop
Less computations & early exit if we know for sure that the limit will be exceeded.
2022-04-05 14:20:42 +07:00
Jeremy Morse 920de9c94c Revert "[DebugInfo] Correctly recognize bitfields when emitting dwarf"
This reverts commit 059d1f84d2.

Some tests on green dragon failed as a result of this -- see notes on D96334.
2022-04-04 17:14:58 +01:00
Thomas Preud'homme 449ef2fcc6 [Pipeliner] Fix comment typo 2022-04-04 16:10:27 +01:00
Momchil Velikov 980c3e6dd2 [CodeGen] Async unwind - add a pass to fix CFI information
This pass inserts the necessary CFI instructions to compensate for the
inconsistency of the call-frame information caused by linear (non-CFG
aware) nature of the unwind tables.

Unlike the `CFIInstrInserer` pass, this one almost always emits only
`.cfi_remember_state`/`.cfi_restore_state`, which results in smaller
unwind tables and also transparently handles custom unwind info
extensions like CFA offset adjustement and save locations of SVE
registers.

This pass takes advantage of the constraints that LLVM imposes on the
placement of save/restore points (cf. `ShrinkWrap.cpp`):

  * there is a single basic block, containing the function prologue

  * possibly multiple epilogue blocks, where each epilogue block is
    complete and self-contained, i.e. CSR restore instructions (and the
    corresponding CFI instructions are not split across two or more
    blocks.

  * prologue and epilogue blocks are outside of any loops

Thus, during execution, at the beginning and at the end of each basic
block the function can be in one of two states:

  - "has a call frame", if the function has executed the prologue, or
     has not executed any epilogue

  - "does not have a call frame", if the function has not executed the
    prologue, or has executed an epilogue

These properties can be computed for each basic block by a single RPO
traversal.

In order to accommodate backends which do not generate unwind info in
epilogues we compute an additional property "strong no call frame on
entry" which is set for the entry point of the function and for every
block reachable from the entry along a path that does not execute the
prologue. If this property holds, it takes precedence over the "has a
call frame" property.

From the point of view of the unwind tables, the "has/does not have
call frame" state at beginning of each block is determined by the
state at the end of the previous block, in layout order.

Where these states differ, we insert compensating CFI instructions,
which come in two flavours:

- CFI instructions, which reset the unwind table state to the
    initial one.  This is done by a target specific hook and is
    expected to be trivial to implement, for example it could be:
```
     .cfi_def_cfa <sp>, 0
     .cfi_same_value <rN>
     .cfi_same_value <rN-1>
     ...
```
where `<rN>` are the callee-saved registers.

- CFI instructions, which reset the unwind table state to the one
    created by the function prologue. These are the sequence:
```
       .cfi_restore_state
       .cfi_remember_state
```
In this case we also insert a `.cfi_remember_state` after the
last CFI instruction in the function prologue.

Reviewed By: MaskRay, danielkiss, chill

Differential Revision: https://reviews.llvm.org/D114545
2022-04-04 14:38:22 +01:00
Simon Pilgrim 328754474a [DAG] SimplifySetCC - clang-format add/xor/sub with constant handling. NFC. 2022-04-04 13:30:17 +01:00
Jeremy Morse 059d1f84d2 [DebugInfo] Correctly recognize bitfields when emitting dwarf
Use the "isBitfield" flag for debug types to determine whether something is
a bitfield, rather than trying to guess from it's layout. Fixes
https://bugs.llvm.org/show_bug.cgi?id=44601

Patch by: mahkoh

Differential Revision: https://reviews.llvm.org/D96334
2022-04-04 11:14:13 +01:00
Michael Gottesman e24f534879 [debug-info] As an NFC commit, refactor EmitFuncArgumentDbgValue so that it can be extended to support llvm.dbg.addr.
The reason why I am making this change is that before this commit,
EmitFuncArgumentDbgValue relied on a boolean flag IsDbgDeclare both to signal
that a DBG_VALUE should be made to be indirect /and/ that the original intrinsic
was a dbg.declare. This is no longer always true if we add support for handling
dbg.addr since we will have an indirect DBG_VALUE that is a different intrinsic
from dbg.declare.

With that in mind, in this NFC patch, we prepare for future fixes by introducing
a 3 case-enum argument to EmitFuncArgumentDbgValue that allows the caller to
explicitly specify how the argument's DBG_VALUE should be emitted. This then
allows us to turn the indirect checks into a != FuncArgumentDbgValueKind::Value
and prepare us for a future where we add support here for llvm.dbg.addr
directly.

rdar://83957028

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D122945
2022-04-01 17:07:28 -07:00
Craig Topper fa630e7594 [RISCV][AMDGPU][TargetLowering] Special case overflow expansion for (uaddo X, 1).
If we expand (uaddo X, 1) we previously expanded the overflow calculation
as (X + 1) <u X. This potentially increases the live range of X and
can prevent X+1 from reusing the register that previously held X.

Since we're adding 1, overflow only occurs if X was UINT_MAX in which
case (X+1) would be 0. So this patch adds a special case to expand
the overflow calculation to (X+1) == 0.

This seems to help with uaddo intrinsics that get introduced by
CodeGenPrepare after LSR. Alternatively, we could block the uaddo
transform in CodeGenPrepare for this case.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D122933
2022-04-01 13:14:10 -07:00
Simon Pilgrim 76cd11f303 [DAG] Add llvm::isMinSignedConstant helper. NFC
Pulled out of D122754
2022-04-01 17:47:34 +01:00
Matt Arsenault 4a8665e23e SelectionDAG: Avoid some uses of getPointerTy
Avoids use of the default address space parameter, and avoids some
assumptions about the incoming address space.
2022-03-31 18:49:22 -04:00
Matt Arsenault 395f8ccfc9 RegAllocGreedy: Fix typo 2022-03-31 16:30:01 -04:00
Abinav Puthan Purayil 898d5776ec [AMDGPU][GlobalISel] Scalarize add/sub with overflow ops in the legalizer
Differential Revision: https://reviews.llvm.org/D122803
2022-03-31 21:46:34 +05:30
Craig Topper 85eae45520 [SelectionDAG] Move extension type for ConstantSDNode from getCopyToRegs to HandlePHINodesInSuccessorBlocks.
D122053 set the ExtendType for ConstantSDNodes in getCopyToRegs to
ZERO_EXTEND to match assumptions in ComputePHILiveOutRegInfo. PHIs
are probably not the only way ConstantSDNodeNodes can get to
getCopyToRegs.

This patch adds an ExtendType parameter to CopyValueToVirtualRegister and
has HandlePHINodesInSuccessorBlocks pass ISD::ZERO_EXTEND for ConstantInts.
This way we only affect ConstantSDNodes for PHIs.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122171
2022-03-30 11:32:43 -07:00
Sanjay Patel 436b875e49 [SDAG] avoid libcalls to fmin/fmax for soft-float targets
This is an extension of D70965 to avoid creating a mathlib
call where it did not exist in the original source. Also see
D70852 for discussion about an alternative proposal that was
abandoned.

In the motivating bug report:
https://github.com/llvm/llvm-project/issues/54554
...we also have a more general issue about handling "no-builtin" options.

Differential Revision: https://reviews.llvm.org/D122610
2022-03-30 11:22:03 -04:00
Sanjay Patel e18cc5277f [SDAG] try to canonicalize logical shift after bswap
When shifting by a byte-multiple:
bswap (shl X, C) --> lshr (bswap X), C
bswap (lshr X, C) --> shl (bswap X), C

This is the backend version of D122010 and an alternative
suggested in D120648.
There's an extra check to make sure the shift amount is
valid that was not in the rough draft.

I'm not sure if there is a larger motivating case for RISCV (bug report?),
but the ARM diffs show a benefit from having a late version of the
transform (because we do not combine the loads in IR).

Differential Revision: https://reviews.llvm.org/D122655
2022-03-30 09:29:32 -04:00
Fraser Cormack 43a91a8474 [SelectionDAG] Don't create illegally-typed nodes while constant folding
This patch fixes a (seemingly very rare) crash during vector constant
folding introduced in D113300.

Normally, during legalization, if we create an illegally-typed node during
a failed attempt at constant folding it's cleaned up before being
visited, due to it having no uses.

If, however, an illegally-typed node is created during one round of
legalization and isn't cleaned up, it's possible for a second round of
legalization to create new illegally-typed nodes which add extra uses to
the old illegal nodes. This means that we can end up visiting the old
nodes before they're known to be dead, at which point we crash.

I'm not happy about this fix. Creating illegal types at all seems like a
bad idea, but we all-too-often rely on illegal constants being
successfully folded and being fixed up afterwards. However, we can't
rely on constant folding actually happening, and we don't have a
foolproof way of peering into the future.

Perhaps the correct fix is to revisit the node-iteration order during
legalization, ensuring we visit all uses of nodes before the nodes
themselves. Or alternatively we could try and clean up dead nodes
immediately after failing constant folding.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122382
2022-03-30 13:17:55 +01:00
serge-sans-paille 01be9be2f2 Cleanup includes: final pass
Cleanup a few extra files, this closes the work on libLLVM dependencies on my
side.

Impact on libLLVM preprocessed output: -35876 lines

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D122576
2022-03-29 09:00:21 +02:00
Craig Topper e68257fcee [RISCV][SelectionDAG] Enable TargetLowering::hasBitTest for masks that fit in ANDI.
Modified DAGCombiner to pass the shift the bittest input and the shift amount
to hasBitTest. This matches the other call to hasBitTest in TargetLowering.h

This is an alternative to D122454.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D122458
2022-03-28 12:46:36 -07:00
David Blaikie a5032b2633 DebugInfo: Don't allow type units to references types in the CU
We could only do this in limited ways (since we emit the TUs first, we
can't use ref_addr (& we can't use that in Split DWARF either) - so we
had to synthesize declarations into the TUs) and they were ambiguous in
some cases (if the CU type had internal linkage, parsing the TU would
require knowing which CU was referencing the TU to know which type the
declaration was for, which seems not-ideal). So to avoid all that, let's
just not reference types defined in the CU from TUs - instead moving the
TU type into the CU (recursively).

This does increase debug info size (by pulling more things out of type
units, into the compile unit) - about 2% of uncompressed dwp file size
for clang -O0 -g -gsplit-dwarf. (5% .debug_info.dwo section size
increase in the .dwp)
2022-03-25 23:49:03 +00:00
Hongtao Yu e25f4e4c4a [PseudoProbe] Do not emit pseudo probes when module is not probed.
There is a case when a function has pseudo probe intrinsics but the module it resides does not have the probe desc. This could happen when the current module is not built with `-fpseudo-probe-for-profiling` while a function in it calls some other function from a probed module. In thinLTO mode, the callee function could be imported and inlined into the current function.
While this is undefined behavior, I'm fixing the asm printer to not ICE and warn user about this.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D121737
2022-03-25 12:59:53 -07:00
Simon Pilgrim e209190c2d [SDAG] enable binop identity constant folds for multiplies
Add mul to the list of ops that we canonicalize with a select to expose an identity merge

Differential Revision: https://reviews.llvm.org/D122071
2022-03-25 11:07:04 +00:00
Simon Pilgrim ae95f291e8 [AsmPrinter] AIXException::endFunction - use cast<> instead of dyn_cast<> to avoid dereference of nullptr
The pointer is used immediately inside the getSymbol() call, so assert the cast is correct instead of returning nullptr
2022-03-25 10:23:30 +00:00
Craig Topper 67eb2f144e [SelectionDAG] Add AssertAlign to AddNodeIDCustom so that it will CSE properly.
The alignment needs to be part of the folding set hash. This is
handled by getAssertAlign when nodes are created, but needs to repeated here.

No test case as I found it as part of a very early experimental patch.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D122279
2022-03-24 08:59:09 -07:00
Daniil Kovalev c53cbce45e [CodeGen] Define ABI breaking class members correctly
Non-static class members declared under #ifndef NDEBUG should be declared
under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to make headers library-friendly and
allow cross-linking, as discussed in D120714.

Differential Revision: https://reviews.llvm.org/D121549
2022-03-24 12:42:59 +03:00
Julian Lettner 64902d335c Reland "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO"
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.

Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.

Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`).  This escape hatch will be
removed in the future.

Differential Revision: https://reviews.llvm.org/D121736
2022-03-23 18:36:55 -07:00
Zequan Wu 581dc3c729 Revert "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO"
This reverts commit 22570bac69.
2022-03-23 16:11:54 -07:00
Craig Topper cac9773dcc [SelectionDAG] Don't create entries in ValueMap in ComputePHILiveOutRegInfo
Instead of using operator[], use DenseMap::find to prevent default
constructing an entry if it isn't already in the map.

Also simplify a condition to check for 0 instead of a virtual register.
I'm pretty sure we can only get 0 or a virtual register out of the value
map.
2022-03-23 09:52:07 -07:00
serge-sans-paille 60ca256953 Cleanup include: Add missing header
Should fix https://lab.llvm.org/buildbot#builders/57/builds/16192 introduced by
02c28970b2
2022-03-23 15:15:56 +01:00
Benjamin Kramer 9a6e0afac5 Unbreak the build after 02c28970b2 2022-03-23 14:38:13 +01:00
serge-sans-paille 02c28970b2 Cleanup include: codegen second round
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D122180
2022-03-23 13:54:00 +01:00
Craig Topper 681fd2c11e Revert "[SelectionDAG] Don't create entries in ValueMap in ComputePHILiveOutRegInfo"
This reverts commit 1a9b55b63a.

Causing build bot failures
2022-03-22 23:41:47 -07:00
Craig Topper 1a9b55b63a [SelectionDAG] Don't create entries in ValueMap in ComputePHILiveOutRegInfo
Instead of using operator[], use DenseMap::find to prevent default
constructing an entry if it isn't already in the map.
2022-03-22 23:24:53 -07:00
Craig Topper 73f0af106b [SelectionDAG] Add printing support for the Align value of AssertAlign nodes.
Differential Revision: https://reviews.llvm.org/D122262
2022-03-22 14:16:32 -07:00
Carl Ritson 8e64d84995 [MachineSink] Check block prologue interference
Sinking must check for interference between the block prologue
and the instruction being sunk.
Specifically check for clobbering of uses by the prologue, and
overwrites to prologue defined registers by the sunk instruction.

Reviewed By: rampitec, ruiling

Differential Revision: https://reviews.llvm.org/D121277
2022-03-22 11:15:37 +09:00
Mircea Trofin f658ca1aba [mlgo] Fix build breaks introduced by includes cleanups
These were not detected by the build bots because those went quietly
offline, too, due to a misconfiguration (fixed since)
2022-03-21 13:49:40 -07:00
Craig Topper 37c0aacd71 [SelectionDAG] Make getPreferredExtendForValue take a Instruction * instead of Value *.
This is only called for instructions and the caller is already holding
an Instruction *. This makes the code more explicit and makes it
obvious the code doesn't make decisions about constants.
2022-03-21 12:15:22 -07:00
Jay Foad 1bb3a9c642 [MachineCopyPropagation] More robust isForwardableRegClassCopy
Change the implementation of isForwardableRegClassCopy so that it
does not rely on getMinimalPhysRegClass. Instead, iterate over all
classes looking for any that satisfy a required property.

NFCI on current upstream targets, but this copes better with
downstream AMDGPU changes where some new smaller classes have been
introduced, which was breaking regclass equality tests in the old
code like:
    if (UseDstRC != CrossCopyRC && CopyDstRC == CrossCopyRC)

Differential Revision: https://reviews.llvm.org/D121903
2022-03-21 16:41:01 +00:00
zhongyunde 828b89bc0b [AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads
Trying to reduce the number of masked loads in favour of more unpklo/hi
instructions. Both ISD::ZEXTLOAD and ISD::SEXTLOAD are supported to extensions
from legal types.

Both of normal and masked loads test cases added to guard compile crash.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D120953
2022-03-21 23:47:33 +08:00
Simon Pilgrim 35a7be6ccb [SDAG] enable binop identity constant folds for shifts
Add shl/srl/sra to the list of ops that we canonicalize with a select to expose an identity merge

Differential Revision: https://reviews.llvm.org/D122070
2022-03-21 13:02:50 +00:00
Kazu Hirata 1eada2adda [CodeGen] Apply clang-tidy fixes for readability-redundant-smartptr-get (NFC) 2022-03-20 23:11:06 -07:00
Luo, Yuanke 10bb623192 enable binop identity constant folds for add
Differential Revision: https://reviews.llvm.org/D119654
2022-03-20 19:07:16 +08:00
Craig Topper 4eb59f0179 [SelectionDAG][RISCV] Make RegsForValue::getCopyToRegs explicitly zero_extend constants.
ComputePHILiveOutRegInfo assumes that constant incoming values to
Phis will be zero extended if they aren't a legal type. To guarantee
that we should zero_extend rather than any_extend constants.

This fixes a bug for RISCV where any_extend of constants can be
treated as a sign_extend.

Differential Revision: https://reviews.llvm.org/D122053
2022-03-19 18:43:14 -07:00
Craig Topper 306ff74154 [SelectionDAG] Use APInt::zextOrSelf instead of zextOrTrunc in ComputePHILiveOutRegInfo
The width never decreases here.
2022-03-18 23:26:19 -07:00
Eli Friedman 5cd9fa551e Fix computation of MadeChange bit in AtomicExpandPass.
Fixes llvm-clang-x86_64-expensive-checks-debian failure with 2f497ec3.

expandAtomicStore always modifies the function, so make sure we set
MadeChange unconditionally. Not sure how nobody else has stumbled over
this before.
2022-03-18 13:47:11 -07:00
Kai Luo 31906a6090 [AtomicExpand][PowerPC] Fix all-one mask value
When generating a all-one mask value whose bitwidth is larger than 64, signed extension should be used rather then zero extension.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D120865
2022-03-18 13:35:54 +08:00
Julian Lettner 22570bac69 Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.

Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.

Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`).  This escape hatch will be
removed in the future.

Differential Revision: https://reviews.llvm.org/D121736
2022-03-17 10:47:13 -07:00
Matt Arsenault 8d66603a48 Revert "RegAllocGreedy: Fix last chance recolor assert in impossible case"
This reverts commit c46aab01c0.

This evidently blocks compiling in some cases that used to work
before. I'm also not fully convinced this is the correct place to fix
this problem.
2022-03-17 13:12:01 -04:00
Marco Elver b09439e20b [AtomicExpandPass][NFC] Reformat with clang-format
NFCI.
2022-03-17 16:58:16 +01:00
Jeremy Morse 12a2f7494e [DebugInfo][InstrRef] Prefer stack locations for variables
This patch adjusts what location is picked for a known variable value --
preferring to leave locations on the stack, even when a value is re-loaded
into a register. The benefit is reduced location list entropy, on a
clang-3.4 build I found that .debug_loclists reduces in size by 6%, from
29Mb down to 27Mb.

Testing: a few tests need the stack slot to be written to explicitly, to
force LiveDebugValues into restoring the variable location to a register.
I've added an explicit test for the desired behaviour in
livedebugvalues_recover_clobbers.mir .

Differential Revision: https://reviews.llvm.org/D120732
2022-03-17 14:26:15 +00:00
Heejin Ahn b8038a916d [WebAssembly] Disable SimplifyDemandedVectorElts after legalization
This fixes a reported bug that caused an infinite loop during the
SelectionDAG optimization phase in ISel, by creating an overridable hook
in `TargetLowering` that allows us to bail out from running
`SimplifyDemandedVectorElts`.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D121869
2022-03-16 20:52:43 -07:00
Marco Elver 555df03012 [SelectionDAG][NFC] Clean up SDCallSiteDbgInfo accessors
* Consistent naming: addCallSiteInfo vs. getCallSiteInfo;
* Use ternary operator to reduce verbosity;
* const'ify getters;
* Add comments;

NFCI.

Differential Revision: https://reviews.llvm.org/D121820
2022-03-16 17:46:06 +01:00
Shengchen Kan ac64d0d230 [NFC][CodeGen] Remove redundant if clause in TargetPassConfig::addPass 2022-03-16 22:14:23 +08:00
Shengchen Kan 37b378386e [NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments 2022-03-16 20:25:42 +08:00
Matthias Gehre 09854f2af3 [SelectionDAG] Emit calls to __divei4 and friends for division/remainder of large integers
Emit calls to __divei4 and friends for divison/remainder of large integers.

This fixes https://github.com/llvm/llvm-project/issues/44994.

The overall RFC is in https://discourse.llvm.org/t/rfc-add-support-for-division-of-large-bitint-builtins-selectiondag-globalisel-clang/60329

The compiler-rt part is in https://reviews.llvm.org/D120327

Differential Revision: https://reviews.llvm.org/D120329
2022-03-16 09:36:28 +00:00
serge-sans-paille 989f1c72e0 Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169

after:  1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
2022-03-16 08:43:00 +01:00
Craig Topper 1bf4bbc492 [LegalizeTypes][RISCV][WebAssembly] Expand ABS in PromoteIntRes_ABS if it will expand to sra+xor+sub later.
If we promote the ABS and then Expand in LegalizeDAG, then both the
sra and the xor will have their inputs sign extended. This generates
extra code on RISCV which lacks an i8 or i16 sign extend instructon.
If we expand during type legalization, then only the sra will get its
input sign extended. RISCV is able to combine this with the sra by
doing a shift left followed by an sra.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D121664
2022-03-15 08:27:39 -07:00
Craig Topper ad94dfb9a0 [DAGCombiner][RISCV] Adjust (aext (and (trunc x), cst)) -> (and x, cst) to sext cst based on target preference
RISCV strong prefers i32 values be sign extended to i64. This combine
was always zero extending the constant using APInt methods.

This adjusts the code so that it calls getNode using ISD::ANY_EXTEND instead.
getNode will call TLI.isSExtCheaperThanZExt to decide how to handle
the constant.

Tests were copied from D121598 where I noticed that we were creating
constants that were hard to materialize.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D121650
2022-03-15 08:26:47 -07:00
Simon Pilgrim 7262eacd41 Revert rG9c542a5a4e1ba36c24e48185712779df52b7f7a6 "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO"
Mane of the build bots are complaining: Unknown command line argument '-lower-global-dtors'
2022-03-15 13:01:35 +00:00
Fangrui Song 252bc2b9f5 [MachineLICM] Simplify code and avoid adding nullptr values to ParentMap. NFC 2022-03-15 01:24:01 -07:00
Julian Lettner 9c542a5a4e Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.

Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.

Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`).  This escape hatch will be
removed in the future.

Differential Revision: https://reviews.llvm.org/D121327
2022-03-14 17:51:18 -07:00
Amara Emerson 8cbf18cb04 [GlobalISel] Fix store merging incorrectly merging volatile stores.
The existing volatile checks only handle aliasing hazards between stores,
but that isn't enough since by that point volatile stores may have already
been added to the current candidate group.
2022-03-14 13:48:51 -07:00
Kazu Hirata 9286786e87 [CodeGen] Remove an unused variable introduced in D121128 2022-03-14 11:41:04 -07:00
Mircea Trofin 294eca35a0 [regalloc] Remove -consider-local-interval-cost
Discussed extensively on D98232. The functionality introduced in D35816
never worked correctly. In D98232, it was fixed, but, as it was
introducing a large compile-time regression, and the value of the
original patch was called into doubt, we disabled it by default
everywhere. A year later, it appears that caused no grief, so it seems
safe to remove the disabled code.

This should be accompanied by re-opening bug 26810.

Differential Revision: https://reviews.llvm.org/D121128
2022-03-14 10:49:16 -07:00
Sanjay Patel c2592c374e [SDAG] simplify bitwise logic with repeated operand
We do not have general reassociation here (and probably
do not need it), but I noticed these were missing in
patches/tests motivated by D111530, so we can at
least handle the simplest patterns.

The VE test diff looks correct, but we miss that
pattern in IR currently:
https://alive2.llvm.org/ce/z/u66_PM
2022-03-13 11:12:30 -04:00
Wenlei He 4f320ca4ba [DebugInfo] Include DW_TAG_skeleton_unit when looking for parent UnitDie
`DIE::getUnitDie` looks up parent DIE until compile unit or type unit is found. However for skeleton CU with debug fission, we would have DW_TAG_skeleton_unit instead of DW_TAG_compile_unit as top level DIE.

This change fixes the look up so we can get DW_TAG_skeleton_unit as UnitDie for skeleton CU.

Differential Revision: https://reviews.llvm.org/D120610
2022-03-12 13:27:42 -08:00
serge-sans-paille ed98c1b376 Cleanup includes: DebugInfo & CodeGen
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121332
2022-03-12 17:26:40 +01:00
Yuanfang Chen d538ad53c3 [JMCInstrument] infer proper path style based on debug info
By default, the path style is decided by the host. This patch makes JMC
uses the path style used by the SP directory. This makes JMC output
host-independent.

Fixes: https://github.com/llvm/llvm-project/issues/54219

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D121236
2022-03-10 10:50:44 -08:00
Lorenzo Albano 28cfa764c2 [VP] Strided loads/stores
This patch introduces two new experimental IR intrinsics and SDAG nodes
to represent vector strided loads and stores.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D114884
2022-03-10 18:46:54 +01:00
Nico Weber a278250b0f Revert "Cleanup codegen includes"
This reverts commit 7f230feeea.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
2022-03-10 07:59:22 -05:00
serge-sans-paille 7f230feeea Cleanup codegen includes
after:  1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169
2022-03-10 10:00:30 +01:00
Xiang1 Zhang c31014322c TLS loads opimization (hoist)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D120000
2022-03-10 09:29:06 +08:00
Stanislav Mekhanoshin 0be6fd44f3 [SDAG] Use MMO flags in MemSDNode folding
SDNodes with different target flags may now be folded together
rightfully resulting in the assertion in the refineAlignment.
Folding nodes with different target flags may result in the
wrong load instructions produced at least on the AMDGPU.

Fixes: SWDEV-326805

Differential Revision: https://reviews.llvm.org/D121335
2022-03-09 14:25:22 -08:00
Sanjay Patel 341623653d [SDAG] match rotate pattern with extra 'or' operation
This is another fold generalized from D111530.
We can find a common source for a rotate operation hidden inside an 'or':
https://alive2.llvm.org/ce/z/9pV8hn

Deciding when this is profitable vs. a funnel-shift is tricky, but this
does not show any regressions: if a target has a rotate but it does not
have a funnel-shift, then try to form the rotate here. That is why we
don't have x86 test diffs for the scalar tests that are duplicated from
AArch64 ( 74a65e3834 ) - shld/shrd are available. That also makes it
difficult to show vector diffs - the only case where I found a diff was
on x86 AVX512 or XOP with i64 elements.

There's an additional check for a legal type to avoid a problem seen
with x86-32 where we form a 64-bit rotate but then it gets split
inefficiently. We might avoid that by adding more rotate folds, but
I didn't check to see what is missing on that path.

This gets most of the motivating patterns for AArch64 / ARM that are in
D111530.

We still need a couple of enhancements to setcc pattern matching with
rotate/funnel-shift to get the rest.

Differential Revision: https://reviews.llvm.org/D120933
2022-03-09 13:19:00 -05:00
Thomas Preud'homme 67c14d5c69 [MachinePipeliner] Fix isPseduo typo. 2022-03-09 15:26:39 +00:00
Tom Stellard fb616c9b31 SafeStack: Re-enable SafeStack coloring optimization
This was disabled in 2acea2786b as a
work-around for Issue #31491.  I've reduced the test case from that bug
and confirmed that it is now fixed.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D120866
2022-03-08 15:10:41 -08:00
Craig Topper 29511ec7da [LegalizeTypes][VP] Add widening and splitting support for VP_FMA.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D120854
2022-03-08 09:59:59 -08:00
Craig Topper c392b9924e [LegalizeTypes][VP] Add splitting and widening support for VP_FNEG.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D120785
2022-03-08 09:59:34 -08:00
Fraser Cormack 17310f3d19 [SelectionDAG][NFC] Address a few clang-tidy warnings
Fix a couple of else-after-return warnings and some unnecessary
parentheses.
2022-03-08 16:22:26 +00:00
Yuanfang Chen eddd94c27d Reland "[clang][debug] port clang-cl /JMC flag to ELF"
This relands commit 7313474319.

It failed on Windows/Mac because `-fjmc` is only checked for ELF targets.
Check the flag unconditionally instead and issue a warning for non-ELF targets.
2022-03-07 21:55:41 -08:00
Yuanfang Chen f46fa4de4a Revert "[clang][debug] port clang-cl /JMC flag to ELF"
This reverts commit 7313474319.

Break bots:
http://45.33.8.238/win/54551/step_7.txt
http://45.33.8.238/macm1/29590/step_7.txt
2022-03-07 12:40:43 -08:00
Craig Topper 8e132c5c1d [LegalizeTypes][ARM][X86] Change ExpandIntRes_ABS to use sra+xor+sub.
Previously we used sra+add+xor if ADDCARRY is supported. This changes
to sra+xor+sub is SUBCARRY is available.

This is consistent with the recent change to the default expansion
in LegalizeDAG.

Differential Revision: https://reviews.llvm.org/D121039
2022-03-07 11:28:32 -08:00
Yuanfang Chen 7313474319 [clang][debug] port clang-cl /JMC flag to ELF
The motivation is to enable the MSVC-style JMC instrumentation usable by a ELF-based
debugger. Since there is no prior experience implementing JMC feature for ELF-based
debugger, it might be better to just reuse existing MSVC-style JMC instrumentation.
For debuggers that support both ELF&COFF (like lldb), the JMC implementation might
be shared between ELF&COFF. If this is found to inadequate, it is pretty low-cost
switching to alternatives.

Implementation:
- The '-fjmc' is already a driver and cc1 flag. Wire it up for ELF in the driver.
- Refactor the JMC instrumentation pass a little bit.
- The ELF handling is different from MSVC in two places:
  * the flag section name is ".just.my.code" instead of ".msvcjmc"
  * the way default function is provided: MSVC uses /alternatename; ELF uses weak function.

Based on D118428.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D119910
2022-03-07 10:16:24 -08:00
David Green 4388f4f776 [DAG] Don't convert undef to 0 when creating buildvector
When inserting undef into buildvectors created from shuffles of
buildvectors, we convert elements to the largest needed type. This had
the effect of converting undef into 0, which isn't needed as the
buildvector implicitly truncates and trunc(zext(undef)) == undef.

Differential Revision: https://reviews.llvm.org/D121002
2022-03-06 18:35:34 +00:00
Sanjay Patel f4b53972ce [SDAG] fold bitwise logic with shifted operands
This extends acb96ffd14 to 'and' and 'xor' opcodes.

Copying from that message:

LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) --> LOGIC (SH (LOGIC X0, X1), Y), Z

https://alive2.llvm.org/ce/z/QmR9rR

This is a reassociation + factoring fold. The common shift operation is moved
after a bitwise logic op on 2 input operands.
We get simpler cases of these patterns in IR, but I suspect we would miss all
of these exact tests in IR too. We also handle the simpler form of this plus
several other folds in DAGCombiner::hoistLogicOpWithSameOpcodeHands().
2022-03-05 11:14:45 -05:00
Jeremy Morse 0e96d95d13 [DebugInfo][InstrRef] Accept register-reads after isel in any block
When lowering LLVM-IR to instruction referencing stuff, if a value is
defined by a COPY, we try and follow the register definitions back to where
the value was defined, and build an instruction reference to that
instruction. In a few scenarios (such as arguments), this isn't possible.
I added some assertions to catch cases that weren't explicitly whitelisted.

Over the course of a few months, several more scenarios have cropped up,
the lastest is the llvm.read_register intrinsic, which lets LLVM-IR read an
arbitary register at any point. In the face of this, there's little point
in validating whether debug-info reads a register in an expected scenario.
Thus: this patch just deletes those assertions, and adds a regression test
to check that something is done with the llvm.read_register intrinsic.

Fixes #54190

Differential Revision: https://reviews.llvm.org/D121001
2022-03-04 17:01:12 +00:00
Paul Walker 42b4a6227e [DAGCombine] Prevent illegal ISD::SPLAT_VECTOR operations post legalisation.
When triggered during operation legalisation the affected combine
generates a splat_vector that when custom lowered for SVE fixed
length code generation, results in the original precombine sequence
and thus we enter a legalisation/combine hang.

NOTE: The patch contains no tests because I observed this issue
only when combined with other work that might never become public.
The current way AArch64 lowers ISD::SPLAT_VECTOR meant a specific
test was not possible so I'm hoping the DAGCombiner fix can be seen
as obvious. The AArch64ISelLowering change is requirted to maintain
existing code quality.

Differential Revision: https://reviews.llvm.org/D120735
2022-03-04 11:54:03 +00:00
Maksim Panchenko 7e570308f2 [NFC] Fix typos
Reviewed By: yota9, Amir

Differential Revision: https://reviews.llvm.org/D120859
2022-03-03 13:26:39 -08:00
Vasileios Porpodas 6f9640d6a3 [RegAlloc] Add a complexity limit in growRegion() to cap compilation time.
growRegion() does not scale in code with BBs with a very large number of edges.
In such code growRegion() becomes a compile-time bottleneck, consuming 60% of
the total compilation time.
This patch adds a limit to the complexity of growRegion() by incrementing a counter
in each iteration. We bail out once the limit is reached.

Differential Revision: https://reviews.llvm.org/D120752
2022-03-03 11:31:07 -08:00
Paul Robinson 7b85f0f32f [PS4] isPS4 and isPS4CPU are not meaningfully different 2022-03-03 11:36:59 -05:00
Sanjay Patel e9302bf7ef [SDAG] try harder to remove a rotate from X == 0
https://alive2.llvm.org/ce/z/mJP7XP

This can be viewed as expanding the compare into and/or-of-compares:
https://alive2.llvm.org/ce/z/bkZYWE
followed by reduction of each compare.

This could be extended in several ways:
1. There's a (X & Y) == -1 sibling.
2. We can recurse through more than 1 'or'.
3. The fold could be generalized beyond rotates - any operation that
   only changes the order of bits (bswap, bitreverse).

This is a transform noted in D111530.
2022-03-03 09:25:46 -05:00
Sanjay Patel c33dbc2a2d [SDAG] refactor foldSetCCWithRotate; NFC
There are more potential optimizations to make here,
so rearrange to make it easier to append those.
2022-03-02 16:42:05 -05:00
Craig Topper ab7a7cc1dd Revert "[LegalizeTypes][VP] Add splitting and widening support for VP_FNEG."
This reverts commit ac93f95861.

Committed by accident.
2022-03-02 10:00:22 -08:00
Craig Topper 324c0a7206 [SelectionDAG][RISCV] Emit a canonical sign bit test from ExpandIntRes_ABS.
Instead of emitting 0 > Hi, emit Hi < 0. If Hi needs to be expanded again
this will allow the special case for sign bit tests in ExpandIntOp_SETCC
to trigger.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D120761
2022-03-02 09:47:26 -08:00
Craig Topper ac93f95861 [LegalizeTypes][VP] Add splitting and widening support for VP_FNEG.
Differential Revision: https://reviews.llvm.org/D120785
2022-03-02 09:47:05 -08:00
Daniel McIntosh d636b76eca [CodeGen] Use AdjustStackOffset for Callee Saved Registers in PEI::calculateFrameObjectOffsets
Also, changes how the CSR loop is indexed, which should avoid bugs like the one fixed by rG4a57bb5a3b74bdad9b0518009a7d7ac7ca2ac650

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D120668
2022-03-02 11:41:12 -05:00
Nikita Popov 6fde043951 [MachineSink] Disable if there are any irreducible cycles
This is an alternative to D120330, which disables MachineSink for
functions with irreducible cycles entirely. This avoids both the
correctness problem, and ensures we don't perform non-profitable
sinks into cycles. At the same time, it may also disable
profitable sinks in the same function. This can be made more
precise by using MachineCycleInfo in the future.

Fixes https://github.com/llvm/llvm-project/issues/53990.

Differential Revision: https://reviews.llvm.org/D120800
2022-03-02 16:57:29 +01:00
Simon Pilgrim 5cce97d61e [DAG] isSplatValue - improve ISD::VECTOR_SHUFFLE splat detection
Currently we only check for splat shuffles, this extends it to see if the source operand is a splat across the demanded elts based upon the shuffle mask
2022-03-02 15:32:24 +00:00
spupyrev bcdc047731 speeding up ext-tsp for huge instances
Differential Revision: https://reviews.llvm.org/D120780
2022-03-02 07:17:48 -08:00
Simon Pilgrim df0a2b4f30 [DAG] SelectionDAG::isSplatValue - add initial BITCAST handling
This patch adds support for recognising vector splats by peeking through bitcasts to vectors with smaller element types - if all the offset subelements are splats then the bitcasted vector is a splat as well.

We don't have great coverage for isSplatValue so I've made this pretty specific to the use case I'm trying to fix - regressions in some vXi64 vector shift by splat cases that 32-bit x86 doesn't recognise because the shift amount buildvector has been type legalised to v2Xi32.

We can add further support (floats, bitcast from larger element types, undef elements) when we have actual test coverage.

Differential Revision: https://reviews.llvm.org/D120553
2022-03-02 11:25:51 +00:00
Xiang1 Zhang 65588a0776 Revert "TLS loads opimization (hoist)"
Revert for more reviews

This reverts commit 30e612ebdf.
2022-03-02 14:10:11 +08:00
Mircea Trofin cb2160760e [nfc][codegen] Move RegisterBank[Info].h under CodeGen
This wraps up from D119053. The 2 headers are moved as described,
fixed file headers and include guards, updated all files where the old
paths were detected (simple grep through the repo), and `clang-format`-ed it all.

Differential Revision: https://reviews.llvm.org/D119876
2022-03-01 21:53:25 -08:00
Xiang1 Zhang 30e612ebdf TLS loads opimization (hoist)
Reviewed By: Wang Pheobe, Topper Craig

Differential Revision: https://reviews.llvm.org/D120000
2022-03-02 10:37:24 +08:00
Craig Topper 8787726609 [LegalizeTypes] Remove incomplete StrictFP support from SplitVecRes_UnaryOp. NFC
There is no handling of Chain operands in this function so it can't
work. There's a separate splitting function for all strict fp nodes.
2022-03-01 15:43:57 -08:00
Zequan Wu 5c9e20d7d0 [PDB] Add char8_t type
Differential Revision: https://reviews.llvm.org/D120690
2022-03-01 13:39:51 -08:00
serge-sans-paille a494ae43be Cleanup includes: TransformsUtils
Estimation on the impact on preprocessor output:
before: 1065307662
after:  1064800684

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120741
2022-03-01 21:00:07 +01:00
Craig Topper bf8054644d [DAGCombiner] Don't expand (neg (abs x)) if the abs has an additional user.
If the types aren't legal, the expansions may get type legalized in a
different way preventing code sharing. If the type is legal, we will
share some instructions between the two expansions, but we will need an
extra register.

Since we don't appear to fold (neg (sub A, B)) if the sub has an
additional user, I think it makes sense not to expand NABS.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D120513
2022-03-01 07:32:07 -08:00
Jeremy Morse ab49dce01f [DebugInfo][InstrRef][NFC] Use unique_ptr instead of raw pointers
InstrRefBasedLDV allocates some big tables of ValueIDNum, to store live-in
and live-out block values in, that then get passed around as pointers
everywhere. This patch wraps the allocation in a std::unique_ptr, names
some types based on unique_ptr, and passes references to those around
instead. There's no functional change, but it makes it clearer to the
reader that references to these tables are borrowed rather than owned, and
we get some extra validity assertions too.

Differential Revision: https://reviews.llvm.org/D118774
2022-03-01 12:49:50 +00:00
Sam Parker 20d75059a2 Revert "[TypePromotion] Avoid some unnecessary truncs"
This reverts commit 281d29b8fe.

Report of a miscompilation and awaiting a reproducer.
2022-03-01 08:59:52 +00:00
Phoebe Wang e03d216c28 [X86] Use bit test instructions to optimize some logic atomic operations
This is to match GCC's optimizations: https://gcc.godbolt.org/z/3odh9e7WE

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D120199
2022-03-01 09:57:08 +08:00
Sanjay Patel 69684b84c6 [SDAG] fold (rotate X) eq/ne (0/-1)
This is the SDAG equivalent of an instcombine transform added with:
fd807601a7

This is another step towards solving #49541 and part of an alternative
set of more general transforms than what is proposed in D111530.

https://alive2.llvm.org/ce/z/ToxaE8
2022-02-27 11:31:19 -05:00
Sanjay Patel acb96ffd14 [SDAG] fold bitwise logic with shifted operands
LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) --> LOGIC (SH (LOGIC X0, X1), Y), Z

https://alive2.llvm.org/ce/z/QmR9rR

This is a reassociation + factoring fold. The common shift operation is moved
after a bitwise logic op on 2 input operands.
We get simpler cases of these patterns in IR, but I suspect we would miss all
of these exact tests in IR too. We also handle the simpler form of this plus
several other folds in DAGCombiner::hoistLogicOpWithSameOpcodeHands().

This is a partial implementation of a transform suggested in D111530
(only handles 'or' bitwise logic as a first step - need to stamp out more
tests for other opcodes).
Several of the same tests added for D111530 are altered here (but not
fully optimized). I'm not sure yet if this would help/hinder that patch,
but this should be an improvement for all tests added with ecf606cb43
since it removes a shift operation in those examples.

Differential Revision: https://reviews.llvm.org/D120516
2022-02-27 09:54:12 -05:00
Simon Pilgrim fadd20f80d [DAG] Ensure type is legal for bswap(shl(x,c)) -> zext(bswap(trunc(shl(x,c-bw/2)))) fold
As reported on D120192
2022-02-27 11:25:22 +00:00
Benjamin Kramer 1de11fe360 Use RegisterInfo::regsOverlaps instead of checking aliases
This is both less code and faster since it doesn't have to expand all
the sub & superreg sets. NFCI.
2022-02-26 20:32:12 +01:00
Jameson Nash c4b1a63a1b mark getTargetTransformInfo and getTargetIRAnalysis as const
Seems like this can be const, since Passes shouldn't modify it.

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D120518
2022-02-25 14:30:44 -05:00
Rong Xu ccbbb4f6c7 [Sample-PGO] Emit FS discriminators only when -fdebug-info-for-profiling is set
IR level addDiscriminator pass is guarded by DebugInfoForProfiling
(set by option -fdebug-info-for-profiling).
This patch syncs the logic for the MIR and IR level implementations.

Differential Revision: https://reviews.llvm.org/D120536
2022-02-25 09:41:17 -08:00
Nikita Popov 87ebd9a36f [IR] Use CallBase::getParamElementType() (NFC)
As this method now exists on CallBase, use it rather than the
one on AttributeList.
2022-02-25 10:01:58 +01:00
Rahman Lavaee aeec9671fb Revert "Encode address offsets of basic blocks relative to the end of the previous basic blocks."
This reverts commit 029283c1c0.
The code in `ELFFile::decodeBBAddrMap` was not changed in the submitted patch.

Differential Revision: https://reviews.llvm.org/D120457
2022-02-24 13:31:15 -08:00
Simon Pilgrim 370ebc9d9a [DAG] Attempt to fold bswap(shl(x,c)) -> zext(bswap(trunc(shl(x,c-bw/2))))
If the shl is at least half the bitwidth (i.e. the lower half of the bswap source is zero), then we can reduce the shift and perform the bswap at half the bitwidth and just zero extend.

Based off PR51391 + PR53867

Differential Revision: https://reviews.llvm.org/D120192
2022-02-24 19:33:51 +00:00
Sanjay Patel 4a3708cd6b [SDAG] remove shift that is redundant with part of funnel shift
This is the SDAG translation of D120253 :
https://alive2.llvm.org/ce/z/qHpmNn

The SDAG nodes can have different operand types than the result value.
We can see an example of that with AArch64 - the funnel shift amount
is an i64 rather than i32.

We may need to make that match even more flexible to handle
post-legalization nodes, but I have not stepped into that yet.

Differential Revision: https://reviews.llvm.org/D120264
2022-02-24 11:25:46 -05:00
Jay Foad 719bac55df [MIRParser] Diagnose too large align values in MachineMemOperands
When parsing MachineMemOperands, MIRParser treated the "align" keyword
the same as "basealign". Really "basealign" should specify the
alignment of the MachinePointerInfo base value, and "align" should
specify the alignment of that base value plus the offset.

This worked OK when the specified alignment was no larger than the
alignment of the offset, but in cases like this it just caused
confusion:

    STW killed %18, 4, %stack.1.ap2.i.i :: (store (s32) into %stack.1.ap2.i.i + 4, align 8)

MIRPrinter would never have printed this, with an offset of 4 but an
align of 8, so it must have been written by hand. MIRParser would
interpret "align 8" as "basealign 8", but I think it is better to give
an error and force the user to write "basealign 8" if that is what they
really meant.

Differential Revision: https://reviews.llvm.org/D120400

Change-Id: I7eeeefc55c2df3554ba8d89f8809a2f45ada32d8
2022-02-24 15:32:08 +00:00
Matthias Braun 6a383369f9 PGOInstrumentation, GCOVProfiling: Split indirectbr critical edges regardless of PHIs
The `SplitIndirectBrCriticalEdges` function was originally designed for
`CodeGenPrepare` and skipped splitting of edges when the destination
block didn't contain any `PHI` instructions. This only makes sense when
reducing COPYs like `CodeGenPrepare`. In the case of
`PGOInstrumentation` or `GCOVProfiling` it would result in missed
counters and wrong result in functions with computed goto.

Differential Revision: https://reviews.llvm.org/D120096
2022-02-23 16:27:37 -08:00
Craig Topper c7d6448d03 [DAGCombiner][TargetLowering] Pass SDValue by value to isMulAddWithConstProfitable.
Internally to DAGCombiner the SDValues were passed by non-const
reference despite not being modified. They were then passed by
const reference to TLI.

This patch passes them by value which is consistent with the vast
majority of code.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D120420
2022-02-23 12:40:45 -08:00
Pawe Bylica afdaa86b77
[DAGCombine] Extend combineCarryDiamond()
In combineCarryDiamond() use getAsCarry() to find more candidates for being a carry flag.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D118362
2022-02-23 21:37:49 +01:00
Jessica Paquette 68c718c8f4 Revert "[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges""
This reverts commit d97f997eb7.

This commit was not NFC.

(See: https://reviews.llvm.org/rGd97f997eb79d91b2872ac13619f49cb3a7120781)
2022-02-23 10:35:52 -08:00
Sanjay Patel 21d7c3bcc6 [DAG] try to convert multiply to shift via demanded bits
This is a fix for a regression discussed in:
https://github.com/llvm/llvm-project/issues/53829

We cleared more high multiplier bits with 995d400,
but that can lead to worse codegen because we would fail
to recognize the now disguised multiplication by neg-power-of-2
as a shift-left. The problem exists independently of the IR
change in the case that the multiply already had cleared high
bits. We also convert shl+sub into mul+add in instcombine's
negator.

This patch fills in the high-bits to see the shift transform
opportunity. Alive2 attempt to show correctness:
https://alive2.llvm.org/ce/z/GgSKVX

The AArch64, RISCV, and MIPS diffs look like clear wins. The
x86 code requires an extra move register in the minimal examples,
but it's still an improvement to get rid of the multiply on all
CPUs that I am aware of (because multiply is never as fast as a
shift).

There's a potential follow-up noted by the TODO comment. We
should already convert that pattern into shl+add in IR, so
it's probably not common:
https://alive2.llvm.org/ce/z/7QY_Ga

Fixes #53829

Differential Revision: https://reviews.llvm.org/D120216
2022-02-23 12:09:32 -05:00
Rainer Orth 365be7ac72 [MC][ELF] Use SHF_SUNW_NODISCARD instead of SHF_GNU_RETAIN on Solaris
As requested in D107955 <https://reviews.llvm.org/D107955>, this patch
splits off the `MC` and `CodeGen` parts and adds a testcase.

Tested on `sparcv9-sun-solaris2.11`, `amd64-pc-solaris2.11`, and
`x86_64-pc-linux-gnu`.

Differential Revision: https://reviews.llvm.org/D120318
2022-02-23 15:43:12 +01:00
Bill Wendling a5bbc6ef99 [NFC] Remove unnecessary "#include"s from header files 2022-02-23 01:20:48 -08:00
Rahman Lavaee 029283c1c0 Encode address offsets of basic blocks relative to the end of the previous basic blocks.
Conceptually, the new encoding emits the offsets and sizes as label differences between each two consecutive basic block begin and end label. When decoding, the offsets must be aggregated along with basic block sizes to calculate the final relative-to-function offsets of basic blocks.

This encoding uses smaller values compared to the existing one (offsets relative to function symbol).
Smaller values tend to occupy fewer bytes in ULEB128 encoding. As a result, we get about 25% reduction
in the size of the bb-address-map section (reduction from about 9MB to 7MB).

Reviewed By: tmsriram, jhenderson

Differential Revision: https://reviews.llvm.org/D106421
2022-02-22 15:46:46 -08:00
Jay Foad b47e2dc91f [StableHashing] Hash machine basic blocks and functions
This adds very basic support for hashing MachineBasicBlock
and MachineFunction, for use in MachineFunctionPass to
detect passes that modify the MachineFunction wrongly.

Differential Revision: https://reviews.llvm.org/D120122
2022-02-22 17:38:47 +00:00
Joseph Huber 456ffd7a22 [OpenMP] Ensure offloading sections do not have SHF_ALLOC flag
We use offloading sections in the new Clang driver scheme to embed
device code into the host. We later use these sections to link the
device image, after which point they are completely unused and should
not be loaded into memory if they are still in the executable.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D120275
2022-02-21 21:35:17 -05:00
Jessica Paquette d97f997eb7 [MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges"
We found a case in the Swift benchmarks where the MachineOutliner introduces
about a 20% compile time overhead in comparison to building without the
MachineOutliner.

The origin of this slowdown is that the benchmark has long blocks which incur
lots of LRU checks for lots of candidates.

Imagine a case like this:

```
bb:
  i1
  i2
  i3
  ...
  i123456
```

Now imagine that all of the outlining candidates appear early in the block, and
that something like, say, NZCV is defined at the end of the block.

The outliner has to check liveness for certain registers across all candidates,
because outlining from areas where those registers are used is unsafe at call
boundaries.

This is fairly wasteful because in the previously-described case, the outlining
candidates will never appear in an area where those registers are live.

To avoid this, precalculate areas where we will consider outlining from.
Anything outside of these areas is mapped to illegal and not included in the
outlining search space. This allows us to reduce the size of the outliner's
suffix tree as well, giving us a potential memory win.

By precalculating areas, we can also optimize other checks too, like whether
or not LR is live across an outlining candidate.

Doing all of this is about a 16% compile time improvement on the case.

This is likely useful for other targets (e.g. ARM + RISCV) as well, but for now,
this only implements the AArch64 path. The original "is the MBB safe" method
still works as before.
2022-02-21 15:29:16 -08:00
Paweł Bylica df0c16ce00
[NFC][DAGCombine] Use isOperandOf() in combineCarryDiamond
Pre-commit for https://reviews.llvm.org/D118362.
2022-02-21 21:41:31 +01:00
Matt Arsenault 9c7ca51b2c MIR: Start diagnosing too many operands on an instruction
Previously this would just assert which was annoying and didn't point
to the specific instruction/operand.
2022-02-21 10:36:39 -05:00
Simon Pilgrim 46f1e8359e [DAG] visitBSWAP - pull out repeated SDLoc. NFC
Cleanup for D120192
2022-02-21 13:08:01 +00:00
Jay Foad 9a547e7009 [StableHashing] Hash vregs with multiple defs
This allows stableHashValue to be used on Machine IR that is
not in SSA form.

Differential Revision: https://reviews.llvm.org/D120121
2022-02-21 10:26:34 +00:00
Craig Topper 440c4b705a [SelectionDAG][RISCV][ARM][PowerPC][X86][WebAssembly] Change default abs expansion to use sra (X, size(X)-1); sub (xor (X, Y), Y).
Previous we used sra (X, size(X)-1); xor (add (X, Y), Y).

By placing sub at the end, we allow RISCV to combine sign_extend_inreg
with it to form subw.

Some X86 tests for Z - abs(X) seem to have improved as well.

Other targets look to be a wash.

I had to modify ARM's abs matching code to match from sub instead of
xor. Maybe instead ISD::ABS should be made legal. I'll try that in
parallel to this patch.

This is an alternative to D119099 which was focused on RISCV only.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D119171
2022-02-20 21:11:23 -08:00
Chen Zheng efe5b8ad90 [ISEL] remove unnecessary getNode(); NFC
Reviewed By: RKSimon, craig.topper

Differential Revision: https://reviews.llvm.org/D120049
2022-02-20 21:08:49 -05:00
Luo, Yuanke 67ef63138b [SDAG] enable binop identity constant folds for sub
This patch extract the sub folding from D119654 and leave only add
folding in that patch.

Differential Revision: https://reviews.llvm.org/D120116
2022-02-21 09:37:36 +08:00
David Blaikie 323c672789 DebugInfo: Add an assert about cross-unit references in dwo units
This is helping me debug some issues with simplified template names
2022-02-20 14:53:17 -08:00
Amara Emerson b09e63bad1 [AArch64][GlobalISel] Implement combines for boolean G_SELECT->bitwise ops.
Differential Revision: https://reviews.llvm.org/D117160
2022-02-20 00:53:09 -08:00
Craig Topper 24bfa24355 [SelectionDAGBuilder] Simplify visitShift. NFC
This code was detecting whether the value returned by getShiftAmountTy
can represent all shift amounts. If not, it would use MVT::i32 as a
placeholder. getShiftAmountTy was updated last year to return i32
if the type returned by the target couldn't represent all values.

This means the MVT::i32 case here is dead and can the logic can
be simplified.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D120164
2022-02-19 12:40:59 -08:00
Craig Topper 1df8efae56 [SelectionDAG][X86] Support f16 in getReciprocalOpName.
If the "reciprocal-estimates" attribute is present and it doesn't
contain "all", "none", or "default", we previously crashed on f16
operations.

This patch addes an 'h' suffix' to prevent the crash.

I've added simple tests that just enable the estimate for all
vec-sqrt and one test case that explicitly tests the new 'h' suffix
to override the default steps.

There may be some frontend change needed to, but I haven't checked
that yet.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D120158
2022-02-18 21:55:49 -08:00
Craig Topper 8e7247a377 [SelectionDAG] Fix off by one error in range check in DAGTypeLegalizer::ExpandShiftByConstant.
The code was considering shifts by an about larger than the number of
bits in the original VT to be out of range. Shifts exactly equal to
the original bit width are also out of range.

I don't know how to test this. DAGCombiner should usually fold this
away. I just noticed while looking for something else in this code. The
llvm-cov report shows that we don't have coverage for out of range shifts here.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D120170
2022-02-18 18:42:20 -08:00
Craig Topper 0d59a54cea Revert "[SelectionDAG][X86] Support f16 in getReciprocalOpName."
This reverts commit 86b5e25662.

This wasn't supposed to be commited yet
2022-02-18 15:39:50 -08:00
Craig Topper 04f815c26f [SelectionDAGBuilder] Remove LegalTypes=false from a call to getShiftAmountConstant.
getShiftAmountTy will return MVT::i32 if the shift amount
coming from the target's getScalarShiftAmountTy can't reprsent
all possible values. That should eliminate the need to use the
pointer type which is what we do when LegalTypes is false.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D120165
2022-02-18 15:36:35 -08:00
Craig Topper 86b5e25662 [SelectionDAG][X86] Support f16 in getReciprocalOpName.
If the "reciprocal-estimates" attribute is present and it doesn't
contain "all", "none", or "default", we previously crashed on f16
operations.

This patch addes an 'h' suffix' to prevent the crash.

I've added simple tests that just enable the estimate for all
vec-sqrt and one test case that explicitly tests the new 'h' suffix
to override the default steps.

There may be some frontend change needed to, but I haven't checked
that yet.

Differential Revision: https://reviews.llvm.org/D120158
2022-02-18 15:36:35 -08:00
Sanjay Patel a2963d871e [SDAG] fold sub-of-shift to add-of-shift
This fold is done in IR:
https://alive2.llvm.org/ce/z/jWyFrP

There is an x86 test that shows an improvement
from the added flexibility of using add (commutative).

The other diffs are presumed neutral.

Note that this could also be folded to an 'xor',
but I'm not sure if that would be universally better
(eg, x86 can convert adds more easily into LEA).

This helps prevent regressions from a potential fold for
issue #53829.
2022-02-18 11:55:50 -05:00
Jay Foad 074d1e2536 [CodeGen] Return better Changed status from PostRAHazardRecognizer
Differential Revision: https://reviews.llvm.org/D119954
2022-02-18 09:46:24 +00:00
Jessica Paquette 12389e3758 [MachineOutliner] Add statistics for unsigned vector size
Useful for debugging + evaluating improvements to the outliner.

Stats are the number of illegal, legal, and invisible instructions in the
unsigned vector, and it's total length.
2022-02-17 18:25:51 -08:00
Heejin Ahn 4f9b839772 [WebAssembly] Make EH/SjLj vars unconditionally thread local
This makes three thread local variables (`__THREW__`, `__threwValue`,
and `__wasm_lpad_context`) unconditionally thread local. If the target
doesn't support TLS, they will be downgraded to normal variables in
`stripThreadLocals`. This makes the object not linkable with other
objects using shared memory, which is what we intend here; these
variables should be thread local when used with shared memory. This is
what we initially tried in D88262.

But D88323 changed this: It only created these variables when threads
were supported, because `__THREW__` and `__threwValue` were always
generated even if Emscripten EH/SjLj was not used, making all objects
built without threads not linkable with shared memory, which was too
restrictive. But sometimes this is not safe. If we build an object using
variables such as `__THREW__` without threads, it can be linked to other
objects using shared memory, because the original object's `__THREW__`
was not created thread local to begin with.

So this CL basically reverts D88323 with some additional improvements:
- This checks each of the functions and global variables created within
  `LowerEmscriptenEHSjLj` pass and removes it if it's not used at the
  end of the pass. So only modules using those variables will be
  affected.
- Moves `CoalesceFeaturesAndStripAtomics` and `AtomicExpand` passes
  after all other IR pasess that can create thread local variables. It
  is not sufficient to move them to the end of `addIRPasses`, because
  `__wasm_lpad_context` is created in `WasmEHPrepare`, which runs inside
  `addPassesToHandleExceptions`, which runs before `addISelPrepare`. So
  we override `addISelPrepare` and move atomic/TLS stripping and
  expanding passes there.

This also removes merges `TLS` and `NO-TLS` FileCheck lines into one
`CHECK` line, because in the bitcode level we always create them as
thread local. Also some function declarations are deleted `CHECK` lines
because they are unused.

Reviewed By: tlively, sbc100

Differential Revision: https://reviews.llvm.org/D120013
2022-02-17 16:04:18 -08:00
Matt Arsenault c46aab01c0 RegAllocGreedy: Fix last chance recolor assert in impossible case
This example is not compilable without handling eviction of specific
subregisters. Last chance recoloring was deciding it could try
evicting an overlapping superregister, which doesn't help make any
progress. The LiveIntervalUnion would then assert due to an
overlapping / identical range when trying the new assignment.

Unfortunately this is also producing a verifier error after the
allocation fails. I've seen a number of these, and not sure if we
should just start deleting the function on error rather than trying to
figure out how to put together valid MIR.

I'm not super confident this is the right place to fix this. I also
have a number of failing testcases I need to fix by handling partial
evictions of superregisters.
2022-02-17 18:30:56 -05:00
Paul Walker 6457f42bde [DAGCombiner] Extend ISD::ABDS/U combine to handle more cases.
The current ABD combine doesn't quite work for SVE because only a
single scalable vector per scalar integer type is legal (e.g. for
i32, <vscale x 4 x i32> is the only legal scalable vector type).

This patch extends the combine to also trigger for the cases when
operand extension must be retained.

Differential Revision: https://reviews.llvm.org/D115739
2022-02-17 13:32:20 +00:00
Bjorn Pettersson 1a8bdf95a3 [DAG] Fix in ReplaceAllUsesOfValuesWith
When doing SelectionDAG::ReplaceAllUsesOfValuesWith a worklist is
prepared containing all users that should be updated. Then we use
the RemoveNodeFromCSEMaps/AddModifiedNodeToCSEMaps helpers to handle
recursive CSE updates while doing the replacements.

This patch aims at solving a problem that could arise if the recursive
CSE updates would result in an SDNode present in the worklist is being
removed as a side-effect of morphing a prio user in the worklist.

To examplify such a scenario, imagine that we have these nodes in
the DAG
   t12: i64 = add t8, t11
   t13: i64 = add t12, t8
   t14: i64 = add t11, t11
   t15: i64 = add t14, t8
   t16: i64 = sub t13, t15
and that the t8 uses should be replaced by t11. An initial worklist
(listing the users that should be morphed) could be [t12, t13, t15].
When updating t12 we get
   t12: i64 = add t11, t11
which results in a CSE update that replaces t14 by t12, so we get
   t15: i64 = add t12, t8
which results in a CSE update that replaces t13 by t12, so we get
   t16: i64 = sub t12, t15
and then t13 is removed given that it was the last use of t13.

So when being done with the updates triggered by rewriting the use
of t8 in t12 the t13 node no longer exist. And we used to end up
hitting an assertion when continuing with the worklist aiming at
replacing the t8 uses in t13.

The solution is based on using a DAGUpdateListener, making sure that
we prune a user from the worklist if it is removed during the
recursive CSE updates.

The bug was found using an OOT target. I think the problem is quite
old, even if the particular intree target reproducer added in this
patch seem to pass when using LLVM 13.0.0.

Differential Revision: https://reviews.llvm.org/D119088
2022-02-17 14:29:59 +01:00
Jay Foad 50ddb5d2d1 [CodeGen] Return better Changed status from LocalStackSlotAllocation
Differential Revision: https://reviews.llvm.org/D119942
2022-02-17 09:31:41 +00:00
Jay Foad f0092f9ded [CodeGen] Return false from LiveIntervals::runOnMachineFunction
This is an analysis pass so it does not modify the MachineFunction.

Differential Revision: https://reviews.llvm.org/D119941
2022-02-17 09:31:41 +00:00
Jay Foad 3c9229c663 [CodeGen] Return better Changed status from DetectDeadLanes
Differential Revision: https://reviews.llvm.org/D119940
2022-02-17 09:31:41 +00:00
Heejin Ahn c60d822965 [WebAssembly] Make __wasm_lpad_context thread-local
This makes `__wasm_lpad_context`, a struct that is used as a
communication channel between compiler-generated code and personality
function in libunwind, thread local. The library code will be changed to
thread local in the emscripten side.

Reviewed By: sbc100, tlively

Differential Revision: https://reviews.llvm.org/D119803
2022-02-16 15:56:38 -08:00
Craig Topper 1daa66d3fd [SelectionDAG] Add SPLAT_VECTOR to SelectionDAG::isConstantFPBuildVectorOrConstantFP.
Matches what is done for the int version.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D119793
2022-02-16 09:22:11 -08:00
Simon Pilgrim 30e9cdd1aa [DAG] computeKnownBits - add ISD::AVGCEILU handling
Expand the ISD::AVGCEILU to determine the known bits of the result.

First part of PR53622

Differential Revision: https://reviews.llvm.org/D119629
2022-02-16 13:00:15 +00:00
Shengchen Kan ce02c79dc6 [Debugify] Mark mir-check-debugify change nothing of input
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D119914
2022-02-16 18:37:26 +08:00
Shao-Ce SUN 2aed07e96c [NFC][MC] remove unused argument `MCRegisterInfo` in `MCCodeEmitter`
Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D119846
2022-02-16 13:10:09 +08:00
Shao-Ce SUN 9cc49c1951 Revert "[NFC][MC] remove unused argument `MCRegisterInfo` in `MCCodeEmitter`"
This reverts commit fe25c06cc5.
2022-02-16 11:57:49 +08:00
Shao-Ce SUN fe25c06cc5 [NFC][MC] remove unused argument `MCRegisterInfo` in `MCCodeEmitter`
For ten years, it seems that `MCRegisterInfo` is not used by any target.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D119846
2022-02-16 11:47:17 +08:00
Carl Ritson ef949ecba5 [MachineSink] Use SkipPHIsAndLabels for sink insertion points
For AMDGPU the insertion point for a block may not be the first
non-PHI instruction.  This happens when a block contains EXEC
mask manipulation related to control flow (converging lanes).

Use SkipPHIsAndLabels to determine the block insertion point
so that the target can skip any block prologue instructions.

Reviewed By: rampitec, ruiling

Differential Revision: https://reviews.llvm.org/D119399
2022-02-16 12:44:22 +09:00
Mircea Trofin c62eefb886 [nfc][codegen] Move RegisterBank[Info].cpp under CodeGen
Layering-wise, it seems RegisterBank stuff fits under CodeGen, like
other target abstraction.
In particular, TargetSubtargetInfo has a getRegBankInfo member, but
using that object requires making sure GlobalISel is linked, which is
not always the case (e.g. llvm-jitlink doesn't).

Differential Revision: https://reviews.llvm.org/D119053
2022-02-15 11:27:15 -08:00
David Green 655d0d86f9 [DAGCombine] Move AVG combine to SimplifyDemandBits
This moves the matching of AVGFloor and AVGCeil into a place where
demand bit are available, so that it can detect more cases for more
folds. It changes the transform to start from a shift, not from a
truncate. We match the pattern shr(add(ext(A), ext(B)), 1), transforming
to ext(hadd(A, B)).

For signed values, because only the bottom bits are demanded llvm will
transform the above to use a lshr too, as opposed to ashr. In order to
correctly detect the hadd we need to know the demanded bits to turn it
back. Depending on whether the shift is signed (ashr) or logical (lshr),
and the extensions are signed or unsigned we can create different nodes.
If the shift is signed:
  Needs >= 2 sign bits. https://alive2.llvm.org/ce/z/h4gQAW generating signed rhadd.
  Needs >= 2 zero bits. https://alive2.llvm.org/ce/z/B64DUA generating unsigned rhadd.
If the shift is unsigned:
  Needs >= 1 zero bits. https://alive2.llvm.org/ce/z/ByD8sj generating unsigned rhadd.
  Needs 1 demanded bit zero and >= 2 sign bits https://alive2.llvm.org/ce/z/hvPGxX and
    https://alive2.llvm.org/ce/z/32P5n1 generating signed rhadd.

Differential Revision: https://reviews.llvm.org/D119072
2022-02-15 10:17:02 +00:00
Momchil Velikov 6398903ac8 Extend the `uwtable` attribute with unwind table kind
We have the `clang -cc1` command-line option `-funwind-tables=1|2` and
the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind
tables (1) or asynchronous unwind tables (2)`. However, this is
encoded in LLVM IR by the presence or the absence of the `uwtable`
attribute, i.e.  we lose the information whether to generate want just
some unwind tables or asynchronous unwind tables.

Asynchronous unwind tables take more space in the runtime image, I'd
estimate something like 80-90% more, as the difference is adding
roughly the same number of CFI directives as for prologues, only a bit
simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even
more, if you consider tail duplication of epilogue blocks.
Asynchronous unwind tables could also restrict code generation to
having only a finite number of frame pointer adjustments (an example
of *not* having a finite number of `SP` adjustments is on AArch64 when
untagging the stack (MTE) in some cases the compiler can modify `SP`
in a loop).
Having the CFI precise up to an instruction generally also means one
cannot bundle together CFI instructions once the prologue is done,
they need to be interspersed with ordinary instructions, which means
extra `DW_CFA_advance_loc` commands, further increasing the unwind
tables size.

That is to say, async unwind tables impose a non-negligible overhead,
yet for the most common use cases (like C++ exceptions), they are not
even needed.

This patch extends the `uwtable` attribute with an optional
value:
      -  `uwtable` (default to `async`)
      -  `uwtable(sync)`, synchronous unwind tables
      -  `uwtable(async)`, asynchronous (instruction precise) unwind tables

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D114543
2022-02-14 14:35:02 +00:00
David Green 03380c70ed [DAGCombine] Basic combines for AVG nodes.
This adds very basic combines for AVG nodes, mostly for constant folding
and handling degenerate (zero) cases. The code performs mostly the same
transforms as visitMULHS, adjusted for AVG nodes.

Constant folding extends to a higher bitwidth and drops the lowest bit.
For undef nodes, `avg undef, x` is transformed to x.  There is also a
transform for `avgfloor x, 0` transforming to `shr x, 1`.

Differential Revision: https://reviews.llvm.org/D119559
2022-02-14 11:18:35 +00:00
Tim Northover a87d3ba61c Reapply: StackProtector: ignore debug insts when splitting blocks.
When deciding where to split a block to insert stack guard checks, we should
move past any debug instructions we see that might (e.g.) be separating a tail
call from its frame wrangling.

This time, also don't run off the front of a basic block.
2022-02-14 10:58:22 +00:00
Nikita Popov ff040eca93 [FastISel] Reuse register for bitcast that does not change MVT
The current FastISel code reuses the register for a bitcast that
doesn't change the IR type, but uses a reg-to-reg copy if it
changes the IR type without changing the MVT. However, we can
simply reuse the register in that case as well.

In particular, this avoids unnecessary reg-to-reg copies for pointer
bitcasts. This was found while inspecting O0 codegen differences
between typed and opaque pointers.

Differential Revision: https://reviews.llvm.org/D119432
2022-02-14 09:13:17 +01:00
Craig Topper e72fe654b7 [DAGCombiner] Use getShiftAmountConstant in DAGCombiner::foldSelectOfConstants.
This enables fshl to be matched earlier on X86

  %6 = lshr i32 %3, 1
  %7 = select i1 %4, i32 -2147483648, i32 0
  %8 = or i32 %6, %7

X86 uses i8 for shift amounts. SelectionDAGBuilder creates the
ISD::SRL with an i8 shift type. DAGCombiner turns the select into
an ISD::SHL. Prior to this patch it would use i32 for the shift
amount. fshl matching failed because the shift amounts have different
types. LegalizeDAG fixes the ISD::SHL shift amount to i8. This
allowed fshl matching to succeed.

With this patch, the ISD::SHL will be created with an i8 shift
amount. This allows the fshl to match immediately.

No test case beause we still end up with a fshl either way.
2022-02-13 19:09:26 -08:00
Benjamin Kramer bee4531bee [MachineSink] Inline getRegUnits
Reg unit sets are uniqued, so no need to wrap it in a set.
2022-02-12 17:46:12 +01:00
Sanjay Patel 96b7e0b5a0 [SDAG] clean up scalarizing load transform
I have not found a way to expose a difference for this patch in a test
because it only triggers for a one-use load, but this is the code that
was adapted into D118376 and caused miscompiles. The new code pattern
is the same as what we do in narrowExtractedVectorLoad() (reduces load
width for a subvector extract).

This removes seemingly unnecessary manual worklist management and fixes
the chain updating via "SelectionDAG::makeEquivalentMemoryOrdering()".

Differential Revision: https://reviews.llvm.org/D119549
2022-02-12 11:41:19 -05:00
Sanjay Patel 429f10f5f2 [SDAG] reduce code duplication and fix formatting; NFC 2022-02-12 10:22:13 -05:00
Arthur Eubanks c0281c7607 [OpaquePtr][SPARC] Remove getPointerElementType() call in SparcISelLowering
Requires keeping better track of sret types.
2022-02-11 11:31:19 -08:00
David Green 4072e362c0 [ISel] Port AArch64 HADD and RHADD to ISel
This ports the aarch64 combines for HADD and RHADD over to DAG combine,
so that they can be used in more architectures (notably MVE in a
followup patch). They are renamed to AVGFLOOR and AVGCEIL in the
process, to avoid confusion with instructions such as X86 hadd. The code
was also rewritten slightly to remove the AArch64 idiosyncrasies.

The general pattern for a AVGFLOORS is
  %xe = sext i8 %x to i32
  %ye = sext i8 %y to i32
  %a = add i32 %xe, %ye
  %r = lshr i32 %a, 1
  %t = trunc i32 %r to i8

An AVGFLOORU is equivalent with zext. Because of the truncate
lshr==ashr, as the top bits are not demanded. An AVGCEIL also includes
an extra rounding, so includes an extra add of 1.

Differential Revision: https://reviews.llvm.org/D106237
2022-02-11 18:28:56 +00:00
Tim Northover 2ba06bed6b Revert "StackProtector: ignore debug insts when splitting blocks."
This reverts commit 7605ca85f1.

It caused an assertion failure in Fuschia.
2022-02-11 18:06:28 +00:00
Julien Pages dcb2da13f1 [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode
Add a new llvm.fptrunc.round intrinsic to precisely control
the rounding mode when converting from f32 to f16.

Differential Revision: https://reviews.llvm.org/D110579
2022-02-11 12:08:23 -05:00
Tim Northover 7605ca85f1 StackProtector: ignore debug insts when splitting blocks.
When deciding where to split a block to insert stack guard checks, we should
move past any debug instructions we see that might (e.g.) be separating a tail
call from its frame wrangling.
2022-02-11 10:13:50 +00:00
serge-sans-paille 06943537d9 Cleanup MCParser headers
As usual with that header cleanup series, some implicit dependencies now need to
be explicit:

llvm/MC/MCParser/MCAsmParser.h no longer includes llvm/MC/MCParser/MCAsmLexer.h

Preprocessed lines to build llvm on my setup:
after:  1068185081
before: 1068324320

So no compile time benefit to expect, but we still get the looser coupling
between files which is great.

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119359
2022-02-11 10:39:29 +01:00
Yuanfang Chen f927021410 Reland "[clang-cl] Support the /JMC flag"
This relands commit b380a31de0.

Restrict the tests to Windows only since the flag symbol hash depends on
system-dependent path normalization.
2022-02-10 15:16:17 -08:00
Yuanfang Chen b380a31de0 Revert "[clang-cl] Support the /JMC flag"
This reverts commit bd3a1de683.

Break bots:
https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-windows-x64/b8822587673277278177/overview
2022-02-10 14:17:37 -08:00
Reid Kleckner 64037afe01 [CodeView] Avoid integer overflow while parsing long version strings
This came up on a funny vendor-provided version string that didn't have
a standard dotted quad of numbers.
2022-02-10 13:52:11 -08:00
Yuanfang Chen bd3a1de683 [clang-cl] Support the /JMC flag
The introduction and some examples are on this page:
https://devblogs.microsoft.com/cppblog/announcing-jmc-stepping-in-visual-studio/

The `/JMC` flag enables these instrumentations:
- Insert at the beginning of every function immediately after the prologue with
  a call to `void __fastcall __CheckForDebuggerJustMyCode(unsigned char *JMC_flag)`.
  The argument for `__CheckForDebuggerJustMyCode` is the address of a boolean
  global variable (the global variable is initialized to 1) with the name
  convention `__<hash>_<filename>`. All such global variables are placed in
  the `.msvcjmc` section.
- The `<hash>` part of `__<hash>_<filename>` has a one-to-one mapping
  with a directory path. MSVC uses some unknown hashing function. Here I
  used DJB.
- Add a dummy/empty COMDAT function `__JustMyCode_Default`.
- Add `/alternatename:__CheckForDebuggerJustMyCode=__JustMyCode_Default` link
  option via ".drectve" section. This is to prevent failure in
  case `__CheckForDebuggerJustMyCode` is not provided during linking.

Implementation:
All the instrumentations are implemented in an IR codegen pass. The pass is placed immediately before CodeGenPrepare pass. This is to not interfere with mid-end optimizations and make the instrumentation target-independent (I'm still working on an ELF port in a separate patch).

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D118428
2022-02-10 10:26:30 -08:00
Nikita Popov 6241f7dee0 [FastISel] Remove redundant reg class check (NFC)
SrcVT and DstVT are the same in this branch, as such their register
classes will also be the same.
2022-02-10 14:10:00 +01:00
Jeremy Morse be5734ddaa [DebugInfo][InstrRef] Don't fire assertions if debug-info is faulty
It's inevitable that optimisation passes will fail to update debug-info:
when that happens, it's best if the compiler doesn't crash as a result.
Therefore, downgrade a few assertions / failure modes that would crash
when illegal debug-info was seen, to instead drop variable locations. In
practice this means that an instruction reference to a nonexistant or
illegal operand should be tolerated.

Differential Revision: https://reviews.llvm.org/D118998
2022-02-10 11:25:08 +00:00
Jay Foad abda8d2229 [GlobalISel] CSE FP constants at -O0
At -O0 we claim to CSE constants only. I think this should apply to
G_FCONSTANT as well as G_CONSTANT.

Differential Revision: https://reviews.llvm.org/D119344
2022-02-10 09:17:11 +00:00