Commit Graph

32610 Commits

Author SHA1 Message Date
Simon Pilgrim e7a0fa4df0 [DAG] foldAddSubOfSignBit - don't bother creating the new shift node unless constant folding succeeds
Noticed by inspection - the new shift is only ever used if the constant fold occurs
2022-07-05 16:44:31 +01:00
Thomas Symalla 04c5fed5e0 [NFC] Fix wrong comment. 2022-07-05 13:37:44 +02:00
Simon Pilgrim cce64e7a9c [DAG] visitTRUNCATE - move GetDemandedBits AFTER SimplifyDemandedBits.
Another cleanup step before removing GetDemandedBits entirely.
2022-07-04 11:25:40 +01:00
Nikita Popov 7283f48a05 [IR] Remove support for insertvalue constant expression
This removes the insertvalue constant expression, as part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
This is very similar to the extractvalue removal from D125795.
insertvalue is also not supported in bitcode, so no auto-ugprade
is necessary.

ConstantExpr::getInsertValue() can be replaced with
IRBuilder::CreateInsertValue() or ConstantFoldInsertValueInstruction(),
depending on whether a constant result is required (with the latter
being fallible).

The ConstantExpr::hasIndices() and ConstantExpr::getIndices()
methods also go away here, because there are no longer any constant
expressions with indices.

Differential Revision: https://reviews.llvm.org/D128719
2022-07-04 09:27:22 +02:00
esmeyi d2a35e4d39 [AIX] Handling the label alignment of a global
variable with its multiple aliases.

This patch handles the case where a variable has
multiple aliases.
AIX's assembly directive .set is not usable for the
aliasing purpose, and using different labels allows
AIX to emulate symbol aliases. If a value is emitted
between any two labels, meaning they are not aligned,
XCOFF will automatically calculate the offset for them.

This patch implements:
1) Emits the label of the alias just before emitting
the value of the sub-element that the alias referred to.
2) A set of aliases that refers to the same offset
should be aligned.
3) We didn't emit aliasing labels for common and
zero-initialized local symbols in
PPCAIXAsmPrinter::emitGlobalVariableHelper, but
emitted linkage for them in
AsmPrinter::emitGlobalAlias, which caused a FAILURE.
This patch fixes the bug by blocking emitting linkage
for the alias without a label.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D124654
2022-07-03 23:16:16 -04:00
Quentin Colombet f4145ddf5b [GISel] Don't fold convergent instruction across CFG
Before merging two instructions together, GISel does some sanity checks
that the folding is legal. However that check was missing that the
source of the pattern may be convergent. When the destination location
is in a different basic block, the folding is invalid.

Differential Revision: https://reviews.llvm.org/D128539
2022-07-01 10:24:24 -07:00
Sander de Smalen 690db16422 [AArch64] Make nxv1i1 types a legal type for SVE.
One motivation to add support for these types are the LD1Q/ST1Q
instructions in SME, for which we have defined a number of load/store
intrinsics which at the moment still take a `<vscale x 16 x i1>` predicate
regardless of their element type.

This patch adds basic support for the nxv1i1 type such that it can be passed/returned
from functions, as well as some basic support to support some existing tests that
result in a nxv1i1 type. It also adds support for splats.

Other operations (e.g. insert/extract subvector, logical ops, etc) will be
supported in follow-up patches.

Reviewed By: paulwalker-arm, efriedma

Differential Revision: https://reviews.llvm.org/D128665
2022-07-01 15:11:13 +00:00
Xiang1 Zhang 72a23cef7e [ISel] Match all bits when merge undefs for DAG combine
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D128570
2022-07-01 09:09:43 +08:00
Xiang1 Zhang 64f44a90ef Revert "[ISel] Match all bits when merge undef(s) for DAG combine"
This reverts commit 5fe5aa284e.
2022-07-01 08:59:04 +08:00
Xiang1 Zhang 5fe5aa284e [ISel] Match all bits when merge undef(s) for DAG combine 2022-07-01 08:58:00 +08:00
Nuno Lopes 373571dbb4 [NFC] Switch a few uses of undef to poison as placeholders for unreachble code 2022-06-30 23:01:43 +01:00
jeff 09424f802c [AMDGPU] Check for CopyToReg PhysReg clobbers in pre-RA-sched
Differential Revision: https://reviews.llvm.org/D128681
2022-06-30 09:18:04 -07:00
Luo, Yuanke fa8656d28d [greedyalloc] Return early when there is no register to allocate.
In X86 we split greddy register allocation into 2 passes. The 1st pass
is to allocate tile register, and the 2nd pass is to allocate the rest
of virtual register. In most cases there is no tile register, so the 1st
pass is unnecessary. To improve the compiling time, we check if there is
any register need to be allocated by invoking callback
`ShouldAllocateClass`. If there is no register to be allocated, just
return false in the pass. This would improve the 1st greed RA pass for
normal cases.

Differential Revision: https://reviews.llvm.org/D128804
2022-06-30 11:12:05 +08:00
Stefan Pintilie e50a8c8435 [GlobalMerge] Ensure that the MustKeepGlobalVariables has all globals from each landingpad clause.
The filter clause in the landingpad may not have a GlobalVariable operand.
It may instead have a ConstantArray of operands and each operand within this
ConstantArray should also be checked to see if it is a GlobalVariable.

This patch add the check for the ConstantArray as well as a debug message that
outputs the contents of MustKeepGlobalVariables.

Reviewed By: lei, amyk, scui

Differential Revision: https://reviews.llvm.org/D128287
2022-06-29 15:55:47 -05:00
Nikita Popov 16033ffdd9 [ConstExpr] Remove more leftovers of extractvalue expression (NFC)
Remove some leftover bits of extractvalue handling after the
removal in D125795.
2022-06-29 10:45:19 +02:00
Nikita Popov 348ea34bcd [AsmPrinter] Further restrict expressions supported in global initializers
lowerConstant() currently accepts a number of constant expressions
which have corresponding MC expressions, but which cannot be
evaluated as a relocatable expression (unless the operands are
constant, in which case we'll just fold the expression to a constant).

The motivation here is to clarify which constant expressions are
really needed for https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179,
and in particular clarify that we do not need to support any
division expressions, which are particularly problematic.

Differential Revision: https://reviews.llvm.org/D127972
2022-06-29 10:02:07 +02:00
Luo, Yuanke 5cb0979870 [X86][AMX] Split greedy RA for tile register
When we fill the shape to tile configure memory, the shape is gotten
from AMX pseudo instruction. However the register for the shape may be
split or spilled by greedy RA. That cause we fill the shape to config
memory after ldtilecfg is executed, so that the shape configuration
would be wrong.
This patch is to split the tile register allocation from greedy register
allocation, so that after tile registers are allocated the shape
registers are still virtual register. The shape register only may be
redefined or multi-defined by phi elimination pass, two address pass.
That doesn't affect tile register configuration.

Differential Revision: https://reviews.llvm.org/D128584
2022-06-29 10:35:43 +08:00
Guozhi Wei ddc9e8861c [MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency
Add a new pattern A - (B + C) ==> (A - B) - C to give machine combiner a chance
to evaluate which instruction sequence has lower latency.

Differential Revision: https://reviews.llvm.org/D124564
2022-06-28 21:42:51 +00:00
Rahman Lavaee 0aa6df6575 [Propeller] Encode address offsets of basic blocks relative to the end of the previous basic blocks.
This is a resurrection of D106421 with the change that it keeps backward-compatibility. This means decoding the previous version of `LLVM_BB_ADDR_MAP` will work. This is required as the profile mapping tool is not released with LLVM (AutoFDO). As suggested by @jhenderson we rename the original  section type value to `SHT_LLVM_BB_ADDR_MAP_V0` and assign a new value to the `SHT_LLVM_BB_ADDR_MAP` section type. The new encoding adds a version byte to each function entry to specify the encoding version for that function.  This patch also adds a feature byte to be used with more flexibility in the future. An use-case example for the feature field is encoding multi-section functions more concisely using a different format.

Conceptually, the new encoding emits basic block offsets and sizes as label differences between each two consecutive basic block begin and end label. When decoding, offsets must be aggregated along with basic block sizes to calculate the final offsets of basic blocks relative to the function address.

This encoding uses smaller values compared to the existing one (offsets relative to function symbol).
Smaller values tend to occupy fewer bytes in ULEB128 encoding. As a result, we get about 17% total reduction in the size of the bb-address-map section (from about 11MB to 9MB for the clang PGO binary).
The extra two bytes (version and feature fields) incur a small 3% size overhead to the `LLVM_BB_ADDR_MAP` section size.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D121346
2022-06-28 07:42:54 -07:00
Tim Northover 4aafebce52 SelectionDAG: allow FP extensions when folding extract/insert.
Before, we were trying to sign extend half -> float, and asserted in getNode.
2022-06-28 12:08:35 +01:00
Guillaume Chatelet 3c126d5fe4 [Alignment] Replace commonAlignment with std::min
`commonAlignment` is a shortcut to pick the smallest of two `Align`
objects. As-is it doesn't bring much value compared to `std::min`.

Differential Revision: https://reviews.llvm.org/D128345
2022-06-28 07:15:02 +00:00
Fangrui Song f1e27716cf [LiveInterval] Simplify with partition_point. NFC 2022-06-27 19:25:26 -07:00
Yuanfang Chen 6678f8e505 [ubsan] Using metadata instead of prologue data for function sanitizer
Information in the function `Prologue Data` is intentionally opaque.
When a function with `Prologue Data` is duplicated. The self (global
value) references inside `Prologue Data` is still pointing to the
original function. This may cause errors like `fatal error: error in backend: Cannot represent a difference across sections`.

This patch detaches the information from function `Prologue Data`
and attaches it to a function metadata node.

This and D116130 fix https://github.com/llvm/llvm-project/issues/49689.

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D115844
2022-06-27 12:09:13 -07:00
Patrick Walton becbbb7e3c Round up zero-sized symbols to 1 byte in `.debug_aranges` (without breaking other logic).
This commit modifies the AsmPrinter to avoid emitting any zero-sized symbols to
the .debug_aranges table, by rounding their size up to 1. Entries with zero
length violate the DWARF 5 spec, which states:

> Each descriptor is a triple consisting of a segment selector, the beginning
> address within that segment of a range of text or data covered by some entry
> owned by the corresponding compilation unit, followed by the non-zero length
> of that range.

In practice, these zero-sized entries produce annoying warnings in lld and
cause GNU binutils to truncate the table when parsing it.

Other parts of LLVM, such as DWARFDebugARanges in the DebugInfo module
(specifically the appendRange method), already avoid emitting zero-sized
symbols to .debug_aranges, but not comprehensively in the AsmPrinter. In fact,
the AsmPrinter does try to avoid emitting such zero-sized symbols when labels
aren't involved, but doesn't when the symbol to emitted is a difference of two
labels; this patch extends that logic to handle the case in which the symbol is
defined via labels.

Furthermore, this patch fixes a bug in which `available_externally` symbols
would cause unpredictable values to be emitted into the `.debug_aranges` table
under certain circumstances. In practice I don't believe that this caused
issues up until now, but the root cause of this bug--an invalid DenseMap
lookup--triggered failures in Chromium when combined with an earlier version of
this patch. Therefore, this patch fixes that bug too.

This is a revised version of diff D126257, which was reverted due to breaking
tests. The now-reverted version of this patch didn't distinguish between
symbols that didn't have their size reported to the DwarfDebug handler and
those that had their size reported to be zero. This new version of the patch
instead restricts the special handling only to the symbols whose size is
definitively known to be zero.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D126835
2022-06-27 10:01:03 -07:00
Matt Arsenault 97ed2fbc5f MIR: Fix parse error on empty CustomRegMask 2022-06-27 08:50:35 -04:00
Bradley Smith a83aa33d1b [IR] Move vector.insert/vector.extract out of experimental namespace
These intrinsics are now fundemental for SVE code generation and have been
present for a year and a half, hence move them out of the experimental
namespace.

Differential Revision: https://reviews.llvm.org/D127976
2022-06-27 10:48:45 +00:00
Kazu Hirata 94460f5136 Don't use Optional::hasValue (NFC)
This patch replaces x.hasValue() with x where x is contextually
convertible to bool.
2022-06-26 19:54:41 -07:00
Kazu Hirata d08f34b592 [llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
2022-06-26 18:31:51 -07:00
Kazu Hirata a81b64a1fb [llvm] Use Optional::has_value instead of Optional::hasValue (NFC)
This patch replaces x.hasValue() with x.has_value() where x is not
contextually convertible to bool.
2022-06-26 16:10:42 -07:00
Craig Topper 44b456e5f0 [CodeGenPrepare] Avoid double map lookup. NFCI 2022-06-26 10:47:14 -07:00
Nikita Popov ec19223131 Revert "[LiveInterval] Simplify. NFC"
This reverts commit e733b80f3c.

This caused a major compile-time regression:
https://llvm-compile-time-tracker.com/compare.php?from=3b7c3a654c9175f41ac871a937cbcae73dfb3c5d&to=e733b80f3cba26bf2df9bd691120f37fc1af21ce&stat=instructions

About 1% on average, 10% on individual files.
2022-06-26 11:51:07 +02:00
Kazu Hirata a7938c74f1 [llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
2022-06-25 21:42:52 -07:00
Fangrui Song e733b80f3c [LiveInterval] Simplify. NFC 2022-06-25 11:59:33 -07:00
Kazu Hirata 3b7c3a654c Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3.
2022-06-25 11:56:50 -07:00
Kazu Hirata aa8feeefd3 Don't use Optional::hasValue (NFC) 2022-06-25 11:55:57 -07:00
Matt Arsenault e7bc73739a GlobalISel: Make LoadStoreOpt preserve all
Avoids dropping CSE info analysis
2022-06-25 09:24:54 -04:00
Matt Arsenault a397846cb0 CodeGen: Use else if between Value and PseudoSourceValue cases
These are mutually exclusive.
2022-06-25 09:24:25 -04:00
chenglin.bi 8c74205642 [SelectionDAG][DAGCombiner] Reuse exist node by reassociate
When already have (op N0, N2), reassociate (op (op N0, N1), N2) to (op (op N0, N2), N1) to reuse the exist (op N0, N2)

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122539
2022-06-24 23:15:06 +08:00
Nabeel Omer 0d41794335 [SLP] Add cost model for `llvm.powi.*` intrinsics (REAPPLIED)
Patch was reverted in 4c5f10a due to buildbot failures, now being
reapplied with updated AArch64 and RISCV tests.

This patch adds handling for the llvm.powi.* intrinsics in
BasicTTIImplBase::getIntrinsicInstrCost() and improves vectorization.
Closes #53887.

Differential Revision: https://reviews.llvm.org/D128172
2022-06-24 10:23:19 +00:00
Carl Ritson 874fbe2cbb [MachineSink] Clear kill flags on operands outside loop
If an instruction is sunk into a loop then any kill flags on
operands declared outside the loop must be cleared as these
will be live for all loop iterations.

Fixes #46827

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D126754
2022-06-24 14:02:48 +09:00
Lian Wang 1ce30457c1 [LegalizeTypes][NFC] Add an assert to WidenVecRes_EXTRACT_SUBVECTOR and adjust some code
Reviewed By: craig.topper, david-arm

Differential Revision: https://reviews.llvm.org/D128038
2022-06-24 03:06:16 +00:00
Lian Wang 770fe864fe [SelectionDAG] Enable WidenVecOp_VECREDUCE for scalable vector
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128239
2022-06-24 02:32:53 +00:00
Craig Topper 8b10ffabae [RISCV] Disable <vscale x 1 x *> types with Zve32x or Zve32f.
According to the vector spec, mf8 is not supported for i8 if ELEN
is 32. Similarily mf4 is not suported for i16/f16 or mf2 for i32/f32.

Since RVVBitsPerBlock is 64 and LMUL is calculated as
((MinNumElements * ElementSize) / RVVBitsPerBlock) this means we
need to disable any type with MinNumElements==1.

For generic IR, these types will now be widened in type legalization.
For RVV intrinsics, we'll probably hit a fatal error somewhere. I plan
to work on disabling the intrinsics in the riscv_vector.h header.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D128286
2022-06-23 08:49:18 -07:00
Baptiste Saleil 79e77a9f39 [AMDGPU] Flush the vmcnt counter in loop preheaders when necessary
waitcnt vmcnt instructions are currently generated in loop bodies before using
values loaded outside of the loop. In some cases, it is better to flush the
vmcnt counter in a loop preheader before entering the loop body. This patch
detects these cases and generates waitcnt instructions to flush the counter.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D115747
2022-06-23 10:53:21 -04:00
Nico Weber 851a5efe45 Revert "[fastalloc] Support allocating specific register class in fastalloc"
This reverts commit 719658d078.
Breaks a few things, see comments on https://reviews.llvm.org/D128437
There's disagreement about the best fix.
So let's keep HEAD green while discussions are happening.
2022-06-23 10:44:24 -04:00
Luo, Yuanke 719658d078 [fastalloc] Support allocating specific register class in fastalloc
The base RA support infrastructure that only allow a specific register
class be allocated in RA pss. Since greedy RA, basic RA derived from
base RA, they all allow allocating specific register class. Fast RA
doesn't support allocating register for specific register class. This
patch is to enable ShouldAllocateClass in fast RA, so that it can
support allocating register for specific register class.

Differential Revision: https://reviews.llvm.org/D126771
2022-06-23 14:42:04 +08:00
chenglin.bi 9c2bf534f5 Revert "[SelectionDAG][DAGCombiner] Reuse exist node by reassociate"
This reverts commit 6c951c5ee6.
2022-06-23 13:21:51 +08:00
Matt Arsenault 370aa2f88f InlineSpiller: Don't fold spills into undef reads
This was producing a load into a dead register which was a verifier
error.
2022-06-22 20:47:55 -04:00
Guillaume Chatelet 57ffff6db0 Revert "[NFC] Remove dead code"
This reverts commit 8ba2cbff70.
2022-06-22 14:55:47 +00:00
Guillaume Chatelet 8ba2cbff70 [NFC] Remove dead code 2022-06-22 13:33:58 +00:00
Simon Pilgrim 2c3a4a9334 [DAG] SelectionDAG::GetDemandedBits - don't recurse back into GetDemandedBits
Another minor cleanup as we work toward removing GetDemandedBits entirely - call SimplifyMultipleUseDemandedBits directly.
2022-06-22 13:48:57 +01:00
Simon Pilgrim 1c2b756cd6 [DAG] visitTRUNCATE - move TRUNCATE(ADDE/ADDCARRY) folds to switch statement handling the other binops. NFC. 2022-06-21 22:07:41 +01:00
Simon Pilgrim 8cecb6be56 [DAG] Remove SelectionDAG::GetDemandedBits DemandedElts variant. NFC.
We're slowly removing SelectionDAG::GetDemandedBits and replacing it with SimplifyMultipleUseDemandedBits, we no longer have any uses for the vector demanded elt variant.
2022-06-21 21:23:10 +01:00
Nabeel Omer 4c5f10aeeb Revert rGe6ccb57bb3f6b761f2310e97fd6ca99eff42f73e "[SLP] Add cost model for `llvm.powi.*` intrinsics"
This reverts commit e6ccb57bb3.
2022-06-21 15:05:55 +00:00
Nabeel Omer e6ccb57bb3 [SLP] Add cost model for `llvm.powi.*` intrinsics
This patch adds handling for the llvm.powi.* intrinsics in
BasicTTIImplBase::getIntrinsicInstrCost() and improves vectorization.
Closes #53887.

Differential Revision: https://reviews.llvm.org/D128172
2022-06-21 14:40:34 +00:00
Markus Lavin 3815ae29b5 [machinesink] fix debug invariance issue
Do not include debug instructions when comparing block sizes with
thresholds.

Differential Revision: https://reviews.llvm.org/D127208
2022-06-21 08:13:09 +02:00
Kazu Hirata 7a47ee51a1 [llvm] Don't use Optional::getValue (NFC) 2022-06-20 22:45:45 -07:00
Kazu Hirata d66cbc565a Don't use Optional::hasValue (NFC) 2022-06-20 20:26:05 -07:00
Kazu Hirata 0916d96d12 Don't use Optional::hasValue (NFC) 2022-06-20 20:17:57 -07:00
Kazu Hirata 064a08cd95 Don't use Optional::hasValue (NFC) 2022-06-20 20:05:16 -07:00
chenglin.bi 6c951c5ee6 [SelectionDAG][DAGCombiner] Reuse exist node by reassociate
When already have (op N0, N2), reassociate (op (op N0, N1), N2) to (op (op N0, N2), N1) to reuse the exist (op N0, N2)

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122539
2022-06-21 09:45:19 +08:00
Luo, Yuanke 44e8a205f4 [fastregalloc] Enhance the heuristics for liveout in self loop.
For below case, virtual register is defined twice in the self loop. We
don't need to spill %0 after the third instruction `%0 = def (tied %0)`,
because it is defined in the second instruction `%0 = def`.

1 bb.1
2 %0 = def
3 %0 = def (tied %0)
4 ...
5 jmp bb.1

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D125079
2022-06-21 09:18:49 +08:00
Kazu Hirata 5413bf1bac Don't use Optional::hasValue (NFC) 2022-06-20 11:33:56 -07:00
David Green c0ecbfa4fd [AArch64] Known bits for AArch64ISD::DUP
An AArch64ISD::DUP is just a splat, where the known bits for each lane
are the same as the input. This teaches that to computeKnownBitsForTargetNode.

Problems arise for constants though, as a constant BUILD_VECTOR can be
lowered to an AArch64ISD::DUP, which SimplifyDemandedBits would then
turn back into a constant BUILD_VECTOR leading to an infinite cycle.
This has been prevented by adding a isTargetCanonicalConstantNode node
to prevent the conversion back into a BUILD_VECTOR.

Differential Revision: https://reviews.llvm.org/D128144
2022-06-20 19:11:57 +01:00
Kazu Hirata e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
Guillaume Chatelet 03036061c7 [Alignment] Use 'previous()' method instead of scalar division
This is in preparation of integration with D128052.

Differential Revision: https://reviews.llvm.org/D128169
2022-06-20 11:01:43 +00:00
Simon Pilgrim e4a124dda5 [DAG] Fold (srl (shl x, c1), c2) -> and(shl/srl(x, c3), m)
Similar to the existing (shl (srl x, c1), c2) fold

Part of the work to fix the regressions in D77804

Differential Revision: https://reviews.llvm.org/D125836
2022-06-20 08:37:38 +01:00
Lian Wang ab25e263a9 [SelectionDAG] Enable WidenVecOp_VECREDUCE_SEQ for scalable vector
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D127710
2022-06-20 06:30:26 +00:00
Craig Topper 314dbde12c [DAGCombiner][ARM][RISCV] Teach ShrinkLoadReplaceStoreWithStore to use truncstore.
The VT we want to shrink to may not be legal especially after type
legalization.

Fixes PR56110.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D128135
2022-06-19 15:50:15 -07:00
Haojian Wu 44582afe48 Fix an unused-variable warning in release build, NFC. 2022-06-19 20:52:00 +02:00
David Green e995e34469 [MachinePipeliner] Handle failing constrainRegClass
The included test hits a verifier problems as one of the instructions:
```
%113:tgpreven, %114:tgprodd = MVE_VMLSLDAVas16 %12:tgpreven(tied-def 0), %11:tgprodd(tied-def 1), %7:mqpr, %8:mqpr, 0, $noreg, $noreg
```
Has two inputs that come from different PHIs with the same base reg, but
conflicting regclasses:
```
%11:tgprodd = PHI %103:gpr, %bb.1, %16:gpr, %bb.2
%12:tgpreven = PHI %103:gpr, %bb.1, %17:gpr, %bb.2
```

The MachinePipeliner would attempt to use %103 for both the %11 and %12
operands in the prolog, constraining the register class to the common
subset of both. Unfortunately there are no registers that are both odd
and even, so the second constrainRegClass fails. Fix this situation by
inserting a COPY for the second if the call to constrainRegClass fails.

The register allocation can then fold that extra copy away. The register
allocation of Q regs changed with this test, but the R regs were the
same and no new instructions are needed in the final assembly.

Differential Revision: https://reviews.llvm.org/D127971
2022-06-19 18:55:19 +01:00
Simon Pilgrim ba3f2667b6 [DAG] Add MaskedVectorIsZero helper
Equivalent to MaskedValueIsZero, except its checking if all of the demanded vectors elements are known to be zero
2022-06-19 17:56:30 +01:00
Simon Pilgrim 1ebe5cac46 [DAG] SimplifyDemandedBits - add DemandedElts handling to ISD::SIGN_EXTEND_INREG simplification 2022-06-19 15:35:29 +01:00
Simon Pilgrim db1be696c4 [DAG] SimplifyDemandedBits - add ISD::VSELECT handling 2022-06-19 15:18:25 +01:00
Kazu Hirata 129b531c9c [llvm] Use value_or instead of getValueOr (NFC) 2022-06-18 23:07:11 -07:00
Kazu Hirata 3cbe0bc4a1 [CodeGen] Use default member initialization (NFC)
Identified with modernize-use-default-member-init.
2022-06-18 12:01:34 -07:00
Kazu Hirata 4271a1ff33 [llvm] Call *set::insert without checking membership first (NFC) 2022-06-18 10:17:22 -07:00
Kazu Hirata b254d67160 [llvm] Call *set::insert without checking membership first (NFC) 2022-06-18 08:32:54 -07:00
Han-Kuan Chen e29133629b [MachineCopyPropagation][RISCV] Fix D125335 accidentally change control flow.
D125335 makes regsOverlap skip following control flow, which is not entended
in the original code.

Differential Revision: https://reviews.llvm.org/D128039
2022-06-17 21:40:08 -07:00
Vitaly Buka f0ca0a324f [CodeGen] Init EmptyExpr before the first use 2022-06-17 17:40:06 -07:00
Guillaume Chatelet 90f96ec7a5 [NFC][Alignment] Remove assumeAligned from MachineFrameInfo ctor 2022-06-17 15:21:17 +00:00
Paul Walker 0e21f1d56a [SelectionDAG] Extend WidenVecOp_INSERT_SUBVECTOR to cover more cases.
WidenVecOp_INSERT_SUBVECTOR only supported cases where widening
effectively converts the insert into a copy.  However, when the
widened subvector is no bigger than the vector being inserted into
and we can be sure there's no loss of data, we can simply emit
another INSERT_SUBVECTOR.

Fixes: #54982

Differential Revision: https://reviews.llvm.org/D127508
2022-06-17 12:39:42 +00:00
Mingming Liu 1e67385d28 [MachineBlockPlacementStats] Added check for "-filter-print-funcs"
option to the machine-block-placement-stats.

Differential Revision: https://reviews.llvm.org/D128019
2022-06-16 21:59:54 -07:00
Mingming Liu b7d09557f6 Revert "[MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats."
This reverts commit 46d45df451. Going to
add differential revision link to commit message and re-commit.
2022-06-16 21:56:08 -07:00
Mingming Liu 46d45df451 [MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats. 2022-06-16 21:48:08 -07:00
Lian Wang f2bcf33058 [LegalizeTypes][NFC] Merge promote SPLAT_VECTOR and promote SCALAR_TO_VECTOR to one function
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D127825
2022-06-17 02:43:52 +00:00
Lian Wang 16215eb979 [LegalizeTypes][RISCV][NFC] Modify assert in PromoteIntRes_STEP_VECTOR and add some tests for RISCV
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D127939
2022-06-17 02:26:09 +00:00
Craig Topper e6c7a3a54f [SelectionDAG] Don't apply MinRCSize constraint in InstrEmitter::AddRegisterOperand for IMPLICIT_DEF sources.
MinRCSize is 4 and prevents constrainRegClass from changing the
register class if the new class has size less than 4.

IMPLICIT_DEF gets a unique vreg for each use and will be removed
by the ProcessImplicitDef pass before register allocation. I don't
think there is any reason to prevent constraining the virtual register
to whatever register class the use needs.

The attached test case was previously creating a copy of IMPLICIT_DEF
because vrm8nov0 has 3 registers in it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D128005
2022-06-16 14:55:14 -07:00
Adrian Tong 55311801f0 Allow bitwidth difference when checking for isOneOrOneSplat.
This helps handling a case where the BUILD_VECTOR has i16 element type
and i32 constant operands

t2: v8i16 = setcc t8, t17, setult:ch
t3: v8i16 = BUILD_VECTOR Constant:i32<1>, ...
   t4: v8i16 = and t2, t3
      t5: v8i16 = add t8, t4

This can be turned into t5: v8i16 = sub t8, t2, and allows us to remove
t3 and t4 from the DAG.

Differential Revision: https://reviews.llvm.org/D127354
2022-06-16 16:04:20 +00:00
Kito Cheng e9f7263b38 Reland "[SplitKit] Handle early clobber + tied to def correctly"
This reverts commit 7207373e1e.

We found another RISC-V bug when landing D126048, and it has been fixed
by D127642 now.

Differential Revision: https://reviews.llvm.org/D126048
2022-06-16 17:13:09 +08:00
Craig Topper 3aa6ec619f [ValueTypes] Add types for nxv16bf16 and nxv32bf16.
This is needed by our downstream and makes bf16 and f16 have the
same set of scalable vector types.

Reviewed By: rui.zhang

Differential Revision: https://reviews.llvm.org/D127877
2022-06-15 23:00:53 -07:00
Benjamin Kramer 8c4a07c61f [DAGCombiner] Fold fold (fp_to_bf16 (bf16_to_fp op)) -> op 2022-06-15 19:54:39 +02:00
Benjamin Kramer ca50cb120b [SelectionDAG] Constant fold FP_TO_BF16 and BF16_TO_FP. 2022-06-15 18:51:32 +02:00
Paul Robinson ac2ad3b7bb [PS5] Support sin+cos->sincos optimization 2022-06-15 09:36:05 -07:00
Paul Robinson 654a835c3f [PS5] Trap after noreturn calls, with special case for stack-check-fail 2022-06-15 09:02:17 -07:00
Luo, Yuanke 16547f9fbb [CodeGen] Fix the bug of machine sink
The use operand may be undefined. In that case we can just continue to
check the next operand since it won't increase register pressure.

Differential Revision: https://reviews.llvm.org/D127848
2022-06-15 23:35:52 +08:00
Benjamin Kramer 8bc0bb9564 Add a conversion from double to bf16
This introduces a new compiler-rt function `__truncdfbf2`.
2022-06-15 12:56:31 +02:00
Benjamin Kramer fb34d531af Promote bf16 to f32 when the target doesn't support it
This is modeled after the half-precision fp support. Two new nodes are
introduced for casting from and to bf16. Since casting from bf16 is a
simple operation I opted to always directly lower it to integer
arithmetic. The other way round is more complicated if you want to
preserve IEEE semantics, so it's handled by a new __truncsfbf2
compiler-rt builtin.

This is of course very bare bones, but sufficient to get a semi-softened
fadd on x86.

Possible future improvements:
 - Targets with bf16 conversion instructions can now make fp_to_bf16 legal
 - The software conversion to bf16 can be replaced by a trivial
   implementation under fast math.

Differential Revision: https://reviews.llvm.org/D126953
2022-06-15 12:56:31 +02:00
Keith Walker 94fac097ad [DebugInfo][ARM] Not readonly check for RWPI globals
When compiling for the RWPI relocation model [1], the debug information
is wrong for readonly global variables.

Writable global variables are accessed by the static base register (R9
on ARM) in the RWPI relocation model.  This is being correctly generated

Readonly global variables are not accessed by the static base register
in the RWPI relocation model. This case is incorrectly generating the
same debugging information as for writable global variables.

References:
[1] ARM Read-Write Position Independence: https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#read-write-position-independence-rwpi

Differential Revision: https://reviews.llvm.org/D126361
2022-06-15 11:52:12 +01:00
Simon Pilgrim f096d5926d [DAG] Fix SDLoc mismatch in (shl (srl x, c1), c2) -> and(shift(x,c3)) fold
Noticed by @craig.topper on D125836 which uses a tweaked copy of the same code.

Differential Revision: https://reviews.llvm.org/D127772
2022-06-15 11:07:59 +01:00