Commit Graph

32929 Commits

Author SHA1 Message Date
Alexander Shaposhnikov 1e636f2676 [IRBuilder] Add assert for AtomicRMW ordering
Add assert for AtomicRMW: Ordering != AtomicOrdering::Unordered
(https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/Verifier.cpp#L3944)
and adjust expandAtomicStore accordingly.

Test plan:
1/ ninja check-llvm check-clang check-lld
2/ Bootstrapped LLVM/Clang pass tests

Differential revision: https://reviews.llvm.org/D130457
2022-07-25 22:51:25 +00:00
Matt Arsenault 62531518f9 RegAllocGreedy: Add a command line flag for reverseLocalAssignment
Introduce a flag like for some of the other target heuristic controls
to help with experimentation.
2022-07-25 15:47:15 -04:00
Vladislav Dzhidzhoev fc93ba061a [GlobalISel][DebugInfo] Remove debug info with zero line from constants inserted at entry block
Emission of constants having DebugLoc with line 0 causes significant increase of debug_line section size for some source files.

To illustrate, we can compare section sizes of several files from llvm test-suite, built with SelectionDAG vs GlobalISel, on Aarch64 (macOS), using -O0 optimization level:

| Source path                                                    | SDAG text sz | GISel text sz | SDAG debug_line sz |  GISel debug_line sz
| -------------------------------------------------------------- | ------------ | ------------- | ------------------ | --------------------
| `SingleSource/Regression/C/gcc-c-torture/execute/strlen-2.c`   | 15320        | 660           | 14872              | 6340
| `SingleSource/Regression/C/gcc-c-torture/execute/20040629-1.c` | 33640        | 26300         | 2812               | 6693
| `SingleSource/Benchmarks/Misc/flops-4.c`                       | 1428         | 1196          | 594                | 1008
| `MultiSource/Benchmarks/MiBench/consumer-typeset/z31.c`        | 2716         | 964           | 809                | 903
| `MultiSource/Benchmarks/Prolangs-C/gnugo/showinst.c`           | 2534         | 2502          | 189                | 573

For instance, here is a fragment of `flops-4.c.o` debug line section dump

```
Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000000000    174      0      1   0             0  is_stmt
0x0000000000000010      0      0      1   0             0
0x0000000000000018    185      4      1   0             0  is_stmt prologue_end
0x000000000000001c      0      0      1   0             0
0x0000000000000024    186      4      1   0             0  is_stmt
0x000000000000002c    189     10      1   0             0  is_stmt
0x0000000000000030      0      0      1   0             0
0x0000000000000038    207     11      1   0             0  is_stmt
0x0000000000000044    208     11      1   0             0  is_stmt
0x0000000000000048      0      0      1   0             0
0x0000000000000058    210     10      1   0             0  is_stmt
0x000000000000005c      0      0      1   0             0
0x0000000000000060    211     10      1   0             0  is_stmt
0x0000000000000064      0      0      1   0             0
0x000000000000006c    212     10      1   0             0  is_stmt
0x0000000000000070      0      0      1   0             0
0x000000000000007c    213     10      1   0             0  is_stmt
0x0000000000000080      0      0      1   0             0
0x0000000000000088    214     10      1   0             0  is_stmt
0x000000000000008c      0      0      1   0             0
0x0000000000000094    215     10      1   0             0  is_stmt
```

Lot of zero lines are produced by constants (global values) having DebugLoc with line 0.
It seems that they're not significant for debugging experience.

With the commit applied, total size of debug_line sections of llvm shared libraries has reduced by 2.5%.
Change of debug line section size of files listed above:

| Source path                                                    | GISel debug_line sz | Patch debug_line sz
| -------------------------------------------------------------- | ------------------- | --------------------
| `SingleSource/Regression/C/gcc-c-torture/execute/strlen-2.c`   | 6340                | 1465
| `SingleSource/Regression/C/gcc-c-torture/execute/20040629-1.c` | 6693                | 3782
| `SingleSource/Benchmarks/Misc/flops-4.c`                       | 1008                | 609
| `MultiSource/Benchmarks/MiBench/consumer-typeset/z31.c`        | 903                 | 841
| `MultiSource/Benchmarks/Prolangs-C/gnugo/showinst.c`           | 573                 | 190

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D127488
2022-07-25 17:19:01 +00:00
Nikita Popov fb7caa3c7b [AsmPrinter] Reject ptrtoint to larger size in lowerConstant()
When using a ptrtoint to a size larger than the pointer width in a
global initializer, we currently create a ptr & low_bit_mask style
MCExpr, which will later result in a relocation error during object
file emission.

This patch rejects the constant expression already during
lowerConstant(), which results in a much clearer error message
that references the constant expression at fault.

This fixes https://github.com/llvm/llvm-project/issues/56400,
for certain definitions of "fix".

Differential Revision: https://reviews.llvm.org/D130366
2022-07-25 10:18:27 +02:00
Kazu Hirata b5188591a0 [llvm] Remove redundaunt virtual specifiers (NFC)
Identified with modernize-use-override.
2022-07-24 21:50:35 -07:00
Kazu Hirata acf648b5e9 Use llvm::less_first and llvm::less_second (NFC) 2022-07-24 16:21:29 -07:00
Kazu Hirata ea29810c9d [CodeGen] Remove a redundant void (NFC)
Identified with modernize-redundant-void-arg.
2022-07-24 12:27:14 -07:00
Matt Arsenault 40abb28f61 RegAllocGreedy: Fix subranges when rematerializing dead subreg defs
This would create a new interval missing the subrange and hit this
verifier error:

*** Bad machine code: Live interval for subreg operand has no subranges ***
- function:    test_remat_subreg_def
- basic block: %bb.0  (0xa568758) [0B;128B)
- instruction: 32B	dead undef %4.sub0:vreg_64 = V_MOV_B32_e32 2, implicit $exec
2022-07-24 11:51:59 -04:00
Simon Pilgrim 562ee7cc5f [DAG] visitSMUL_LOHI/visitUMUL_LOHI - ensure we canonicalize constants to the RHS 2022-07-24 16:09:56 +01:00
Simon Pilgrim 428c0f2adc [DAG] getNode - assert that SMUL_LOHI/UMUL_LOHI nodes have the correct ops + types 2022-07-24 15:30:57 +01:00
Simon Pilgrim 0708771cce [DAG] MaskedVectorIsZero - don't bother with (-1).isSubsetOf mask check. NFC.
Just use KnownBits::isZero() to ensure all the bits are known zero.
2022-07-24 13:12:21 +01:00
Simon Pilgrim e82d49bfed [DAG] SimplifyMultipleUseDemandedBits - early-out for any scalable vector types
Noticed while working to remove SelectionDAG::GetDemandedBits - we were relying on the callers to have already bailed for scalable vectors
2022-07-24 12:59:43 +01:00
Simon Pilgrim a3e38b4a20 [DAG] SimplifyDemandedVectorElts - if every and/mul element-pair has a zero/undef then just constant fold to zero 2022-07-24 12:00:31 +01:00
Kazu Hirata 7bfa06f6c0 [CodeGen] Use range-based for loops (NFC) 2022-07-23 16:10:46 -07:00
Simon Pilgrim ac8be21365 [DAG] isSplatValue - don't attempt to merge any BITCAST sub elements if they contain UNDEFs
We still haven't found a solution that correctly handles 'don't care' sub elements properly - given how close it is to the next release branch, I'm making this fail safe change and we can revisit this later if we can't find alternatives.

NOTE: This isn't a reversion of D128570 - it's the removal of undef handling across bitcasts entirely

Fixes #56520
2022-07-23 18:38:48 +01:00
Dmitri Gribenko aba43035bd Use llvm::sort instead of std::sort where possible
llvm::sort is beneficial even when we use the iterator-based overload,
since it can optionally shuffle the elements (to detect
non-determinism). However llvm::sort is not usable everywhere, for
example, in compiler-rt.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D130406
2022-07-23 15:19:05 +02:00
Simon Pilgrim 5f89d2bae9 [DAG] Move OR(AND(X,C1),AND(OR(X,Y),C2)) -> OR(AND(X,OR(C1,C2)),AND(Y,C2)) fold to SimplifyDemandedBits
This will fix the SystemZ v3i31 memcpy regression in D77804 (with the help of D129765 as well....).

It should also allow us to /bend/ the oneuse limitation for cases where we can use demanded bits to safely peek though multiple uses of the AND ops.
2022-07-23 13:17:24 +01:00
Simon Pilgrim 6aff1b7b3c [DAG] SimplifyDemandedBits - pull out repeated getValueType() calls. NFC. 2022-07-23 12:01:54 +01:00
Simon Pilgrim 2421a5af72 [DAG] ExpandIntRes_ADDSUB - create UADDO/USUBO instead of ADDCARRY/SUBCARRY if overflow is known to be zero
As noticed on D127115, when splitting ADD/SUB nodes we often end up with cases where overflow from the lower bits is impossible - in such cases we're better off breaking the carry chain dependency as soon as possible.

This path is being exercised by llvm/test/CodeGen/ARM/dsp-mlal.ll, although I haven't been able to get any codegen diff without a topological worklist.
2022-07-23 11:13:44 +01:00
Simon Pilgrim 8937252465 [DAG] computeKnownBits - add basic shift-by-parts handling
Concat KnownBits from ISD::SHL_PARTS / ISD::SRA_PARTS / ISD::SRL_PARTS lo/hi operands and perform the KnownBits calculation by the shift amount on the extended type, before splitting the KnownBits based on the requested lo/hi result.
2022-07-23 09:46:30 +01:00
ARCHIT SAXENA 3bb1ce2319 Add a nop instruction if a section starts with landing pad for function splitter
This change adds a nop instruction if section starts with landing pad. This change is like [D73739](https://reviews.llvm.org/D73739) which avoids zero offset landing pad in basic block sections.

Detailed description:
The current machine functions splitter can create ˜sections which start with a landing pad themselves. This places landing pad at offset zero from LPStart.
```
	.section	.text.split.foo10,"ax",@progbits
foo10.cold:                             # %lpad
	.cfi_startproc
	.cfi_personality 3, __gxx_personality_v0
	.cfi_lsda 3, .Lexception5
	.cfi_def_cfa %rsp, 16
.Ltmp11: <--- This is a Landing pad and also LP Start as it is start of this section
	movq	%rax, %rdi <--- first instruction is at offest 0 from LPStart
	callq	_Unwind_Resume@PLT

 ```
This will cause landing pad entries to become zero (.Ltmp11-foo10.cold)
```
.Lcst_begin4:
	.uleb128 .Ltmp9-.Lfunc_begin2           # >> Call Site 1 <<
	.uleb128 .Ltmp10-.Ltmp9                 #   Call between .Ltmp9 and .Ltmp10
	.uleb128 .Ltmp11-foo10.cold  <---This is zero           #     jumps to .Ltmp11
	.byte	3                               #   On action: 2
	.uleb128 .Ltmp10-.Lfunc_begin2          # >> Call Site 2 <<
	.uleb128 .Lfunc_end9-.Ltmp10            #   Call between .Ltmp10 and .Lfunc_end9
	.byte	0                               #     has no landing pad
	.byte	0                               #   On action: cleanup
	.p2align	2
```
The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function begin is never a landing pad), and uses LP.offset = 0 to specify no landing pad. This change adds a nop instruction at start of such sections so that such a case could be avoided. Output:
```
	.section	.text.split.foo10,"ax",@progbits
foo10.cold:                             # %lpad
	.cfi_startproc
	.cfi_personality 3, __gxx_personality_v0
	.cfi_lsda 3, .Lexception5
	.cfi_def_cfa %rsp, 16
	nop <--- new instruction that is added
.Ltmp11:
	movq	%rax, %rdi
	callq	_Unwind_Resume@PLT
```

Reviewed By: modimo, snehasish, rahmanl

Differential Revision: https://reviews.llvm.org/D130133
2022-07-22 15:20:10 -07:00
Craig Topper be208b40c1 [DAGCombiner] Simplify code around call to reduceLoadWidth in visitAND. NFC
We were looking for loads or any_extend+load. reduceLoadWidth
hasn't known how to look through such an any_extend to find the
load since D40667 almost 5 years ago.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130333
2022-07-22 08:36:56 -07:00
Nikita Popov c2be703c6c [AsmPrinter] Move lowerConstant() error code out of switch (NFC)
Move this out of the switch, so that different branches can
indicate an error by breaking out of the switch. This becomes
important if there are more than the two current error cases.
2022-07-22 16:08:28 +02:00
Cullen Rhodes bf268a05cd [AArch64] Emit vector FP cmp when LE is used with fast-math
Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130093
2022-07-22 07:53:55 +00:00
jacquesguan e60eb7053d recommit "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat."
With fix for AArch64 and Hexgon test cases.
2022-07-21 17:34:34 +08:00
David Green 23d6186be0 [SelectionDAG] Fix fptoi.sat scalable vector lowering
Vector fptosi_sat and fptoui_sat were being expanded by unrolling the
vector operation. This doesn't work for scalable vector, so this patch
adds a call to TLI.expandFP_TO_INT_SAT if the vector is scalable.

Scalable tests are added for AArch64 and RISCV. Some of the AArch64
fptoi_sat operations should be legal, but that will be handled in
another patch.

Differential Revision: https://reviews.llvm.org/D130028
2022-07-21 08:00:22 +01:00
esmeyi 339392ecf2 [AIX] follow-up of D124654.
Emitting the remaining aliases instead of reporting
an error to avoid SPEC2017 PEAK failures.
And mark this as a TODO.
2022-07-21 01:10:09 -04:00
Simon Pilgrim 029e83b401 [DAG] getNode - don't bother creating ADDO(X,0) or SUBO(X,0) nodes.
Similar to what we already do in getNode for basic ADD/SUB nodes, return the X operand directly, but here we know that there will be no/zero overflow as well.

As noted on D127115 - this path is being exercised by llvm/test/CodeGen/ARM/dsp-mlal.ll, although I haven't been able to get any codegen without a topological worklist.
2022-07-20 12:04:33 +01:00
Simon Pilgrim 766cd95481 [DAG] getNode - assert that ADDO/SUBO nodes have the correct ops + types 2022-07-20 11:23:58 +01:00
Simon Pilgrim 9fc347aa4e [DAG] PromoteIntRes_BUILD_VECTOR - extend constant boolean vectors according to target BooleanContents
PromoteIntRes_BUILD_VECTOR currently always ANY_EXTENDs build vector operands, but if this is a constant boolean vector we're losing the useful ability to keep the vector matching the BooleanContents mode used by the target.

This patch extends constant boolean vectors according to target BooleanContents, allowing a number of additional all-bits folds (notable XOR -> NOT conversions) to occur.

Differential Revision: https://reviews.llvm.org/D129641
2022-07-20 10:49:31 +01:00
Lorenzo Albano 07d69d9fc9 [VP] Legalize the stride operand for EXPERIMENTAL_VP_STRIDED SDNodes
Add promotion and expansion of integer operands for
experimental_vp_strided SelectionDAG nodes; the expansion is actually
just a truncation of the stride operand.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D123112
2022-07-20 10:22:43 +02:00
Kazu Hirata 76e18cc4f6 [llvm] Use llvm::any_of and llvm::none_of (NFC) 2022-07-20 00:36:19 -07:00
Kazu Hirata 0387da6f4f Use value instead of getValue (NFC) 2022-07-19 21:18:26 -07:00
Kazu Hirata 41ae78ea3a Use has_value instead of hasValue (NFC) 2022-07-19 20:15:44 -07:00
Kazu Hirata bbbb4393ee [CodeGen] Use value_or instead of getValueOr (NFC) 2022-07-19 19:50:43 -07:00
David Truby 4c82f56d8f [llvm][SVE] Remove redundant and when comparing against extending load
When determining if an `and` should be merged into an extending load
the constant argument to the `and` is currently not checked if the
argument requires truncation. This prevents the combine happening when
the vector width is half the normal available vector width for SVE VLA
vectors.

Reviewed By: c-rhodes

Differential Revision: https://reviews.llvm.org/D129281
2022-07-19 17:08:32 +01:00
Simon Pilgrim 71c502cbca [DAG] Call SimplifyDemandedBits from ISD::MUL nodes
Noticed while triaging D129765.
2022-07-19 14:11:04 +01:00
Benjamin Kramer 8aff88fd3a [LegalizeDAG] Propagate alignment in ExpandExtractFromVectorThroughStack
Unlike the name suggests this can reuse any store as a base for a
memory-based vector extract. If that store is underaligned the loads
created to extract will have an invalid alignment. Since most CPUs are
forgiving wrt alignment this is almost never an issue, on x86 this is
only reproducible by extracting a 128 bit vector out of a wider vector.

I tried making a test case in the context of
https://reviews.llvm.org/D127982 but it's really really fragile, as the
output pretty much looks like a missed optimization.
2022-07-19 13:13:55 +02:00
Simon Pilgrim 0f6b0461b0 [DAG] SimplifyDemandedBits - relax "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" to match only demanded bits
The "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold is currently limited to the XOR mask being a shifted all-bits mask, but we can relax this to only need to match under the demanded bits.

This helps expose more bit extraction/clearing patterns and fixes the PowerPC testCompares*.ll regressions from D127115

Alive2: https://alive2.llvm.org/ce/z/fl7T7K

Differential Revision: https://reviews.llvm.org/D129933
2022-07-19 10:59:07 +01:00
Max Kazantsev 69b284aaf6 Revert "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat."
This reverts commit 58dfaaaace.

Massive AARCH test failures in buildbot.
2022-07-19 13:41:52 +07:00
jacquesguan 58dfaaaace [DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat.
This revision supports to scalarize a binary operation of two scalable splat vectors.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122791
2022-07-19 11:20:51 +08:00
Matt Arsenault 8d0383eb69 CodeGen: Remove AliasAnalysis from regalloc
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for if a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.

Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.

Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
2022-07-18 17:23:41 -04:00
Jay Foad dbed4326dd [LiveIntervals] Find better anchoring end points when repairing ranges
r175673 changed repairIntervalsInRange to find anchoring end points for
ranges automatically, but the calculation of Begin included the first
instruction found that already had an index. This patch changes it to
exclude that instruction:

1. For symmetry, so that the half open range [Begin,End) only includes
   instructions that do not already have indexes.
2. As a possible performance improvement, since repairOldRegInRange
   will scan fewer instructions.
3. Because repairOldRegInRange hits assertion failures in some cases
   when it sees a def that already has a live interval.

(3) fixes about ten tests in the CodeGen lit test suite when
-early-live-intervals is forced on.

Differential Revision: https://reviews.llvm.org/D110182
2022-07-18 19:34:43 +01:00
Itay Bookstein 2570f226d1 [SDAG] Remove single-result restriction on commutative CSE
The DAG Combiner unnecessarily restricts commutative CSE
to nodes with a single result value. This commit removes
that restriction.

Signed-off-by: Itay Bookstein <ibookstein@gmail.com>

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D129666
2022-07-18 19:19:13 +03:00
Lorenzo Albano c00a44fa68 [VP] IR expansion pass for VP gather and scatter
Add vp_gather and vp_scatter expansion to unpredicated intrinsics.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D120664
2022-07-18 17:00:38 +02:00
Nikita Popov 56b4b6e81b [SDAG] Fix release build
This variable was only declared in debug builds, but is needed
in release builds as well.
2022-07-18 14:10:31 +02:00
Max Kazantsev d693fd29f1 [Verifier] Make Verifier recognize undef tokens as correct IR
Undef tokens may appear in unreached code as result of RAUW of some optimization,
and it should not be considered as bad IR.

Patch by Dmitry Bakunevich!

Differential Revision: https://reviews.llvm.org/D128904
Reviewed By: mkazantsev
2022-07-18 16:26:06 +07:00
Lorenzo Albano f390781cec [VP] Implementing expansion pass for VP load and store.
Added function to the ExpandVectorPredication pass to handle VP loads
and stores.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D109584
2022-07-18 08:47:54 +02:00
Craig Topper 7fa1c32634 [CodeGen] Remove unnecessary APInt copy. NFC 2022-07-17 23:41:53 -07:00
Craig Topper a55ff6aadd [Support][CodeGen] Fix spelling Divison->Division. NFC 2022-07-17 23:16:29 -07:00
Craig Topper 795602af0c [CodeGen] Don't compare bool with integer 0. NFC
The IsAdd field is a bool.
2022-07-17 23:16:14 -07:00
Kazu Hirata 3112987d5c Remove unused forward declarations (NFC) 2022-07-17 15:37:48 -07:00
Simon Pilgrim 53b90dd372 [DAG] Fold (or (and X, C1), (and (or X, Y), C2)) -> (or (and X, C1|C2), (and Y, C2))
Pulled out of D77804

Alive2: https://alive2.llvm.org/ce/z/g61VRe
2022-07-17 18:51:41 +01:00
Simon Pilgrim 26ce33706f [DAG] computeKnownBits - move UDIV handling to same place as UREM/SREM. NFC. 2022-07-17 11:59:42 +01:00
Simon Pilgrim 5ec47c6dc5 [DAG] Add MERGE_VALUE computeKnownBits/ComputeNumSignBits handling.
Just forward the value tracking to the operand specified by the ResNo
2022-07-17 11:58:08 +01:00
Kazu Hirata 9e6d1f4b5d [CodeGen] Qualify auto variables in for loops (NFC) 2022-07-17 01:33:28 -07:00
Kazu Hirata c0fe37de04 [CodeGen] Remove redundant declaration createGreedyRegisterAllocator (NFC)
The function is declared in llvm/include/llvm/CodeGen/Passes.h.

Identified with readability-redundant-declaration.
2022-07-16 15:43:34 -07:00
Kazu Hirata 4d9d07c5fb [CodeGen] Use RegClassFilterFunc where appropriate (NFC) 2022-07-16 15:43:33 -07:00
Sanjay Patel 7ca3e23f25 [SDAG] narrow truncated sign_extend_inreg
trunc (sign_ext_inreg X, iM) to iN --> sign_ext_inreg (trunc X to iN), iM

There are improvements on existing tests from this, and there are a pair
of large regressions in D127115 for Thumb2 caused by not folding this
pattern.

Differential Revision: https://reviews.llvm.org/D129890
2022-07-16 16:29:15 -04:00
Simon Pilgrim a44bdf9bc1 [DAG] visitINSERT_VECTOR_ELT - refactor BUILD_VECTOR creation from INSERT_VECTOR_ELT chain.
D127595 added the ability to recurse up a (one-use) INSERT_VECTOR_ELT chain to create a BUILD_VECTOR before other combines manage to break the chain, something that is particularly bad in D127115.

The patch generalises this so it doesn't have to build the chain starting from the last element insertion, instead it can now start from any insertion and will recurse up the chain until it finds all elements or finds a UNDEF/BUILD_VECTOR/SCALAR_TO_VECTOR which represents that start of the chain.

Fixes several regressions in D127115
2022-07-16 16:37:31 +01:00
Simon Pilgrim 52b6168c16 [DAG] visitINSERT_VECTOR_ELT - remove duplicate VT.getVectorNumElements() call. NFC. 2022-07-16 16:20:49 +01:00
Tim Besard a323dfc015 Don't sink ptrtoint/inttoptr sequences into non-noop addrspacecasts.
In https://reviews.llvm.org/D30114, support for mismatching address
spaces was introduced to CodeGenPrepare's optimizeMemoryInst, using
addrspacecast as it was argued that only no-op addrspacecasts would be
considered when constructing the address mode. However, by doing
inttoptr/ptrtoint, it's possible to get CGP to emit an addrspace
that's not actually no-op, introducing a miscompilation:

define void @kernel(i8* %julia_ptr) {
  %intptr = ptrtoint i8* %julia_ptr to i64
  %ptr = inttoptr i64 %intptr to i32 addrspace(3)*

  br label %end
end:

  store atomic i32 1, i32 addrspace(3)* %ptr unordered, align 4
  ret void
}

Gets compiled to:

define void @kernel(i8* %julia_ptr) {
end:
  %0 = addrspacecast i8* %julia_ptr to i32 addrspace(3)*
  store atomic i32 1, i32 addrspace(3)* %0 unordered, align 4
  ret void
}

In the case of NVPTX, this introduces a cvta.to.shared, whereas
leaving out the %end block and branch doesn't trigger this
optimization. This results in illegal memory accesses as seen in
https://github.com/JuliaGPU/CUDA.jl/issues/558

In this change, I introduced a check before doing the pointer cast
that verifies address spaces are the same. If not, it emits a
ptrtoint/inttoptr combination to get a no-op cast between address
spaces. I decided against disallowing ptrtoint/inttoptr with
non-default AS in matchOperationAddr, because now its still possible
to look through multiple sequences of them that ultimately do not
result in a address space mismatch (i.e. the second lit test).
2022-07-16 10:56:42 -04:00
Simon Pilgrim 2bb6b03d71 Fix signed/unsigned mismatch 2022-07-16 11:48:41 +01:00
Simon Pilgrim a5d0122f75 [DAG] Canonicalize non-inlane shuffle -> AND if all non-inlane referenced elements are known zero
As mentioned on D127115, this patch that attempts to recognise shuffle masks that could be simplified to a AND mask - we already have a similar transform that will fold AND -> 'clear mask' shuffle, but this patch handles cases where the referenced elements are not from the same lane indices but are known to be zero.

Differential Revision: https://reviews.llvm.org/D129150
2022-07-16 11:38:24 +01:00
Simon Pilgrim 1cb7416ee3 [DAG] combineShiftAnd1ToBitTest - match "and (srl (not X), C)), 1 --> (and X, 1<<C) == 0" patterns
combineShiftAnd1ToBitTest already matches "and (not (srl X, C)), 1 --> (and X, 1<<C) == 0" patterns, but we can end up with situations where the not is before the shift.

Part of some yak shaving for D127115 to generalise the "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold.
2022-07-16 11:00:07 +01:00
Kazu Hirata 1a5d007659 Use has_value/value instead of hasValue/getValue (NFC) 2022-07-15 21:48:17 -07:00
Simon Pilgrim 3c8bf29696 [DAG] Move "xor (X logical_shift ShiftC), XorC --> (not X) logical_shift ShiftC" fold into SimplifyDemandedBits
SimplifyDemandedBits is called slightly later which allows the not(sext(x)) -> sext(not(x)) fold to occur via foldLogicOfShifts

As mentioned on D127115, we should be able to further generalise this based off the demanded bits.
2022-07-15 13:10:15 +01:00
Edd Barrett 2e62a26fd7
[stackmaps] Legalise patchpoint arguments.
This is similar to D125680, but for llvm.experimental.patchpoint
(instead of llvm.experimental.stackmap).

Differential review: https://reviews.llvm.org/D129268
2022-07-15 12:01:59 +01:00
Nikita Popov 2a721374ae [IR] Don't use blockaddresses as callbr arguments
Following some recent discussions, this changes the representation
of callbrs in IR. The current blockaddress arguments are replaced
with `!` label constraints that refer directly to callbr indirect
destinations:

    ; Before:
    %res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo))
    to label %asm.fallthrough [label %foo]
    ; After:
    %res = callbr i8* asm "", "=r,r,!i"(i8* %x)
    to label %asm.fallthrough [label %foo]

The benefit of this is that we can easily update the successors of
a callbr, without having to worry about also updating blockaddress
references. This should allow us to remove some limitations:

* Allow unrolling/peeling/rotation of callbr, or any other
  clone-based optimizations
  (https://github.com/llvm/llvm-project/issues/41834)
* Allow duplicate successors
  (https://github.com/llvm/llvm-project/issues/45248)

This is just the IR representation change though, I will follow up
with patches to remove limtations in various transformation passes
that are no longer needed.

Differential Revision: https://reviews.llvm.org/D129288
2022-07-15 10:18:17 +02:00
Craig Topper dcfc1fd26f [SelectionDAG][RISCV][AMDGPU][ARM] Improve SimplifyDemandedBits for SHL with variable shift amount.
If we have a variable shift amount and the demanded mask has leading
zeros, we can propagate those leading zeros to not demand those bits
from operand 0. This can allow zero_extend/sign_extend to become
any_extend. This pattern can occur due to C integer promotion rules.

This transform is already done by InstCombineSimplifyDemanded.cpp where
sign_extend can be turned into zero_extend for example.

Reviewed By: spatel, foad

Differential Revision: https://reviews.llvm.org/D121833
2022-07-14 16:10:14 -07:00
Amara Emerson d4f84df0a0 [GlobalISel] Change widenScalar of G_FCONSTANT to mutate into G_CONSTANT.
Widening a G_FCONSTANT by extending and then generating G_FPTRUNC doesn't produce
the same result all the time. Instead, we can just transform it to a G_CONSTANT
of the same bit pattern and truncate using a plain G_TRUNC instead.

Fixes https://github.com/llvm/llvm-project/issues/56454

Differential Revision: https://reviews.llvm.org/D129743
2022-07-14 11:05:10 -07:00
Guozhi Wei 2f11b3a6d7 [MachineCombiner] Don't compute the latency of transient instructions
If an MI will not generate a target instruction, we should not compute its
latency. Then we can compute more precise instruction sequence cost, and get
better result.

Differential Revision: https://reviews.llvm.org/D129615
2022-07-14 17:08:14 +00:00
Nikita Popov dcf4b733ef [SCEVExpander] Make CanonicalMode handing in isSafeToExpand() more robust (PR50506)
isSafeToExpand() for addrecs depends on whether the SCEVExpander
will be used in CanonicalMode. At least one caller currently gets
this wrong, resulting in PR50506.

Fix this by a) making the CanonicalMode argument on the freestanding
functions required and b) adding member functions on SCEVExpander
that automatically take the SCEVExpander mode into account. We can
use the latter variant nearly everywhere, and thus make sure that
there is no chance of CanonicalMode mismatch.

Fixes https://github.com/llvm/llvm-project/issues/50506.

Differential Revision: https://reviews.llvm.org/D129630
2022-07-14 14:41:51 +02:00
Jannik Silvanus e5c4cde451 [AMDGPU] SIMachineScheduler: Add support for several MachineScheduler features
The SI machine scheduler inherits from ScheduleDAGMI.
This patch adds support for a few features that are implemented
in ScheduleDAGMI (or its base classes) that were missing so far
because their support is implemented in overridden functions.

* Support cl::opt -view-misched-dags
  This option allows to open a graphical window of the scheduling DAG.

* Support cl::opt -misched-print-dags
  This option allows to print the scheduling DAG in text form.

* After constructing the scheduling DAG, call postprocessDAG()
  to apply any registered DAG mutations.
  Note that currently there are no mutations defined in AMDGPUTargetMachine.cpp
  in case SIScheduler is used.
  Still add this to avoid surprises in the future in case mutations are added.

Differential Revision: https://reviews.llvm.org/D128808
2022-07-14 09:45:31 +02:00
Kazu Hirata 611ffcf4e4 [llvm] Use value instead of getValue (NFC) 2022-07-13 23:11:56 -07:00
Amara Emerson 2824bdd92f [GlobalISel] Fix and(load)->zextload combine crash.
We shouldn't use getOpcodeDef() if we need to guarantee the def has only one
user since under the hood it may look through copies and optimization hints,
which themselves may have multiple users.
2022-07-13 14:58:45 -07:00
Philip Reames dde2a7fb6d [RISCV] Exploit fact that vscale is always power of two to replace urem sequence
When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale.

vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i8> both fill one 64 bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.)

We know from the RISCV V extension specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, the must be a power of two numbers of blocks. (For everything other than VLEN<=32, but that's already broken.)

It is worth noting that AArch64 SVE specification explicitly allows non-power-of-two sizes for the vector registers and thus can't claim that vscale is a power of two by this logic.

Differential Revision: https://reviews.llvm.org/D129609
2022-07-13 10:54:47 -07:00
Simon Pilgrim d172842b51 [DAG] SimplifyDemandedVectorElts - adjust demanded elements for selection mask for known zero results
If an element is known zero from both selections then it shouldn't matter what the selection mask element is.
2022-07-13 17:36:05 +01:00
Philip Reames fd67992f9c [DAGCombine] fold (urem x, (lshr pow2, y)) -> (and x, (add (lshr pow2, y), -1))
We have the same fold in InstCombine - though implemented via OrZero flag on isKnownToBePowerOfTwo. The reasoning here is that either a) the result of the lshr is a power-of-two, or b) we have a div-by-zero triggering UB which we can ignore.

Differential Revision: https://reviews.llvm.org/D129606
2022-07-13 08:34:38 -07:00
esmeyi 100319cdb4 [AIX] follow-up of D124654.
Report an error when alias symbols are not emitted all.
2022-07-13 03:39:08 -04:00
Kai Nacke 4ae254e488 Revert "[GISel] Unify use of getStackGuard"
This reverts commit e60b4fb2b7.
2022-07-12 17:00:43 -04:00
Kai Nacke e60b4fb2b7 [GISel] Unify use of getStackGuard
Some rework of getStackGuard() based on comments in
https://reviews.llvm.org/D129505.

- getStackGuard() now creates and returns the destination
  register, simplifying calls
- the pointer type is passed to getStackGuard() to avoid
  recomputation
- removed PtrMemTy in emitSPDescriptorParent(), because
  this type is only used here when loading the value but
  not when storing the value

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129576
2022-07-12 16:46:37 -04:00
Craig Topper 8eaf00e04d [TargetLowering][RISCV] Make expandCTLZ work for non-power of 2 types.
To convert CTLZ to popcount we do

x = x | (x >> 1);
x = x | (x >> 2);
...
x = x | (x >>16);
x = x | (x >>32); // for 64-bit input
return popcount(~x);

This smears the most significant set bit across all of the bits
below it then inverts the remaining 0s and does a population count.

To support non-power of 2 types, the last shift amount must be
more than half of the size of the type. For i15, the last shift
was previously a shift by 4, with this patch we add another shift
of 8.

Fixes PR56457.

Differential Revision: https://reviews.llvm.org/D129431
2022-07-12 11:36:37 -07:00
Kai Nacke 42f7364fcb [GISel] Check useLoadStackGuardNode() before generating LOAD_STACK_GUARD
When lowering llvm::stackprotect intrinsic, the SDAG implementation
checks useLoadStackGuardNode() to either create a LOAD_STACK_GUARD or use
the first argument of the intrinsic. This check is not present in the
IRTranslator, which results in always generating a LOAD_STACK_GUARD even
if the target does not support it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129505
2022-07-12 11:44:42 -04:00
Simon Pilgrim ded62411f7 [DAG] SimplifyDemandedBits - AND/OR/XOR - attempt basic knownbits simplifications before calling SimplifyMultipleUseDemandedBits
Noticed while investigating the SystemZ regressions in D77804, prefer handling the knownbits analysis/simplification in the bitop nodes directly before falling back to SimplifyMultipleUseDemandedBits
2022-07-12 14:09:00 +01:00
Jay Foad 0d1b5268e8 [MachineVerifier] Try harder to verify LiveStacks
Verify the LiveStacks analysis after a pass that claims to preserve it,
even if there are no further passes (apart from the verifier itself)
that would use the analysis.

Differential Revision: https://reviews.llvm.org/D129200
2022-07-12 09:54:54 +01:00
Nikita Popov c64aba5d93 [SDAG] Don't duplicate ParseConstraints() implementation SDAGBuilder (NFCI)
visitInlineAsm() in SDAGBuilder was duplicating a lot of the code
in ParseConstraints(), in particular all the logic to determine the
operand value and constraint VT.

Rely on the data computed by ParseConstraints() instead, and update
its ConstraintVT implementation to match getCallOperandValEVT()
more precisely.
2022-07-12 10:42:02 +02:00
Craig Topper b05160dbdf [SelectionDAG] Simplify how we drop poison flags in SimplifyDemandedBits.
As far as I can tell what was happening in the original code is
that the getNode call receives the same operands as the original
node with different SDNodeFlags. The logic inside getNode detects
that the node already exists and intersects the flags into the
existing node and returns it. This results in Op and NewOp for the
TLO.CombineTo call always being the same node.

We may have already called CombineTo as part of the recursive handling.
A second call to CombineTo as we unwind the recursion overwrites
the previous CombineTo. I think this means any time we updated the
poison flags that was the only change that ends up getting made
and we relied on DAGCombiner to revisit and call SimplifyDemandedBits
again. The second time the poison flags wouldn't need to be dropped
and we would keep the CombineTo call from further down the recursion.

We can instead call setFlags to drop the poison flags and remove the
call to TLO.CombineTo. This way we keep the CombineTo from deeper in
the recursion which should be more efficient.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D129511
2022-07-11 13:42:33 -07:00
Sanjay Patel d0eec5f7e7 [SDAG] enhance sub->xor fold to ignore signbit
As suggested in the post-commit feedback for D128123,
we can ease the mask constraint to ignore the MSB
(and make the code easier to read by adjusting the check).

https://alive2.llvm.org/ce/z/bbvqWv
2022-07-11 12:37:50 -04:00
Mircea Trofin 24c6c35270 [mlgo] Don't provide default model URLs
Pointed out in Issue #56432: the current reference models may not be
quite friendly to open source projects. Their purpose is only
illustrative - the expectation is that projects would train their own.
To avoid unintentionally pulling such a model, made the URL cmake
setting require explicit user setting.

Differential Revision: https://reviews.llvm.org/D129342
2022-07-11 07:37:14 -07:00
Stephen Tozer f9ac161af9 [DebugInfo][InstrRef] Fix error in copy handling in InstrRefLDV
Currently, an error exists when InstrRefBasedLDV observes transfers of
variables across copies, which causes it to lose track of variables
under certain circumstances, resulting in shorter lifetimes for those
variables as LDV gives up searching for live locations for them. This
patch fixes this issue by storing the currently tracked values in
the destination first, then updating them manually later without
clobbering or assigning them the wrong value.

Differential Revision: https://reviews.llvm.org/D128101
2022-07-11 13:38:23 +01:00
Kazu Hirata 5b55b7f6d2 [CodeGen] Remove unused member variable NextCascade (NFC) 2022-07-10 18:57:40 -07:00
Kazu Hirata 1fd6611fc8 [SelectionDAG] Restore calls to has_value (NFC)
This patch restores calls to has_value to make it clear that we are
checking the presence of an optional value, not the underlying value.

This patch partially reverts d08f34b592.

Differential Revision: https://reviews.llvm.org/D129454
2022-07-10 14:37:23 -07:00
David Green 28b41237e6 [InterleaveAccessPass] Handle multi-use binop shuffles
D89489 added some logic to the interleaved access pass to attempt to
undo the folding of shuffles into binops, that instcombine performs. If
early-cse is run too, the binops may be commoned into a single operation
with multiple shuffle uses. It is still profitable reverse the transform
though, so long as all the uses are shuffles.

Differential Revision: https://reviews.llvm.org/D129419
2022-07-10 17:24:37 +01:00
Nicolai Hähnle ede600377c ManagedStatic: remove many straightforward uses in llvm
(Reapply after revert in e9ce1a5880 due to
Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other
than error categories, to be checked in more detail and reapplied
separately.)

Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.

Differential Revision: https://reviews.llvm.org/D129120
2022-07-10 10:29:15 +02:00
Nicolai Hähnle e9ce1a5880 Revert "ManagedStatic: remove many straightforward uses in llvm"
This reverts commit e6f1f06245.

Reverting due to a failure on the fuchsia-x86_64-linux buildbot.
2022-07-10 09:54:30 +02:00
Nicolai Hähnle e6f1f06245 ManagedStatic: remove many straightforward uses in llvm
Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.

Differential Revision: https://reviews.llvm.org/D129120
2022-07-10 09:15:08 +02:00
Craig Topper 40866b74bd [DAGCombiner][X86] Fold sra (sub AddC, (shl X, N1C)), N1C --> sext (sub AddC1',(trunc X to (width - N1C)))
We already handled this case for add with a constant RHS. A
similar pattern can occur for sub with a constant left hand side.

Test cases use add and a mul representing (neg (shl X, C)) because
that's what I saw in the wild. The mul will be decomposed and then
the new transform can kick in.

Tests have not been committed, but this patch shows the changes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D128769
2022-07-09 11:53:44 -07:00
Alexander Yermolovich a84e1e6c0d [DWARF] Add linkagename to hash
Originally encountered with RUST, but also there are cases with distributed LTO
where debug info dwo units contain structurally the same debug information, with
difference in DW_AT_linkage_name. This causes collision on DWO ID.

Differential Revision: https://reviews.llvm.org/D129317
2022-07-08 10:15:25 -07:00
Matt Arsenault 13ac4c3de9 GlobalISel: Add buildBoolExtInReg helper 2022-07-08 11:55:08 -04:00
Matt Arsenault e9a45d45d0 GlobalISel: Allow forming atomic/volatile G_SEXTLOAD
Mirror the change to G_ZEXTLOAD.
2022-07-08 11:55:08 -04:00
Matt Arsenault 1ee6ce9bad GlobalISel: Allow forming atomic/volatile G_ZEXTLOAD
SelectionDAG has a target hook, getExtendForAtomicOps, which it uses
in the computeKnownBits implementation for ATOMIC_LOAD. This is pretty
ugly (as is having a separate load opcode for atomics), so instead
allow making use of atomic zextload. Enable this for AArch64 since the
DAG path defaults in to the zext behavior.

The tablegen changes are pretty ugly, but partially helps migrate
SelectionDAG from using ISD::ATOMIC_LOAD to regular ISD::LOAD with
atomic memory operands. For now the DAG emitter will emit matchers for
patterns which the DAG will not produce.

I'm still a bit confused by the intent of the isLoad/isStore/isAtomic
bits. The DAG implementation rejects trying to use any of these in
combination. For now I've opted to make the isLoad checks also check
isAtomic, although I think having isLoad and isAtomic set on these
makes most sense.
2022-07-08 11:55:08 -04:00
Simon Pilgrim b53046122f [DAG] SimplifyDemandedBits - fold AND(INSERT_SUBVECTOR(C,X,I),M) -> INSERT_SUBVECTOR(AND(C,M),X,I)
If all the demanded bits of the AND mask covering the inserted subvector 'X' are known to be one, then the mask isn't affecting the subvector at all.

In which case, if the base vector 'C' is undef/constant, then move the AND mask up to just (constant) fold it directly.

Addresses some of the regressions from D129150, particularly the cases where we're attempting to zero the upper elements of a widened vector.

Differential Revision: https://reviews.llvm.org/D129290
2022-07-08 16:08:31 +01:00
Daniil Fukalov 6858a17f66 [LiveIntervals] Fix incorrect range (re)construction from subranges.
After D82916 `updateAllRanges()` started to fix holes in main range with
subranges but it fails on instructions with two subregs def which are parts of
one reg. The main range constructed with //all// subranges of subregs just after
processing the first operand. So the main range gets intervals from subranges
those are not updated yet.

The patch takes into account lane mask to update the main range.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D128553
2022-07-08 16:07:19 +03:00
Sanjay Patel 8b75671314 [SDAG] try to replace subtract-from-constant with xor
This is almost the same as the abandoned D48529, but it
allows splat vector constants too.

This replaces the x86-specific code that was added with
the alternate patch D48557 with the original generic
combine.

This transform is a less restricted form of an existing
InstCombine and the proposed SDAG equivalent for that
in D128080:
https://alive2.llvm.org/ce/z/OUm6N_

Differential Revision: https://reviews.llvm.org/D128123
2022-07-08 08:14:24 -04:00
OCHyams 6b62ca9043 [NFC][SelectionDAG] Fix debug prints in salvageUnresolvedDbgValue
The prints are printing pointer values - fix by dereferencing the pointers.
2022-07-08 12:09:30 +01:00
Petar Avramovic 2483f43d47 [AArch64][GlobalISel] Fix call lowering for <3 x i32> vector arguments
Differential Revision: https://reviews.llvm.org/D129194
2022-07-08 10:25:45 +02:00
Sergei Barannikov 2247fdc84d [SelectionDAG] computeKnownBits / ComputeNumSignBits for the remaining overflow-aware nodes
Some overflow-aware nodes were missing from the switches in
computeKnownBits and ComputeNumSignBits.
2022-07-08 09:19:19 +01:00
Joseph Huber 41fba3c107 [Metadata] Add 'exclude' metadata to add the exclude flags on globals
This patchs adds a new metadata kind `exclude` which implies that the
global variable should be given the necessary flags during code
generation to not be included in the final executable. This is done
using the ``SHF_EXCLUDE`` flag on ELF for example. This should make it
easier to specify this flag on a variable without needing to explicitly
check the section name in the target backend.

Depends on D129053 D129052

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D129151
2022-07-07 12:20:40 -04:00
Joseph Huber 1d2ce4da84 [Object] Add ELF section type for offloading objects
Currently we use the `.llvm.offloading` section to store device-side
objects inside the host, creating a fat binary. The contents of these
sections is currently determined by the name of the section while it
should ideally be determined by its type. This patch adds the new
`SHT_LLVM_OFFLOADING` section type to the ELF section types. Which
should make it easier to identify this specific data format.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D129052
2022-07-07 12:20:30 -04:00
Bradley Smith 60d6be5dd3 [LegalizeTypes] Replace vecreduce_xor/or/and with vecreduce_add/umax/umin if not legal
This is done during type legalization since the target representation of
these nodes may not be valid until after type legalization, and after
type legalization the fact that these are dealing with i1 types may be
lost.

Differential Revision: https://reviews.llvm.org/D128996
2022-07-07 09:33:54 +00:00
Sander de Smalen 15c3ba8a44 [AArc64] Legalisation of compares and truncates of nxv1i1 types.
Truncates and compares require some changes to generic legalisation functions
to use ElementCount instead of getNumElements.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D129082
2022-07-07 07:39:27 +00:00
Eli Friedman 696f53665d [AsmPrinter] Fix bit pattern for i1 vectors.
Vectors are defined to be tightly packed, regardless of the element
type.  The AsmPrinter didn't realize this, and was allocating extra
padding.

Fixes https://github.com/llvm/llvm-project/issues/49286
Fixes https://github.com/llvm/llvm-project/issues/53246
Fixes https://github.com/llvm/llvm-project/issues/55522

Differential Revision: https://reviews.llvm.org/D129164
2022-07-06 12:56:47 -07:00
Edd Barrett ed8ef65f3d
[stackmaps] Start legalizing live variable operands
Prior to this change, live variable operands passed to
`llvm.experimental.stackmap` would be emitted directly to target nodes,
meaning that they don't get legalised. The upshot of this is that LLVM
may crash when encountering illegally typed target nodes.

e.g. https://github.com/llvm/llvm-project/issues/21657

This change introduces a platform independent stackmap DAG node whose
operands are legalised as per usual, thus avoiding aforementioned
crashes.

Note that some kinds of argument are still not handled properly, namely
vectors, structs, and large integers, like i128s. These will need to be
addressed in follow-up changes.

Note also that this does not change the behaviour of
`llvm.experimental.patchpoint`. A follow up change will do the same for
this intrinsic.

Differential review:
https://reviews.llvm.org/D125680
2022-07-06 14:01:54 +01:00
Shilei Tian 1023ddaf77 [LLVM] Add the support for fmax and fmin in atomicrmw instruction
This patch adds the support for `fmax` and `fmin` operations in `atomicrmw`
instruction. For now (at least in this patch), the instruction will be expanded
to CAS loop. There are already a couple of targets supporting the feature. I'll
create another patch(es) to enable them accordingly.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D127041
2022-07-06 10:57:53 -04:00
Nikita Popov f96cb66d19 [ValueTracking] Accept Instruction in isSafeToSpeculativelyExecute() (NFC)
As constant expressions can no longer trap, it only makes sense to
call isSafeToSpeculativelyExecute on Instructions, so limit the
API to accept only them, rather than general Operators or Values.
2022-07-06 11:12:49 +02:00
Nikita Popov bb84e5eeff [SelectionDAGISel] Drop unused variable (NFC) 2022-07-06 10:46:13 +02:00
Nikita Popov 8ee913d83b [IR] Remove Constant::canTrap() (NFC)
As integer div/rem constant expressions are no longer supported,
constants can no longer trap and are always safe to speculate.
Remove the Constant::canTrap() method and its usages.
2022-07-06 10:36:47 +02:00
Simon Pilgrim 7068c843d2 [DAG] visitREM - use isAllOnesOrAllOnesSplat instead of isConstOrConstSplat
We were only using the N1C scalar/splat value once, so for clarity use isAllOnesOrAllOnesSplat instead if we actually need it.
2022-07-05 16:44:31 +01:00
Simon Pilgrim e7a0fa4df0 [DAG] foldAddSubOfSignBit - don't bother creating the new shift node unless constant folding succeeds
Noticed by inspection - the new shift is only ever used if the constant fold occurs
2022-07-05 16:44:31 +01:00
Thomas Symalla 04c5fed5e0 [NFC] Fix wrong comment. 2022-07-05 13:37:44 +02:00
Simon Pilgrim cce64e7a9c [DAG] visitTRUNCATE - move GetDemandedBits AFTER SimplifyDemandedBits.
Another cleanup step before removing GetDemandedBits entirely.
2022-07-04 11:25:40 +01:00
Nikita Popov 7283f48a05 [IR] Remove support for insertvalue constant expression
This removes the insertvalue constant expression, as part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
This is very similar to the extractvalue removal from D125795.
insertvalue is also not supported in bitcode, so no auto-ugprade
is necessary.

ConstantExpr::getInsertValue() can be replaced with
IRBuilder::CreateInsertValue() or ConstantFoldInsertValueInstruction(),
depending on whether a constant result is required (with the latter
being fallible).

The ConstantExpr::hasIndices() and ConstantExpr::getIndices()
methods also go away here, because there are no longer any constant
expressions with indices.

Differential Revision: https://reviews.llvm.org/D128719
2022-07-04 09:27:22 +02:00
esmeyi d2a35e4d39 [AIX] Handling the label alignment of a global
variable with its multiple aliases.

This patch handles the case where a variable has
multiple aliases.
AIX's assembly directive .set is not usable for the
aliasing purpose, and using different labels allows
AIX to emulate symbol aliases. If a value is emitted
between any two labels, meaning they are not aligned,
XCOFF will automatically calculate the offset for them.

This patch implements:
1) Emits the label of the alias just before emitting
the value of the sub-element that the alias referred to.
2) A set of aliases that refers to the same offset
should be aligned.
3) We didn't emit aliasing labels for common and
zero-initialized local symbols in
PPCAIXAsmPrinter::emitGlobalVariableHelper, but
emitted linkage for them in
AsmPrinter::emitGlobalAlias, which caused a FAILURE.
This patch fixes the bug by blocking emitting linkage
for the alias without a label.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D124654
2022-07-03 23:16:16 -04:00
Quentin Colombet f4145ddf5b [GISel] Don't fold convergent instruction across CFG
Before merging two instructions together, GISel does some sanity checks
that the folding is legal. However that check was missing that the
source of the pattern may be convergent. When the destination location
is in a different basic block, the folding is invalid.

Differential Revision: https://reviews.llvm.org/D128539
2022-07-01 10:24:24 -07:00
Sander de Smalen 690db16422 [AArch64] Make nxv1i1 types a legal type for SVE.
One motivation to add support for these types are the LD1Q/ST1Q
instructions in SME, for which we have defined a number of load/store
intrinsics which at the moment still take a `<vscale x 16 x i1>` predicate
regardless of their element type.

This patch adds basic support for the nxv1i1 type such that it can be passed/returned
from functions, as well as some basic support to support some existing tests that
result in a nxv1i1 type. It also adds support for splats.

Other operations (e.g. insert/extract subvector, logical ops, etc) will be
supported in follow-up patches.

Reviewed By: paulwalker-arm, efriedma

Differential Revision: https://reviews.llvm.org/D128665
2022-07-01 15:11:13 +00:00
Xiang1 Zhang 72a23cef7e [ISel] Match all bits when merge undefs for DAG combine
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D128570
2022-07-01 09:09:43 +08:00
Xiang1 Zhang 64f44a90ef Revert "[ISel] Match all bits when merge undef(s) for DAG combine"
This reverts commit 5fe5aa284e.
2022-07-01 08:59:04 +08:00
Xiang1 Zhang 5fe5aa284e [ISel] Match all bits when merge undef(s) for DAG combine 2022-07-01 08:58:00 +08:00
Nuno Lopes 373571dbb4 [NFC] Switch a few uses of undef to poison as placeholders for unreachble code 2022-06-30 23:01:43 +01:00
jeff 09424f802c [AMDGPU] Check for CopyToReg PhysReg clobbers in pre-RA-sched
Differential Revision: https://reviews.llvm.org/D128681
2022-06-30 09:18:04 -07:00
Luo, Yuanke fa8656d28d [greedyalloc] Return early when there is no register to allocate.
In X86 we split greddy register allocation into 2 passes. The 1st pass
is to allocate tile register, and the 2nd pass is to allocate the rest
of virtual register. In most cases there is no tile register, so the 1st
pass is unnecessary. To improve the compiling time, we check if there is
any register need to be allocated by invoking callback
`ShouldAllocateClass`. If there is no register to be allocated, just
return false in the pass. This would improve the 1st greed RA pass for
normal cases.

Differential Revision: https://reviews.llvm.org/D128804
2022-06-30 11:12:05 +08:00
Stefan Pintilie e50a8c8435 [GlobalMerge] Ensure that the MustKeepGlobalVariables has all globals from each landingpad clause.
The filter clause in the landingpad may not have a GlobalVariable operand.
It may instead have a ConstantArray of operands and each operand within this
ConstantArray should also be checked to see if it is a GlobalVariable.

This patch add the check for the ConstantArray as well as a debug message that
outputs the contents of MustKeepGlobalVariables.

Reviewed By: lei, amyk, scui

Differential Revision: https://reviews.llvm.org/D128287
2022-06-29 15:55:47 -05:00
Nikita Popov 16033ffdd9 [ConstExpr] Remove more leftovers of extractvalue expression (NFC)
Remove some leftover bits of extractvalue handling after the
removal in D125795.
2022-06-29 10:45:19 +02:00
Nikita Popov 348ea34bcd [AsmPrinter] Further restrict expressions supported in global initializers
lowerConstant() currently accepts a number of constant expressions
which have corresponding MC expressions, but which cannot be
evaluated as a relocatable expression (unless the operands are
constant, in which case we'll just fold the expression to a constant).

The motivation here is to clarify which constant expressions are
really needed for https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179,
and in particular clarify that we do not need to support any
division expressions, which are particularly problematic.

Differential Revision: https://reviews.llvm.org/D127972
2022-06-29 10:02:07 +02:00
Luo, Yuanke 5cb0979870 [X86][AMX] Split greedy RA for tile register
When we fill the shape to tile configure memory, the shape is gotten
from AMX pseudo instruction. However the register for the shape may be
split or spilled by greedy RA. That cause we fill the shape to config
memory after ldtilecfg is executed, so that the shape configuration
would be wrong.
This patch is to split the tile register allocation from greedy register
allocation, so that after tile registers are allocated the shape
registers are still virtual register. The shape register only may be
redefined or multi-defined by phi elimination pass, two address pass.
That doesn't affect tile register configuration.

Differential Revision: https://reviews.llvm.org/D128584
2022-06-29 10:35:43 +08:00
Guozhi Wei ddc9e8861c [MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency
Add a new pattern A - (B + C) ==> (A - B) - C to give machine combiner a chance
to evaluate which instruction sequence has lower latency.

Differential Revision: https://reviews.llvm.org/D124564
2022-06-28 21:42:51 +00:00
Rahman Lavaee 0aa6df6575 [Propeller] Encode address offsets of basic blocks relative to the end of the previous basic blocks.
This is a resurrection of D106421 with the change that it keeps backward-compatibility. This means decoding the previous version of `LLVM_BB_ADDR_MAP` will work. This is required as the profile mapping tool is not released with LLVM (AutoFDO). As suggested by @jhenderson we rename the original  section type value to `SHT_LLVM_BB_ADDR_MAP_V0` and assign a new value to the `SHT_LLVM_BB_ADDR_MAP` section type. The new encoding adds a version byte to each function entry to specify the encoding version for that function.  This patch also adds a feature byte to be used with more flexibility in the future. An use-case example for the feature field is encoding multi-section functions more concisely using a different format.

Conceptually, the new encoding emits basic block offsets and sizes as label differences between each two consecutive basic block begin and end label. When decoding, offsets must be aggregated along with basic block sizes to calculate the final offsets of basic blocks relative to the function address.

This encoding uses smaller values compared to the existing one (offsets relative to function symbol).
Smaller values tend to occupy fewer bytes in ULEB128 encoding. As a result, we get about 17% total reduction in the size of the bb-address-map section (from about 11MB to 9MB for the clang PGO binary).
The extra two bytes (version and feature fields) incur a small 3% size overhead to the `LLVM_BB_ADDR_MAP` section size.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D121346
2022-06-28 07:42:54 -07:00
Tim Northover 4aafebce52 SelectionDAG: allow FP extensions when folding extract/insert.
Before, we were trying to sign extend half -> float, and asserted in getNode.
2022-06-28 12:08:35 +01:00
Guillaume Chatelet 3c126d5fe4 [Alignment] Replace commonAlignment with std::min
`commonAlignment` is a shortcut to pick the smallest of two `Align`
objects. As-is it doesn't bring much value compared to `std::min`.

Differential Revision: https://reviews.llvm.org/D128345
2022-06-28 07:15:02 +00:00
Fangrui Song f1e27716cf [LiveInterval] Simplify with partition_point. NFC 2022-06-27 19:25:26 -07:00
Yuanfang Chen 6678f8e505 [ubsan] Using metadata instead of prologue data for function sanitizer
Information in the function `Prologue Data` is intentionally opaque.
When a function with `Prologue Data` is duplicated. The self (global
value) references inside `Prologue Data` is still pointing to the
original function. This may cause errors like `fatal error: error in backend: Cannot represent a difference across sections`.

This patch detaches the information from function `Prologue Data`
and attaches it to a function metadata node.

This and D116130 fix https://github.com/llvm/llvm-project/issues/49689.

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D115844
2022-06-27 12:09:13 -07:00
Patrick Walton becbbb7e3c Round up zero-sized symbols to 1 byte in `.debug_aranges` (without breaking other logic).
This commit modifies the AsmPrinter to avoid emitting any zero-sized symbols to
the .debug_aranges table, by rounding their size up to 1. Entries with zero
length violate the DWARF 5 spec, which states:

> Each descriptor is a triple consisting of a segment selector, the beginning
> address within that segment of a range of text or data covered by some entry
> owned by the corresponding compilation unit, followed by the non-zero length
> of that range.

In practice, these zero-sized entries produce annoying warnings in lld and
cause GNU binutils to truncate the table when parsing it.

Other parts of LLVM, such as DWARFDebugARanges in the DebugInfo module
(specifically the appendRange method), already avoid emitting zero-sized
symbols to .debug_aranges, but not comprehensively in the AsmPrinter. In fact,
the AsmPrinter does try to avoid emitting such zero-sized symbols when labels
aren't involved, but doesn't when the symbol to emitted is a difference of two
labels; this patch extends that logic to handle the case in which the symbol is
defined via labels.

Furthermore, this patch fixes a bug in which `available_externally` symbols
would cause unpredictable values to be emitted into the `.debug_aranges` table
under certain circumstances. In practice I don't believe that this caused
issues up until now, but the root cause of this bug--an invalid DenseMap
lookup--triggered failures in Chromium when combined with an earlier version of
this patch. Therefore, this patch fixes that bug too.

This is a revised version of diff D126257, which was reverted due to breaking
tests. The now-reverted version of this patch didn't distinguish between
symbols that didn't have their size reported to the DwarfDebug handler and
those that had their size reported to be zero. This new version of the patch
instead restricts the special handling only to the symbols whose size is
definitively known to be zero.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D126835
2022-06-27 10:01:03 -07:00
Matt Arsenault 97ed2fbc5f MIR: Fix parse error on empty CustomRegMask 2022-06-27 08:50:35 -04:00
Bradley Smith a83aa33d1b [IR] Move vector.insert/vector.extract out of experimental namespace
These intrinsics are now fundemental for SVE code generation and have been
present for a year and a half, hence move them out of the experimental
namespace.

Differential Revision: https://reviews.llvm.org/D127976
2022-06-27 10:48:45 +00:00
Kazu Hirata 94460f5136 Don't use Optional::hasValue (NFC)
This patch replaces x.hasValue() with x where x is contextually
convertible to bool.
2022-06-26 19:54:41 -07:00
Kazu Hirata d08f34b592 [llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
2022-06-26 18:31:51 -07:00
Kazu Hirata a81b64a1fb [llvm] Use Optional::has_value instead of Optional::hasValue (NFC)
This patch replaces x.hasValue() with x.has_value() where x is not
contextually convertible to bool.
2022-06-26 16:10:42 -07:00
Craig Topper 44b456e5f0 [CodeGenPrepare] Avoid double map lookup. NFCI 2022-06-26 10:47:14 -07:00
Nikita Popov ec19223131 Revert "[LiveInterval] Simplify. NFC"
This reverts commit e733b80f3c.

This caused a major compile-time regression:
https://llvm-compile-time-tracker.com/compare.php?from=3b7c3a654c9175f41ac871a937cbcae73dfb3c5d&to=e733b80f3cba26bf2df9bd691120f37fc1af21ce&stat=instructions

About 1% on average, 10% on individual files.
2022-06-26 11:51:07 +02:00
Kazu Hirata a7938c74f1 [llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
2022-06-25 21:42:52 -07:00
Fangrui Song e733b80f3c [LiveInterval] Simplify. NFC 2022-06-25 11:59:33 -07:00
Kazu Hirata 3b7c3a654c Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3.
2022-06-25 11:56:50 -07:00
Kazu Hirata aa8feeefd3 Don't use Optional::hasValue (NFC) 2022-06-25 11:55:57 -07:00
Matt Arsenault e7bc73739a GlobalISel: Make LoadStoreOpt preserve all
Avoids dropping CSE info analysis
2022-06-25 09:24:54 -04:00
Matt Arsenault a397846cb0 CodeGen: Use else if between Value and PseudoSourceValue cases
These are mutually exclusive.
2022-06-25 09:24:25 -04:00
chenglin.bi 8c74205642 [SelectionDAG][DAGCombiner] Reuse exist node by reassociate
When already have (op N0, N2), reassociate (op (op N0, N1), N2) to (op (op N0, N2), N1) to reuse the exist (op N0, N2)

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122539
2022-06-24 23:15:06 +08:00
Nabeel Omer 0d41794335 [SLP] Add cost model for `llvm.powi.*` intrinsics (REAPPLIED)
Patch was reverted in 4c5f10a due to buildbot failures, now being
reapplied with updated AArch64 and RISCV tests.

This patch adds handling for the llvm.powi.* intrinsics in
BasicTTIImplBase::getIntrinsicInstrCost() and improves vectorization.
Closes #53887.

Differential Revision: https://reviews.llvm.org/D128172
2022-06-24 10:23:19 +00:00
Carl Ritson 874fbe2cbb [MachineSink] Clear kill flags on operands outside loop
If an instruction is sunk into a loop then any kill flags on
operands declared outside the loop must be cleared as these
will be live for all loop iterations.

Fixes #46827

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D126754
2022-06-24 14:02:48 +09:00
Lian Wang 1ce30457c1 [LegalizeTypes][NFC] Add an assert to WidenVecRes_EXTRACT_SUBVECTOR and adjust some code
Reviewed By: craig.topper, david-arm

Differential Revision: https://reviews.llvm.org/D128038
2022-06-24 03:06:16 +00:00
Lian Wang 770fe864fe [SelectionDAG] Enable WidenVecOp_VECREDUCE for scalable vector
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128239
2022-06-24 02:32:53 +00:00
Craig Topper 8b10ffabae [RISCV] Disable <vscale x 1 x *> types with Zve32x or Zve32f.
According to the vector spec, mf8 is not supported for i8 if ELEN
is 32. Similarily mf4 is not suported for i16/f16 or mf2 for i32/f32.

Since RVVBitsPerBlock is 64 and LMUL is calculated as
((MinNumElements * ElementSize) / RVVBitsPerBlock) this means we
need to disable any type with MinNumElements==1.

For generic IR, these types will now be widened in type legalization.
For RVV intrinsics, we'll probably hit a fatal error somewhere. I plan
to work on disabling the intrinsics in the riscv_vector.h header.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D128286
2022-06-23 08:49:18 -07:00
Baptiste Saleil 79e77a9f39 [AMDGPU] Flush the vmcnt counter in loop preheaders when necessary
waitcnt vmcnt instructions are currently generated in loop bodies before using
values loaded outside of the loop. In some cases, it is better to flush the
vmcnt counter in a loop preheader before entering the loop body. This patch
detects these cases and generates waitcnt instructions to flush the counter.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D115747
2022-06-23 10:53:21 -04:00
Nico Weber 851a5efe45 Revert "[fastalloc] Support allocating specific register class in fastalloc"
This reverts commit 719658d078.
Breaks a few things, see comments on https://reviews.llvm.org/D128437
There's disagreement about the best fix.
So let's keep HEAD green while discussions are happening.
2022-06-23 10:44:24 -04:00
Luo, Yuanke 719658d078 [fastalloc] Support allocating specific register class in fastalloc
The base RA support infrastructure that only allow a specific register
class be allocated in RA pss. Since greedy RA, basic RA derived from
base RA, they all allow allocating specific register class. Fast RA
doesn't support allocating register for specific register class. This
patch is to enable ShouldAllocateClass in fast RA, so that it can
support allocating register for specific register class.

Differential Revision: https://reviews.llvm.org/D126771
2022-06-23 14:42:04 +08:00
chenglin.bi 9c2bf534f5 Revert "[SelectionDAG][DAGCombiner] Reuse exist node by reassociate"
This reverts commit 6c951c5ee6.
2022-06-23 13:21:51 +08:00
Matt Arsenault 370aa2f88f InlineSpiller: Don't fold spills into undef reads
This was producing a load into a dead register which was a verifier
error.
2022-06-22 20:47:55 -04:00
Guillaume Chatelet 57ffff6db0 Revert "[NFC] Remove dead code"
This reverts commit 8ba2cbff70.
2022-06-22 14:55:47 +00:00
Guillaume Chatelet 8ba2cbff70 [NFC] Remove dead code 2022-06-22 13:33:58 +00:00
Simon Pilgrim 2c3a4a9334 [DAG] SelectionDAG::GetDemandedBits - don't recurse back into GetDemandedBits
Another minor cleanup as we work toward removing GetDemandedBits entirely - call SimplifyMultipleUseDemandedBits directly.
2022-06-22 13:48:57 +01:00
Simon Pilgrim 1c2b756cd6 [DAG] visitTRUNCATE - move TRUNCATE(ADDE/ADDCARRY) folds to switch statement handling the other binops. NFC. 2022-06-21 22:07:41 +01:00
Simon Pilgrim 8cecb6be56 [DAG] Remove SelectionDAG::GetDemandedBits DemandedElts variant. NFC.
We're slowly removing SelectionDAG::GetDemandedBits and replacing it with SimplifyMultipleUseDemandedBits, we no longer have any uses for the vector demanded elt variant.
2022-06-21 21:23:10 +01:00
Nabeel Omer 4c5f10aeeb Revert rGe6ccb57bb3f6b761f2310e97fd6ca99eff42f73e "[SLP] Add cost model for `llvm.powi.*` intrinsics"
This reverts commit e6ccb57bb3.
2022-06-21 15:05:55 +00:00
Nabeel Omer e6ccb57bb3 [SLP] Add cost model for `llvm.powi.*` intrinsics
This patch adds handling for the llvm.powi.* intrinsics in
BasicTTIImplBase::getIntrinsicInstrCost() and improves vectorization.
Closes #53887.

Differential Revision: https://reviews.llvm.org/D128172
2022-06-21 14:40:34 +00:00
Markus Lavin 3815ae29b5 [machinesink] fix debug invariance issue
Do not include debug instructions when comparing block sizes with
thresholds.

Differential Revision: https://reviews.llvm.org/D127208
2022-06-21 08:13:09 +02:00
Kazu Hirata 7a47ee51a1 [llvm] Don't use Optional::getValue (NFC) 2022-06-20 22:45:45 -07:00
Kazu Hirata d66cbc565a Don't use Optional::hasValue (NFC) 2022-06-20 20:26:05 -07:00
Kazu Hirata 0916d96d12 Don't use Optional::hasValue (NFC) 2022-06-20 20:17:57 -07:00
Kazu Hirata 064a08cd95 Don't use Optional::hasValue (NFC) 2022-06-20 20:05:16 -07:00
chenglin.bi 6c951c5ee6 [SelectionDAG][DAGCombiner] Reuse exist node by reassociate
When already have (op N0, N2), reassociate (op (op N0, N1), N2) to (op (op N0, N2), N1) to reuse the exist (op N0, N2)

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122539
2022-06-21 09:45:19 +08:00
Luo, Yuanke 44e8a205f4 [fastregalloc] Enhance the heuristics for liveout in self loop.
For below case, virtual register is defined twice in the self loop. We
don't need to spill %0 after the third instruction `%0 = def (tied %0)`,
because it is defined in the second instruction `%0 = def`.

1 bb.1
2 %0 = def
3 %0 = def (tied %0)
4 ...
5 jmp bb.1

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D125079
2022-06-21 09:18:49 +08:00
Kazu Hirata 5413bf1bac Don't use Optional::hasValue (NFC) 2022-06-20 11:33:56 -07:00
David Green c0ecbfa4fd [AArch64] Known bits for AArch64ISD::DUP
An AArch64ISD::DUP is just a splat, where the known bits for each lane
are the same as the input. This teaches that to computeKnownBitsForTargetNode.

Problems arise for constants though, as a constant BUILD_VECTOR can be
lowered to an AArch64ISD::DUP, which SimplifyDemandedBits would then
turn back into a constant BUILD_VECTOR leading to an infinite cycle.
This has been prevented by adding a isTargetCanonicalConstantNode node
to prevent the conversion back into a BUILD_VECTOR.

Differential Revision: https://reviews.llvm.org/D128144
2022-06-20 19:11:57 +01:00
Kazu Hirata e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
Guillaume Chatelet 03036061c7 [Alignment] Use 'previous()' method instead of scalar division
This is in preparation of integration with D128052.

Differential Revision: https://reviews.llvm.org/D128169
2022-06-20 11:01:43 +00:00
Simon Pilgrim e4a124dda5 [DAG] Fold (srl (shl x, c1), c2) -> and(shl/srl(x, c3), m)
Similar to the existing (shl (srl x, c1), c2) fold

Part of the work to fix the regressions in D77804

Differential Revision: https://reviews.llvm.org/D125836
2022-06-20 08:37:38 +01:00
Lian Wang ab25e263a9 [SelectionDAG] Enable WidenVecOp_VECREDUCE_SEQ for scalable vector
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D127710
2022-06-20 06:30:26 +00:00
Craig Topper 314dbde12c [DAGCombiner][ARM][RISCV] Teach ShrinkLoadReplaceStoreWithStore to use truncstore.
The VT we want to shrink to may not be legal especially after type
legalization.

Fixes PR56110.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D128135
2022-06-19 15:50:15 -07:00
Haojian Wu 44582afe48 Fix an unused-variable warning in release build, NFC. 2022-06-19 20:52:00 +02:00
David Green e995e34469 [MachinePipeliner] Handle failing constrainRegClass
The included test hits a verifier problems as one of the instructions:
```
%113:tgpreven, %114:tgprodd = MVE_VMLSLDAVas16 %12:tgpreven(tied-def 0), %11:tgprodd(tied-def 1), %7:mqpr, %8:mqpr, 0, $noreg, $noreg
```
Has two inputs that come from different PHIs with the same base reg, but
conflicting regclasses:
```
%11:tgprodd = PHI %103:gpr, %bb.1, %16:gpr, %bb.2
%12:tgpreven = PHI %103:gpr, %bb.1, %17:gpr, %bb.2
```

The MachinePipeliner would attempt to use %103 for both the %11 and %12
operands in the prolog, constraining the register class to the common
subset of both. Unfortunately there are no registers that are both odd
and even, so the second constrainRegClass fails. Fix this situation by
inserting a COPY for the second if the call to constrainRegClass fails.

The register allocation can then fold that extra copy away. The register
allocation of Q regs changed with this test, but the R regs were the
same and no new instructions are needed in the final assembly.

Differential Revision: https://reviews.llvm.org/D127971
2022-06-19 18:55:19 +01:00
Simon Pilgrim ba3f2667b6 [DAG] Add MaskedVectorIsZero helper
Equivalent to MaskedValueIsZero, except its checking if all of the demanded vectors elements are known to be zero
2022-06-19 17:56:30 +01:00
Simon Pilgrim 1ebe5cac46 [DAG] SimplifyDemandedBits - add DemandedElts handling to ISD::SIGN_EXTEND_INREG simplification 2022-06-19 15:35:29 +01:00
Simon Pilgrim db1be696c4 [DAG] SimplifyDemandedBits - add ISD::VSELECT handling 2022-06-19 15:18:25 +01:00
Kazu Hirata 129b531c9c [llvm] Use value_or instead of getValueOr (NFC) 2022-06-18 23:07:11 -07:00
Kazu Hirata 3cbe0bc4a1 [CodeGen] Use default member initialization (NFC)
Identified with modernize-use-default-member-init.
2022-06-18 12:01:34 -07:00
Kazu Hirata 4271a1ff33 [llvm] Call *set::insert without checking membership first (NFC) 2022-06-18 10:17:22 -07:00
Kazu Hirata b254d67160 [llvm] Call *set::insert without checking membership first (NFC) 2022-06-18 08:32:54 -07:00
Han-Kuan Chen e29133629b [MachineCopyPropagation][RISCV] Fix D125335 accidentally change control flow.
D125335 makes regsOverlap skip following control flow, which is not entended
in the original code.

Differential Revision: https://reviews.llvm.org/D128039
2022-06-17 21:40:08 -07:00
Vitaly Buka f0ca0a324f [CodeGen] Init EmptyExpr before the first use 2022-06-17 17:40:06 -07:00
Guillaume Chatelet 90f96ec7a5 [NFC][Alignment] Remove assumeAligned from MachineFrameInfo ctor 2022-06-17 15:21:17 +00:00
Paul Walker 0e21f1d56a [SelectionDAG] Extend WidenVecOp_INSERT_SUBVECTOR to cover more cases.
WidenVecOp_INSERT_SUBVECTOR only supported cases where widening
effectively converts the insert into a copy.  However, when the
widened subvector is no bigger than the vector being inserted into
and we can be sure there's no loss of data, we can simply emit
another INSERT_SUBVECTOR.

Fixes: #54982

Differential Revision: https://reviews.llvm.org/D127508
2022-06-17 12:39:42 +00:00
Mingming Liu 1e67385d28 [MachineBlockPlacementStats] Added check for "-filter-print-funcs"
option to the machine-block-placement-stats.

Differential Revision: https://reviews.llvm.org/D128019
2022-06-16 21:59:54 -07:00
Mingming Liu b7d09557f6 Revert "[MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats."
This reverts commit 46d45df451. Going to
add differential revision link to commit message and re-commit.
2022-06-16 21:56:08 -07:00
Mingming Liu 46d45df451 [MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats. 2022-06-16 21:48:08 -07:00
Lian Wang f2bcf33058 [LegalizeTypes][NFC] Merge promote SPLAT_VECTOR and promote SCALAR_TO_VECTOR to one function
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D127825
2022-06-17 02:43:52 +00:00
Lian Wang 16215eb979 [LegalizeTypes][RISCV][NFC] Modify assert in PromoteIntRes_STEP_VECTOR and add some tests for RISCV
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D127939
2022-06-17 02:26:09 +00:00
Craig Topper e6c7a3a54f [SelectionDAG] Don't apply MinRCSize constraint in InstrEmitter::AddRegisterOperand for IMPLICIT_DEF sources.
MinRCSize is 4 and prevents constrainRegClass from changing the
register class if the new class has size less than 4.

IMPLICIT_DEF gets a unique vreg for each use and will be removed
by the ProcessImplicitDef pass before register allocation. I don't
think there is any reason to prevent constraining the virtual register
to whatever register class the use needs.

The attached test case was previously creating a copy of IMPLICIT_DEF
because vrm8nov0 has 3 registers in it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D128005
2022-06-16 14:55:14 -07:00
Adrian Tong 55311801f0 Allow bitwidth difference when checking for isOneOrOneSplat.
This helps handling a case where the BUILD_VECTOR has i16 element type
and i32 constant operands

t2: v8i16 = setcc t8, t17, setult:ch
t3: v8i16 = BUILD_VECTOR Constant:i32<1>, ...
   t4: v8i16 = and t2, t3
      t5: v8i16 = add t8, t4

This can be turned into t5: v8i16 = sub t8, t2, and allows us to remove
t3 and t4 from the DAG.

Differential Revision: https://reviews.llvm.org/D127354
2022-06-16 16:04:20 +00:00
Kito Cheng e9f7263b38 Reland "[SplitKit] Handle early clobber + tied to def correctly"
This reverts commit 7207373e1e.

We found another RISC-V bug when landing D126048, and it has been fixed
by D127642 now.

Differential Revision: https://reviews.llvm.org/D126048
2022-06-16 17:13:09 +08:00
Craig Topper 3aa6ec619f [ValueTypes] Add types for nxv16bf16 and nxv32bf16.
This is needed by our downstream and makes bf16 and f16 have the
same set of scalable vector types.

Reviewed By: rui.zhang

Differential Revision: https://reviews.llvm.org/D127877
2022-06-15 23:00:53 -07:00
Benjamin Kramer 8c4a07c61f [DAGCombiner] Fold fold (fp_to_bf16 (bf16_to_fp op)) -> op 2022-06-15 19:54:39 +02:00
Benjamin Kramer ca50cb120b [SelectionDAG] Constant fold FP_TO_BF16 and BF16_TO_FP. 2022-06-15 18:51:32 +02:00
Paul Robinson ac2ad3b7bb [PS5] Support sin+cos->sincos optimization 2022-06-15 09:36:05 -07:00
Paul Robinson 654a835c3f [PS5] Trap after noreturn calls, with special case for stack-check-fail 2022-06-15 09:02:17 -07:00
Luo, Yuanke 16547f9fbb [CodeGen] Fix the bug of machine sink
The use operand may be undefined. In that case we can just continue to
check the next operand since it won't increase register pressure.

Differential Revision: https://reviews.llvm.org/D127848
2022-06-15 23:35:52 +08:00
Benjamin Kramer 8bc0bb9564 Add a conversion from double to bf16
This introduces a new compiler-rt function `__truncdfbf2`.
2022-06-15 12:56:31 +02:00
Benjamin Kramer fb34d531af Promote bf16 to f32 when the target doesn't support it
This is modeled after the half-precision fp support. Two new nodes are
introduced for casting from and to bf16. Since casting from bf16 is a
simple operation I opted to always directly lower it to integer
arithmetic. The other way round is more complicated if you want to
preserve IEEE semantics, so it's handled by a new __truncsfbf2
compiler-rt builtin.

This is of course very bare bones, but sufficient to get a semi-softened
fadd on x86.

Possible future improvements:
 - Targets with bf16 conversion instructions can now make fp_to_bf16 legal
 - The software conversion to bf16 can be replaced by a trivial
   implementation under fast math.

Differential Revision: https://reviews.llvm.org/D126953
2022-06-15 12:56:31 +02:00
Keith Walker 94fac097ad [DebugInfo][ARM] Not readonly check for RWPI globals
When compiling for the RWPI relocation model [1], the debug information
is wrong for readonly global variables.

Writable global variables are accessed by the static base register (R9
on ARM) in the RWPI relocation model.  This is being correctly generated

Readonly global variables are not accessed by the static base register
in the RWPI relocation model. This case is incorrectly generating the
same debugging information as for writable global variables.

References:
[1] ARM Read-Write Position Independence: https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#read-write-position-independence-rwpi

Differential Revision: https://reviews.llvm.org/D126361
2022-06-15 11:52:12 +01:00
Simon Pilgrim f096d5926d [DAG] Fix SDLoc mismatch in (shl (srl x, c1), c2) -> and(shift(x,c3)) fold
Noticed by @craig.topper on D125836 which uses a tweaked copy of the same code.

Differential Revision: https://reviews.llvm.org/D127772
2022-06-15 11:07:59 +01:00
Ping Deng c06f77ec0d [SelectionDAG] fold 'Op0 - (X * MulC)' to 'Op0 + (X << log2(-MulC))'
Reviewed By: craig.topper, spatel

Differential Revision: https://reviews.llvm.org/D127474
2022-06-15 05:50:18 +00:00
Paul Robinson 0ce33c2941 [PS5] Default to 'sce' debugger tuning 2022-06-14 15:28:28 -07:00
Serguei Katkov d713f0eab8 Revert "[MachineSSAUpdater] compile time improvement in GetValueInMiddleOfBlock"
It looks like it causes buildbot failures.
As an example: https://lab.llvm.org/buildbot/#/builders/121/builds/20364

Revert to investigate...

This reverts commit 6bf2791814.
2022-06-14 20:27:21 +07:00
Florian Hahn e5c4308ba1
[InterleavedLoadComb] Rename uses when inserting new uses.
This fixes a crash due to uses needing to be renamed.
2022-06-14 13:15:23 +01:00
Serguei Katkov 6bf2791814 [MachineSSAUpdater] compile time improvement in GetValueInMiddleOfBlock
GetValueInMiddleOfBlock uses result of GetValueAtEndOfBlockInternal if there is no value
defined for current basic block.

If there is already a value it tries (in this order):

to find single register coming from all predecessors
find existing phi node which matches our incoming registers
build new phi.
The compile time improvement is to use current available value if
it is defined out of current BB or it is a PHI register.
This is due to it can be used in the middle basic block.

Reviewed By: sameerds
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D126523
2022-06-14 18:00:34 +07:00
Guillaume Chatelet c0e85f1c3b [NFC][Alignment] Use Align in SafeStack 2022-06-14 10:56:36 +00:00
Guillaume Chatelet 6725d80640 [NFC][Alignment] Use Align in shouldAlignPointerArgs 2022-06-14 10:56:36 +00:00
Denis Antrushin c0e965e222 [Statepoints] FixupStatepoint: Clear isKill flag if COPY is not deleted.
When spilling CSRs, FixupStatepoint pass does simple copy propagation,
trying to find COPY instruction which defines register being spilled
and spill COPY source instead. I.e., if we have CSR $x and found
  $x = COPY $y
we will spill $y instead.
But we may be unable to delete COPY instruction for some reason.
Then, spill will be inserted after it, adding another use of $y.
If COPY instruction was last use of $y (killed it), after insertion of
the spill it is not, so `isKill` flag must be cleared. We failed to do
so and this patch fixes this issue.

Reviewed By: skatkov

Differential Revision: https://reviews.llvm.org/D127308
2022-06-14 10:52:32 +03:00
Kazu Hirata a2232da2a5 [CodeGen] Remove addSEHCatchHandler and addSEHCleanupHandler (NFC)
The last uses of these functions are removed on Oct 9, 2015 in commit
14e773500e.
2022-06-13 23:08:49 -07:00
Kazu Hirata 34ff78c5cf [CodeGen] Remove restrictRef (NFC)
The last use was removed on Apr 14, 2017 in commit
4fe9d6c640.
2022-06-13 23:08:48 -07:00
Serguei Katkov 095bf6be28 [Greedy RegAlloc] Fix the handling of split register in last chance re-coloring.
This is a fix for https://github.com/llvm/llvm-project/issues/55827.

When register we are trying to re-color is split the original register (we tried to recover)
has no uses after the split. However in rollback actions we assign back physical register to it.
Later it causes different assertions. One of them is in attached test.

This CL fixes this by avoiding assigning physical register back to register which has no usage
or its live interval now is empty.

Reviewed By: arsenm, qcolombet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D127281
2022-06-14 12:04:17 +07:00
Kazu Hirata 145cc9db2b [CodeGen] Remove futureWeight (NFC)
The last use was removed on Jun 5, 2022 in
commit 5c06f7168f, which itself was
a patch to remove unused code.
2022-06-13 17:10:23 -07:00
Guillaume Chatelet 5a293d21fc [NFC][Alignment] Use getAlign in SelectionDAGBuilder 2022-06-13 15:13:05 +00:00
Kazu Hirata 23d9ca10ae [CodeGen] Remove EvictionTrack (NFC)
The last of getEvictor use was removed on Jun 5, 2022 in commit
5c06f7168f, which was itself a patch to
remove unused code.

Once we remove getEvictor, EvictionTrack becomes a write-only data
structure.  The data in it won't affect compilation, so the entire
class is essentially dead.
2022-06-13 07:21:29 -07:00
Kazu Hirata 246e83e973 [GlobalISel] Remove buildSequence (NFC)
The last use was removed on Jun 27, 2019 in commit
8138996128.
2022-06-13 06:58:36 -07:00
Nikita Popov b9a7dea917 [SelectionDAG] Handle trapping aggregate (PR49839)
Call canTrap() on Constant to account for trapping
ConstantAggregate.
2022-06-13 15:06:53 +02:00
Simon Pilgrim 7d8fd4f5db [DAG] visitINSERT_VECTOR_ELT - attempt to reconstruct BUILD_VECTOR before other fold interfere
Another issue unearthed by D127115

We take a long time to canonicalize an insert_vector_elt chain before being able to convert it into a build_vector - even if they are already in ascending insertion order, we fold the nodes one at a time into the build_vector 'seed', leaving plenty of time for other folds to alter it (in particular recognising when they come from extract_vector_elt resulting in a shuffle_vector that is much harder to fold with).

D127115 makes this particularly difficult as we're almost guaranteed to have the lost the sequence before all possible insertions have been folded.

This patch proposes to begin at the last insertion and attempt to collect all the (oneuse) insertions right away and create the build_vector before its too late.

Differential Revision: https://reviews.llvm.org/D127595
2022-06-13 11:48:18 +01:00
Jez Ng d4bcb45db7 [MC][re-land] Omit DWARF unwind info if compact unwind is present where eligible
This reverts commit d941d59783.

Differential Revision: https://reviews.llvm.org/D122258
2022-06-12 17:24:19 -04:00
Simon Pilgrim 1cf9b24da3 [DAG] Enable ISD::FSHL/R SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits
This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.

This helps with several of the regressions from D125836
2022-06-12 19:25:20 +01:00
Jez Ng d941d59783 Revert "[MC] Omit DWARF unwind info if compact unwind is present where eligible"
This reverts commit ef501bf85d.
2022-06-12 10:47:08 -04:00
Jez Ng ef501bf85d [MC] Omit DWARF unwind info if compact unwind is present where eligible
Previously, omitting unnecessary DWARF unwinds was only done in two
cases:
* For Darwin + aarch64, if no DWARF unwind info is needed for all the
  functions in a TU, then the `__eh_frame` section would be omitted
  entirely. If any one function needed DWARF unwind, then MC would emit
  DWARF unwind entries for all the functions in the TU.
* For watchOS, MC would omit DWARF unwind on a per-function basis, as
  long as compact unwind was available for that function.

This diff makes it so that we omit DWARF unwind on a per-function basis
for Darwin + aarch64 as well. In addition, we introduce the flag
`--emit-dwarf-unwind=` which can toggle between `always`,
`no-compact-unwind` (only emit DWARF when CU cannot be emitted for a
given function), and the target platform `default`.  `no-compact-unwind`
is particularly useful for newer x86_64 platforms: we don't want to omit
DWARF unwind for x86_64 in general due to possible backwards compat
issues, but we should make it possible for people to opt into this
behavior if they are only targeting newer platforms.

**Motivation:** I'm working on adding support for `__eh_frame` to LLD,
but I'm concerned that we would suffer a perf hit. Processing compact
unwind is already expensive, and that's a simpler format than EH frames.
Given that MC currently produces one EH frame entry for every compact
unwind entry, I don't think processing them will be cheap. I tried to do
something clever on LLD's end to drop the unnecessary EH frames at parse
time, but this made the code significantly more complex. So I'm looking
at fixing this at the MC level instead.

**Addendum:** It turns out that there was a latent bug in the X86
backend when `OmitDwarfIfHaveCompactUnwind` is naively enabled, which is
not too surprising given that this combination has not been heretofore
used.

For functions that have unwind info that cannot be encoded with CU, MC
would end up dropping both the compact unwind entry (OK; existing
behavior) as well as the DWARF entries (not OK).  This diff fixes things
so that we emit the DWARF entry, as well as a CU entry with encoding
`UNWIND_X86_MODE_DWARF` -- this basically tells the unwinder to look for
the DWARF entry. I'm not 100% sure the `UNWIND_X86_MODE_DWARF` CU entry
is necessary, this was the simplest fix. ld64 seems to be able to handle
both the absence and presence of this CU entry. Ultimately ld64 (and
LLD) will synthesize `UNWIND_X86_MODE_DWARF` if it is absent, so there
is no impact to the final binary size.

Reviewed By: davide, lhames

Differential Revision: https://reviews.llvm.org/D122258
2022-06-12 10:03:56 -04:00
Simon Pilgrim 54ae4ca755 [DAG] visitSRL - pull out ShiftVT. NFC. 2022-06-12 14:02:23 +01:00
Simon Pilgrim cf5c63d187 [DAG] visitVECTOR_SHUFFLE - fold splat(insert_vector_elt()) and splat(scalar_to_vector()) to build_vector splats
Addresses a number of regressions identified in D127115
2022-06-11 21:06:42 +01:00
Simon Pilgrim 44a0cd25df [DAG] visitINSERT_VECTOR_ELT - add <1 x ???> insert_vector_elt(v0,extract_vector_elt(v1,0),0) special case handling
Check if we're just replacing one v1x?? vector with another
2022-06-11 19:30:00 +01:00
Simon Pilgrim a71ad6a3c8 [DAG] visitINSERT_VECTOR_ELT - fold insert_vector_elt(scalar_to_vector(x),v,i) -> build_vector()
Allow scalar_to_vector nodes to be used for the start of a build_vector creation
2022-06-11 15:29:22 +01:00
Simon Pilgrim 693f4db1ec [DAG] visitINSERT_VECTOR_ELT - refactor BUILD_VECTOR insertion to remove early-out. NFCI.
Remove the early-out cases so we can more easily add additional folds in the future.
2022-06-11 12:01:13 +01:00
Paul Walker 10d55c4634 [SelectionDAG] Remove invalid TypeSize conversion from WidenVecOp_BITCAST.
Differential Revision: https://reviews.llvm.org/D127322
2022-06-11 10:41:13 +01:00
Kazu Hirata a98965d92f [CodeGen] Use llvm::erase_value (NFC) 2022-06-10 22:59:48 -07:00
Fangrui Song adf4142f76 [MC] De-capitalize SwitchSection. NFC
Add SwitchSection to return switchSection. The API will be removed soon.
2022-06-10 22:50:55 -07:00
Eli Friedman 0ff51d5dde Fix interaction of CFI instructions with MachineOutliner.
1. When checking if a candidate contains a CFI instruction, actually
iterate over all of the instructions, instead of stopping halfway
through.
2. Make sure copied CFI directives refer to the correct instruction.

Fixes https://github.com/llvm/llvm-project/issues/55842

Differential Revision: https://reviews.llvm.org/D126930
2022-06-10 13:37:49 -07:00
Guillaume Chatelet 95083fa3b8 [NFC] Remove deadcode 2022-06-10 15:13:42 +00:00
Simon Pilgrim 91adbc3208 [DAG] SimplifyDemandedVectorElts - adding SimplifyMultipleUseDemandedVectorElts handling to ISD::CONCAT_VECTORS
Attempt to look through multiple use operands of ISD::CONCAT_VECTORS nodes

Another minor improvement for D127115
2022-06-10 16:06:43 +01:00
Guillaume Chatelet 38637ee477 [clang] Add support for __builtin_memset_inline
In the same spirit as D73543 and in reply to https://reviews.llvm.org/D126768#3549920 this patch is adding support for `__builtin_memset_inline`.

The idea is to get support from the compiler to easily write efficient memory function implementations.

This patch could be split in two:
 - one for the LLVM part adding the `llvm.memset.inline.*` intrinsics.
 - and another one for the Clang part providing the instrinsic as a builtin.

Differential Revision: https://reviews.llvm.org/D126903
2022-06-10 13:13:59 +00:00
Nikita Popov c10921fa1a [CGP] Also freeze ctlz/cttz operand when despeculating
D125887 changed the ctlz/cttz despeculation transform to insert
a freeze for the introduced branch on zero. While this does fix
the "branch on poison" issue, we may still get in trouble if we
pick a different value for the branch and for the ctz argument
(i.e. non-zero for the branch, but zero for the ctz). To avoid
this, we should use the same frozen value in both positions.

This does cause a regression in RISCV codegen by introducing an
additional sext. The DAG looks like this:

    t0: ch = EntryToken
        t2: i64,ch = CopyFromReg t0, Register:i64 %3
      t4: i64 = AssertSext t2, ValueType:ch:i32
    t23: i64 = freeze t4
          t9: ch = CopyToReg t0, Register:i64 %0, t23
          t16: ch = CopyToReg t0, Register:i64 %4, Constant:i64<32>
        t18: ch = TokenFactor t9, t16
            t25: i64 = sign_extend_inreg t23, ValueType:ch:i32
          t24: i64 = setcc t25, Constant:i64<0>, seteq:ch
        t28: i64 = and t24, Constant:i64<1>
      t19: ch = brcond t18, t28, BasicBlock:ch<cond.end 0x8311f68>
    t21: ch = br t19, BasicBlock:ch<cond.false 0x8311e80>

I don't see a really obvious way to improve this, as we can't push
the freeze past the AssertSext (which may produce poison).

Differential Revision: https://reviews.llvm.org/D126638
2022-06-10 09:46:10 +02:00
Simon Moll b8c2781ff6 [NFC] format InstructionSimplify & lowerCaseFunctionNames
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName".  This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.

This is the alternative to the less invasive clang-format only patch: D126783

Reviewed By: spatel, rengolin

Differential Revision: https://reviews.llvm.org/D126889
2022-06-09 16:10:08 +02:00
Simon Pilgrim 7dbfcfa735 [DAG] combineInsertEltToShuffle - if EXTRACT_VECTOR_ELT fails to match an existing shuffle op, try to replace an undef op if there is one.
This should fix a number of shuffle regressions in D127115 where the re-ordered combines mean we fail to fold a EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT sequence into a BUILD_VECTOR if we extract from more than one vector source.
2022-06-09 14:56:14 +01:00
Guillaume Chatelet dc3367970e [SelectionDAG] Handle bzero/memset libcalls globally instead of per target
Differential Revision: https://reviews.llvm.org/D127279
2022-06-09 08:34:55 +00:00
Craig Topper 4bcfc41846 [SelectionDAG] Teach computeKnownBits that a nsw self multiply produce a positive value.
This matches what we do in IR. For the RISC-V test case, this allows
us to use -8 for the AND mask instead of materializing a constant in a register.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D127335
2022-06-08 14:55:58 -07:00
Kai Nacke d897a14c2e [SystemZ] Fix check for zero size when lowering memcmp.
During lowering of memcmp/bcmp, the check for a size of 0 is done
in 2 different ways. In rare cases this can lead to a crash in
SystemZSelectionDAGInfo::EmitTargetCodeForMemcmp(). The root cause
is that SelectionDAGBuilder::visitMemCmpBCmpCall() checks for a
constant int value which is not yet evaluated. When the value is
turned into a SDValue, then the evaluation is done and results in
a ConstantSDNode. But EmitTargetCodeForMemcmp() expects the special
case of 0 length to be handled, which results in an assertion.

The fix is to turn the value into a SDValue, so that both functions
use the same check.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D126900
2022-06-08 14:52:13 -04:00
Simon Pilgrim b84c10d4bc [DAG] visitVSELECT - don't wait for truncation of sub before attempting to match with getTruncatedUSUBSAT
Fixes some X86 PSUBUS regressions encountered in D127115 where the truncate was being replaced with a PACKSS/PACKUS before the fold got called again
2022-06-08 16:16:35 +01:00
Joseph Huber 9e0dbd2a2a [Target] Remove `startswith` for adding `SHF_EXCLUDE` to offload section
Summary:
We use the special section name `.llvm.offloading` to store device
imagees in the host object file. We want these to be stripped by the
linker as they are not used after linking so we use the `SHF_EXCLUDE`
flag to instruct the linker to drop them. We used to do this for all
sections that started with `.llvm.offloading` when we encoded metadata
in the section name itself. Now we embed a special binary containing the
metadata, we should only add the flag on this name specifically.
2022-06-08 09:56:51 -04:00
Paul Walker d88354213c [SelectionDAG] Remove invalid TypeSize conversion from PromoteIntRes_BITCAST.
Extend the TypeWidenVector case of PromoteIntRes_BITCAST to work
with TypeSize directly rather than silently casting to unsigned.

To accomplish this I've extended TypeSize with an interface that
essentially allows TypeSize division when both operands have the
same number of dimensions.

There still exists combinations of scalable vector bitcasts that
cause compiler crashes. I call these out by adding "is missing"
entries to sve-bitcast.

Depends on D126957.
Fixes: #55114

Differential Revision: https://reviews.llvm.org/D127126
2022-06-08 10:30:07 +01:00
Paul Walker a1121c31d8 [SVE] Fix incorrect code generation for bitcasts of unpacked vector types.
Bitcasting between unpacked scalable vector types of different
element counts is not a NOP because the live elements are laid out
differently.
               01234567
e.g. nxv2i32 = XX??XX??
     nxv4f16 = X?X?X?X?

Differential Revision: https://reviews.llvm.org/D126957
2022-06-08 10:30:07 +01:00
Chuanqi Xu 0e10f12844 [NFC] Remove commented cerr debugging loggings
There are some unused cerr debugging loggings in the codes. It is weird
to remain such commented debug helpers in the product.
2022-06-08 15:58:06 +08:00
Kito Cheng 7207373e1e Revert "[SplitKit] Handle early clobber + tied to def correctly"
Revert due to failed on LLVM_ENABLE_EXPENSIVE_CHECKS.

This reverts commit e14d04909d.
2022-06-08 13:05:35 +08:00
Kito Cheng e14d04909d [SplitKit] Handle early clobber + tied to def correctly
Spliter will try to extend a live range into `r` slot for a use operand,
that's works on most situaion, however that not work correctly when the operand
has tied to def, and the def operand is early clobber.

Give an example to demo what's wrong:
  0  %0 = ...
 16  early-clobber %0 = Op %0 (tied-def 0), ...
 32  ... = Op %0

Before extend:
 %0 = [0r, 0d) [16e, 32d)

The point we want to extend is 0d to 16e not 16r in this case, but if
we use 16r here we will extend nothing because that already contained
in [16e, 32d).

This patch add check for detect such case and adjust the extend point.

Detailed explanation for testcase: https://reviews.llvm.org/D126047

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D126048
2022-06-08 11:33:05 +08:00
David Penry 907aedbb3d [NFC] Fix spelling/newlines in comments/debug messages
Just a few spelling mistakes and missing newlines

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D127162
2022-06-07 09:38:53 -07:00
Simon Pilgrim a083f3caa1 [DAG] combineShuffleOfSplatVal - fold shuffle(splat,undef) -> splat, iff the splat contains no UNDEF elements
As noticed on D127115 - we were missing this fold, instead just having the shuffle(shuffle(x,undef,splatmask),undef) fold. We should be able to merge these into one using SelectionDAG::isSplatValue, but we'll need to match the shuffle's undef handling first.

This also exposed an issue in SelectionDAG::isSplatValue which was incorrectly propagating the undef mask across a bitcast (it was trying to just bail with a APInt::isSubsetOf if it found any undefs but that was actually the wrong way around so didn't fire for partial undef cases).
2022-06-07 16:42:24 +01:00
Matt Arsenault 56303223ac llvm-reduce: Don't assert on functions which don't track liveness
Use the query that doesn't assert if TracksLiveness isn't set, which
needs to always be available. We also need to start printing liveins
regardless of TracksLiveness.
2022-06-07 10:00:25 -04:00
Guillaume Chatelet 0788186182 [Alignment][NFC] Remove usage of MemSDNode::getAlignment
I can't remove the function just yet as it is used in the generated .inc files.
I would also like to provide a way to compare alignment with TypeSize since it came up a few times.

Differential Revision: https://reviews.llvm.org/D126910
2022-06-07 13:52:20 +00:00
Nikita Popov 5a64bc207e [DAGCombiner] Remove overzealous assertion when folding assert+trunc+assert (PR55846)
These assert that there are no "useless" assertzext/assertsext nodes
(that assert a wider width than a following trunc), but I don't think
there is anything preventing such nodes from reaching this code.
I don't think the assertion is relevant for correctness of this
transform either -- if such an assert is present, then the other
one will always be to a smaller width, and we'll pick that one.
The assertion dates back to D37017.

Fixes https://github.com/llvm/llvm-project/issues/55846.

Differential Revision: https://reviews.llvm.org/D126952
2022-06-07 09:50:26 +02:00
Fangrui Song 15d82c62dc [MC] De-capitalize MCStreamer functions
Follow-up to c031378ce0 .
The class is mostly consistent now.
2022-06-07 00:31:02 -07:00
Hendrik Greving a43d25734a [ModuloSchedule] Fix terminator update when peeling.
Fixes a bug of us not correctly updating the terminator of the loop's
preheader, if multiple terminating branch instructions are present.

This is tested through existing tests. The bug itself is hard or not
possible to get exposed with the upstream Hexagon backend, because
the machine pipeliner checks for an existing preheader, which is
defined as a block with only 1 edge into the header.

The condition of this bug is a block into the loop with more than 1
edge, and not every downstream target checks for an existing preheader.

Differential Revision: https://reviews.llvm.org/D126386
2022-06-06 19:52:28 +00:00
Michael Kitzan b7fcf6632f [GISel] Add new combines for G_ADD
Patch adds new GICombineRules for G_ADD:

G_ADD(x, G_SUB(y, x)) -> y
G_ADD(G_SUB(y, x), x) -> y

Patch additionally adds new combine tests for AArch64 target for
these new rules.

Reviewed by: paquette

Differential Revision: https://reviews.llvm.org/D87936
2022-06-06 11:19:45 -07:00
Craig Topper be398100ea [SelectionDAG] Further improve computeKnownBits for (smax X, C) where C is non-negative.
Move the code that was added for D126896 after the normal recursive calls
to computeKnownBits. This allows us to calculate trailing zeros.
Previously we would break out of the switch before the recursive calls.
2022-06-06 09:59:23 -07:00
Kazu Hirata 5c06f7168f [CodeGen] Remove splitCanCauseEvictionChain and its helpers (NFC)
The last use was removed on Mar 7, 2022 in commit
294eca35a0.
2022-06-05 20:22:47 -07:00
Kazu Hirata 43d4585e64 [GlobalISel] Remove widenWithUnmerge (NFC)
The last use was removed on Dec 23, 2021 in commit
29f88b93fd.
2022-06-05 19:58:18 -07:00
Kazu Hirata 61abcb0b37 [GlobalISel] Remove valueIsSplit (NFC)
The last use was removed on Jun 27, 2019 in commit
8138996128.
2022-06-05 19:51:03 -07:00
Lian Wang 20cf77f776 [LegalizeTypes][VP] Add widen and split support for vp.fptrunc and vp.fpext
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D126439
2022-06-06 02:28:01 +00:00
Kazu Hirata 3b9707dbc0 [llvm] Convert for_each to range-based for loops (NFC) 2022-06-05 12:07:14 -07:00
Alexey Lapshin 501d5b24db [Debuginfo][DWARF][NFC] Refactor DwarfStringPoolEntryRef - remove isIndexed().
This patch is extraction from the https://reviews.llvm.org/D126883.
It removes DwarfStringPoolEntryRef::isIndexed() and isIndexed bit
since they are not used.

Differential Revision: https://reviews.llvm.org/D126958
2022-06-05 21:18:31 +03:00
Fangrui Song 95a134254a Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options 2022-06-05 01:07:51 -07:00
Fangrui Song d86a206f06 Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options 2022-06-05 00:31:44 -07:00
Kazu Hirata bcf4fa458a [CodeGen] Use a range-based for loop (NFC) 2022-06-04 22:26:55 -07:00
Kazu Hirata 4969a6924d Use llvm::less_first (NFC) 2022-06-04 21:23:18 -07:00
Kazu Hirata 32ce076d78 [CodeGen] Use StringRef::contains (NFC) 2022-06-04 20:58:58 -07:00
Fangrui Song 36c7d79dc4 Remove unneeded cl::ZeroOrMore for cl::opt options
Similar to 557efc9a8b.
This commit handles options where cl::ZeroOrMore is more than one line below
cl::opt.
2022-06-04 00:10:42 -07:00
Fangrui Song 557efc9a8b [llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.

Also remove cl::init(false) while touching the lines.
2022-06-03 21:59:05 -07:00
Benjamin Kramer e8e4b741dd [DAGCombiner] Add bf16 to the matrix of types that we don't promote to integer stores
Remove a few stray semicolons while there.
2022-06-03 13:28:34 +02:00
Nikita Popov ad742cf85d [DAGCombine] Handle promotion of shift with both operands the same
When promoting a shift, make sure we only fetch the second operand
after promoting the first. Load promotion may replace users of the
old load, and we don't want to be left with a dangling reference to
the old load instruction.

The crashing test case is from https://reviews.llvm.org/D126689#3553212.

Differential Revision: https://reviews.llvm.org/D126886
2022-06-03 10:00:44 +02:00
Craig Topper fa20bf1636 [DAGCombiner][RISCV] Improve computeKnownBits for (smax X, C) where C is non-negative.
If C is non-negative, the result of the smax must also be
non-negative, so all sign bits of the result are 0.

This allows DAGCombiner to remove a zext_inreg in the modified test.
This zext_inreg started as a sext that became zext before type
legalization then was promoted to a zext_inreg.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D126896
2022-06-02 12:34:24 -07:00
jacquesguan 5482ae6328 [LegalizeTypes][VP] Add widen and split support for VP FP integer casting op.
This patch adds widen and split support for VP_FPTOSI, VP_FPTOUI, VP_SITOFP and VP_UITOFP.

Differential Revision: https://reviews.llvm.org/D126847
2022-06-02 09:05:27 +00:00
jacquesguan 058791d8f2 [LegalizeTypes][VP] Add widen and split support for VP_SIGN_EXTEND and VP_ZERO_EXTEND.
Differential Revision: https://reviews.llvm.org/D126442
2022-06-02 02:21:22 +00:00
Matt Arsenault 4cb722acbc BranchFolder: Require NoPHIs
The pass doesn't handle SSA and breaks any phis.
2022-06-01 21:14:49 -04:00
Hendrik Greving a92ed167f2 [ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4.
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.

Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be
removed once targets set this explicitly.

Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-02 00:49:11 +00:00
Quentin Colombet 1a155ee7de [RegisterClassInfo] Invalidate cached information if ignoreCSRForAllocationOrder changes
Even if CSR list is same between functions, we could have had a different
allocation order if ignoreCSRForAllocationOrder is evaluated differently.
Hence invalidate cached register class information if
ignoreCSRForAllocationOrder changes.

Patch by Srividya Karumuri <srividya_karumuri@apple.com>

Differential Revision: https://reviews.llvm.org/D126565
2022-06-01 17:15:51 -07:00
Hendrik Greving e9d05cc7d8 Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4."
This reverts commit 430ac5c302.

Due to failures in Clang tests.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-01 13:27:49 -07:00
Hendrik Greving 430ac5c302 [ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4.
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.

Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be
removed once targets set this explicitly.

Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-01 12:48:01 -07:00
Denis Antrushin 7047d79fde [TwoAddressInstructionPass] Relax assert in statepoint processing.
D124631 added special processing for STATEPOINT instructions.
It appears that assertion added there is too strong. We can get two
tied operands with the same register tied to different defs. If we
hit such case, do not process it in statepoint-specific code and
delegate it to common case.
2022-06-01 21:34:52 +07:00
Matt Arsenault 0e1c71e4a4 CodeGen: Move getAddressSpaceForPseudoSourceKind into TargetMachine
Avoid the dependency on TargetInstrInfo, which depends on the subtarget
and therefore the individual function.

Currently AMDGPU is constructing PseudoSourceValue instances in MachineFunctionInfo.
In order to facilitate copying MachineFunctionInfo, we need to stop allocating these
there. Alternatively we could allow targets to subclass PseudoSourceValueManager,
and allocate them similarly to MachineFunctionInfo.
2022-06-01 09:45:40 -04:00
Martin Storsjö 6b75a3523f [ARM] [MC] Add support for writing ARM WinEH unwind info
This includes .seh_* directives for generating it from assembly.
It is designed fairly similarly to the ARM64 handling.

For .seh_handler directives, such as
".seh_handler __C_specific_handler, @except" (which is supported
on x86_64 and aarch64 so far), the "@except" bit doesn't work in
ARM assembly, as '@' is used as a comment character (on all current
platforms).

Allow using '%' instead of '@' for this purpose. This convention
is used by GAS in similar contexts already,
e.g. [1]:

    Note on targets where the @ character is the start of a comment
    (eg ARM) then another character is used instead. For example the
    ARM port uses the % character.

In practice, this unfortunately means that all such .seh_handler
directives will need ifdefs for ARM.

Contrary to ARM64, on ARM, it's quite common that we can't evaluate
e.g. the function length at this point, due to instructions whose
length is finalized later. (Also, inline jump tables end with
a ".p2align 1".)

If unable to to evaluate the function length immediately, emit
it as an MCExpr instead. If we'd implement splitting the unwind
info for a function (which isn't implemented for ARM64 yet either),
we wouldn't know whether we need to split it though.

Avoid calling getFrameIndexOffset() on an unset
FuncInfo.UnwindHelpFrameIdx, to avoid triggering asserts in the
preexisting testcase CodeGen/ARM/Windows/wineh-basic.ll. (Once
MSVC exception handling is fully implemented, those changes
can be reverted.)

[1] https://sourceware.org/binutils/docs/as/Section.html#Section

Differential Revision: https://reviews.llvm.org/D125645
2022-06-01 11:25:48 +03:00