Commit Graph

31928 Commits

Author SHA1 Message Date
Jeremy Morse 536b9eb31e [DebugInfo][InstrRef] Add extra indirection for NRVO tests
In some scenarios, usually involving NRVO, we can issue indirect DBG_VALUEs
after SelectionDAG, even in instruction referencing mode (if the variable
is an argument). If the corresponding argument value is spilt to the stack,
then we have:
 * Indirection from it being on the stack,
 * Indirection from it being a dbg.declare or a dbg.addr.

However InstrRefBasedLDV only emits one level of indirection. This patch
adds the second, by adding an extra DW_OP_deref if necessary. The two
tests modified fail otherwise -- they feature some NRVO, and require two
levels of indirection to be correct.

Differential Revision: https://reviews.llvm.org/D114364
2021-11-25 21:43:38 +00:00
Jeremy Morse 3107081e94 [DebugInfo][InstrRef] Avoid some quadratic behaviour in LiveDebugVariables
This is a performance patch -- LiveDebugVariables can behave quadratically
if a lot of debug instructions are inserted back into the same place, and
we have to repeatedly step over the ones we've already inserted.

To get around it, whenever we insert a debug instruction at a slot index,
check whether there are more debug instructions to insert at this point,
and insert them too. That avoids the repeated lookup and stepping through.
It relies on the container for unlinked debug instructions being recorded
in-order, which is how LiveDebugVariables currently does it.
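
A minimal standalone sketch of the batching idea (not the LiveDebugVariables code itself; the container and instruction types here are stand-ins):

```
#include <algorithm>
#include <cstdio>
#include <list>
#include <utility>
#include <vector>

// Stand-in types: a block is a list of instruction ids, and each unlinked
// debug instruction is recorded, in order, with the slot it belongs at.
using InstList = std::list<int>;
using Unlinked = std::vector<std::pair<int /*Slot*/, int /*DbgInst*/>>;

static void insertBatched(InstList &Block, const Unlinked &Pending) {
  for (size_t I = 0; I != Pending.size();) {
    int Slot = Pending[I].first;
    // Find the insertion point for this slot once...
    auto InsertPt = std::find(Block.begin(), Block.end(), Slot);
    // ...then insert every pending debug instruction recorded for the same
    // slot here, instead of re-searching (and stepping over the instructions
    // we just inserted) once per debug instruction.
    while (I != Pending.size() && Pending[I].first == Slot)
      Block.insert(InsertPt, Pending[I++].second);
  }
}

int main() {
  InstList Block = {10, 20, 30};
  insertBatched(Block, {{20, -1}, {20, -2}, {30, -3}});
  for (int I : Block)
    std::printf("%d ", I);   // 10 -1 -2 20 -3 30
}
```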

Differential Revision: https://reviews.llvm.org/D114587
2021-11-25 20:31:00 +00:00
Kazu Hirata bfd5dd1568 [llvm] Use range-based for loops (NFC) 2021-11-25 08:55:16 -08:00
Jeremy Morse 102d2a8a99 [DebugInfo][InstrRef] Track variable assignments in out-of-scope blocks
DBG_INSTR_REFs and DBG_VALUEs can end up in blocks that aren't in the
lexical scope of their variable. It's arguable as to what we should do
about this, however VarLocBasedLDV permits such variable locations to be
propagated, so let's allow it in InstrRefBasedLDV.

It's necessary for the modified test to work.

Differential Revision: https://reviews.llvm.org/D114578
2021-11-25 14:52:11 +00:00
Simon Pilgrim 63b1e58f07 [DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl (REAPPLIED)
If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.

For the ARM/AArch64 rev16/32 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.

Reapplied with a fix for the AArch64 rev patterns, matching the ARM fix.

https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)
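
A standalone sanity check of the first identity (not LLVM code; 32-bit scalars stand in for the demanded-bits analysis): when the demanded mask has at least as many trailing zeros as the rotate amount, the wrapped-around bits are all masked off, so the rotate and the plain shift agree on every demanded bit.

```
#include <cstdint>
#include <cstdio>

static uint32_t rotl(uint32_t X, unsigned Amt) {
  return (X << Amt) | (X >> (32 - Amt));   // Amt in 1..31 for this sketch
}

int main() {
  const unsigned Amt = 5;
  const uint32_t DemandedMask = 0xFFFFFF00u;   // 8 trailing zeros >= Amt
  for (uint64_t I = 0; I <= 0xFFFFFFFFull; I += 0x10001ull) {
    uint32_t X = static_cast<uint32_t>(I);
    if ((rotl(X, Amt) & DemandedMask) != ((X << Amt) & DemandedMask)) {
      std::printf("mismatch at %#x\n", X);
      return 1;
    }
  }
  std::printf("rotl and shl agree on all demanded bits\n");
}
```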

Differential Revision: https://reviews.llvm.org/D114354
2021-11-25 11:14:15 +00:00
David Green 3a700cabdc [SDAG] Allow Unknown sizes when refining MMO alignments. NFC
The changes in D113888 / 32b6c17b29 altered the memory size of a
masked store, as it will store an unknown number of bytes, not the full
vector size. We can have situations where the masked store is legalized
and then turned into a normal store, as the mask is known to be all ones.
This creates a store with an unknown size MMO that was hitting this
assert.

The store created can be given a better size in a followup patch. This
currently adjusts the assert to handle unknown sizes.
2021-11-25 10:19:29 +00:00
Jameson Nash 0332d105b9 GlobalISel: remove assert that memcpy Src and Dst addrspace must be identical
The LangRef does not require these arguments to have the same type.

Differential Revision: https://reviews.llvm.org/D93154
2021-11-24 20:23:05 -05:00
Zarko Todorovski 95875d246a [LLVM][NFC] Inclusive language: remove occurrences of sanity check/test from llvm
Part of work to use more inclusive language in clang/llvm. Rewording
some comments and change function and variable names.
2021-11-24 17:29:55 -05:00
Jeremy Morse bfadc5dcbf [DebugInfo][InstrRef] Cope with win32 calls changing SP in LiveDebugValues
Almost all of the time, call instructions don't actually lead to SP being
different after they return. An exception is win32's _chkstk, which
implements stack probes. We need to recognise that as modifying SP, so
that copies of the value are tracked as distinct vla pointers.

This patch adds a target frame-lowering hook to see whether stack probe
functions will modify the stack pointer, store that in an internal flag,
and if it's true then scan CALL instructions to see whether they're a
stack probe. If they are, recognise them as defining a new stack-pointer
value.

The added test exercises this behaviour: two calls to _chkstk should be
considered as producing two different values.

Differential Revision: https://reviews.llvm.org/D114443
2021-11-24 19:56:21 +00:00
Jeremy Morse 133e25f946 [DebugInfo][InstrRef] Ignore SP clobbers on call instructions even more
Avoid unnecessarily recreating DBG_VALUEs on call instructions.

In LiveDebugValues we choose to ignore any clobbers of SP by call
instructions, as they're irrelevant to our model of the machine. We
currently do so for tracking register values (MTracker); do the same for
tracking variable locations (TTracker).

Test modified to ensure that a duplicate DBG_VALUE is not created after the
call instruction in this test.

Differential Revision: https://reviews.llvm.org/D114365
2021-11-24 17:25:48 +00:00
Benjamin Kramer d32787230d Revert "[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl"
This reverts commit 3cf4a2c620.

It makes llc hang on the following test case.
```
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-gnu"

define dso_local void @_PyUnicode_EncodeUTF16() local_unnamed_addr #0 {
entry:
  br label %while.body117.i

while.body117.i:                                  ; preds = %cleanup149.i, %entry
  %out.6269.i = phi i16* [ undef, %cleanup149.i ], [ undef, %entry ]
  %0 = load i16, i16* undef, align 2
  %1 = icmp eq i16 undef, -10240
  br i1 %1, label %fail.i, label %cleanup149.i

cleanup149.i:                                     ; preds = %while.body117.i
  %or130.i = call i16 @llvm.bswap.i16(i16 %0) #2
  store i16 %or130.i, i16* %out.6269.i, align 2
  br label %while.body117.i

fail.i:                                           ; preds = %while.body117.i
  ret void
}

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare i16 @llvm.bswap.i16(i16) #1

attributes #0 = { "target-features"="+neon,+v8a" }
attributes #1 = { nofree nosync nounwind readnone speculatable willreturn }
attributes #2 = { mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon,+v8a" }
```
2021-11-24 14:42:54 +01:00
Simon Pilgrim 3cf4a2c620 [DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl
If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.

For the ARM rev16 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.

https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)

Differential Revision: https://reviews.llvm.org/D114354
2021-11-24 11:28:35 +00:00
David Sherwood cf40ca026f [NFC] Tidy up SelectionDAGBuilder::visitIntrinsicCall to use existing sdl debug loc
In quite a few places we were calling getCurSDLoc() to get the debug
location, but this is already a local variable `sdl`.

Differential Revision: https://reviews.llvm.org/D114447
2021-11-24 10:35:49 +00:00
Jeremy Morse b8f68ad9cd [DebugInfo][InstrRef] Avoid crash when values optimised out late in sdag
It appears that we can emit all the instructions for a function, including
debug instructions, and then optimise some of the values out late.
Specifically, in the attached test case, an argument gets optimised out
after DBG_VALUE / DBG_INSTR_REFs are created. This confuses
MachineFunction::finalizeDebugInstrRefs, which expects to be able to find a
defining instruction, and crashes instead.

Fix this by identifying when there's no defining instruction, and
translating that instead into a DBG_VALUE $noreg.

Differential Revision: https://reviews.llvm.org/D114476
2021-11-24 10:34:48 +00:00
Jun Ma 17eb6b61de Revert "[Taildup] Don't tail-duplicate loop header with multiple successors as its latches"
This reverts commit 1f9fa54984.
2021-11-24 10:26:37 +08:00
Matt Arsenault 273a0c8bc9 PrologEpilogInserter: Use explicit control for scavenge slot placement
AMDGPU is unusual in that the stack is indexed in the same
direction as stack growth (up). We therefore always need the emergency
stack slots placed as low as possible to ensure they are in range of
load/store instruction immediate offsets. The existing logic is mostly
OK, but failed if we required stack realignment.

I don't understand what the existing control isFPCloseToIncomingSP is
supposed to mean, but it can only be used to stop placing the scavenge
slots earlier. Make this explicit so that targets can opt-in rather
than opt-out only.
2021-11-23 18:01:12 -05:00
Rong Xu bf1138491a [SampleFDO] Recompute BFI if the sample loader changes BPI
The MIR sample loader changes the branch probability but not BFI.
Here we force a recompute of BFI if the branch probabilities are
changed.

Also register the MIR FSAFDO passes properly.

Differential Revision: https://reviews.llvm.org/D114400
2021-11-23 13:24:31 -08:00
Quinn Pham 1345bc5e16 [NFC][llvm] Inclusive language: remove instance of master in LiveRangeUtils.h
[NFC] As part of using inclusive language within the llvm project, this patch
replaces master with primary in `LiveRangeUtils.h`.

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D114191
2021-11-23 13:07:42 -06:00
Kazu Hirata d45cb1d7ea [llvm] Use range-based for loops (NFC) 2021-11-23 08:54:48 -08:00
Simon Moll 1e65b93f3a [VP] Canonicalize macros of VPIntrinsics.def
Usage and naming of macros in VPIntrinsics.def has been inconsistent. Rename all property macros to VP_PROPERTY_<name>.  Use BEGIN/END scope macros to attach properties to vp intrinsics and SDNodes (instead of specifying either directly with the property macro).
A follow-up patch has documentation on how the macros are (intended) to be used.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D114144
2021-11-23 16:51:11 +01:00
David Green 32b6c17b29 [SDAG] Use UnknownSize for masked load/store MMO size
A masked load or store will load a potentially unknown number of bytes
from a memory location - that is not generally known at compile time.
They do not necessarily load/store the entire vector width, and treating
them as such can lead to incorrect aliasing information (for example, if
the underlying object is smaller than the size of the vector).

This makes sure that the MMO is given an unknown size to represent this,
which is less accurate than "may load/store from up to 16 bytes", but
less incorrect than "will load/store from 16 bytes".

Differential Revision: https://reviews.llvm.org/D113888
2021-11-23 09:47:56 +00:00
Kazu Hirata d5b73a70a0 [llvm] Use range-based for loops (NFC) 2021-11-22 20:33:28 -08:00
Nico Weber 2fb3c05b34 [asm] Merge EmitMSInlineAsmStr() and EmitGCCInlineAsmStr()
This basically reverts 1778831a3d, which split them.
Since they were split 9 years ago, EmitGCCInlineAsmStr() grew a bunch of
features that usually weren't added to EmitMSInlineAsmStr(), and
that was usually a mistake.  D71677, D113932, D114167 are all examples
of where things were backported to EmitMSInlineAsmStr().

The names were also not great. EmitMSInlineAsmStr() used to be called for `asm
inteldialect`, which clang produces for Microsoft-style __asm { ... } blocks as
well as for GCC-style __asm__ / asm statements with -masm=intel. On the other hand,
EmitGCCInlineAsmStr() used to be called for `asm`, which clang produces for
GCC-style __asm__ / asm statements with -masm=att (the default).

It's also less code (23 insertions, 188 deletions).

No behavior change.

Differential Revision: https://reviews.llvm.org/D114330
2021-11-22 11:49:57 -05:00
Nico Weber 7c2d51474a [asm] Allow labels as operands in intel asm syntax
This makes a line in llvm/test/CodeGen/X86/asm-block-labels.ll pass
with `asm inteldialect` too.

I don't know if this is something one can hit in practice with inline
asm. The test is from 2007 (4646aa3e33) but in 2009 blockaddr was
introduced and e.g. `__asm__ __volatile__("brl %0" :: "X"(&&foo) : "memory");`
compiles to

    call void asm sideeffect "brl $0", "X,..."(i8* blockaddress(@func, %1))

nowadays (thanks to jrtc27 for that example!).

(6c4d255bf3 switched clang to blockaddress on an opt-in basis,
e4801f7844 added docs for it, 31b132c0b7 added IR support.)

I half-heartedly tried to build clang 2.8 locally, but it didn't
just build. And 2.8 didn't have a prebuilt clang binary yet.

The motivation is to make EmitGCCInlineAsmStr() and EmitMSInlineAsmStr()
more alike, and maybe we should delete this code from EmitGCCInlineAsmStr()
instead. But since it's just 3 lines and it's reachable from LLVM IR,
let's do the safer thing for now.

Differential Revision: https://reviews.llvm.org/D114329
2021-11-22 11:49:29 -05:00
Kazu Hirata c133fb321f [CodeGen] Use llvm::is_contained (NFC) 2021-11-21 10:36:20 -08:00
Kazu Hirata fc981cedea [llvm] Use range-based for loops (NFC) 2021-11-21 10:36:18 -08:00
Kazu Hirata f6bce30cf9 [llvm] Use range-based for loops (NFC) 2021-11-20 18:42:10 -08:00
Nico Weber 8b76d33c59 [asm] Allow block address operands in `asm inteldialect`
This makes the following program build with -masm=intel:

    int foo(int count) {
      asm goto ("dec %0; jb %l[stop]" : "+r" (count) : : : stop);
      return count;
    stop:
      return 0;
    }

It's also another step towards merging EmitGCCInlineAsmStr() and
EmitMSInlineAsmStr().

Differential Revision: https://reviews.llvm.org/D114167
2021-11-19 09:27:30 -05:00
Nico Weber 4f9a5c2a14 [asm] Remove explicit branch for modifier 'l'
No intended behavior change.

EmitGCCInlineAsmStr() used to explicitly check for modifier 'l'
after handling block address and machine basic block operands.
This prevented passing a MachineOperand with 'l' modifier to
PrintAsmMemoryOperand(). Conceptually that seems kind of nice,
but in practice the overrides of PrintAsmMemoryOperand() in all (*)
AsmPrinter subclasses already reject modifiers they don't know about,
and none of them know about 'l'. So removing this doesn't have
a behavior difference, is less code, and it makes EmitGCCInlineAsmStr()
and EmitMSInlineAsmStr() more similar, to prepare for merging them later.

(Why not _add_ the branch to EmitMSInlineAsmStr() instead? Because that
always works with X86AsmPrinter I think, and
X86AsmPrinter::PrintAsmMemoryOperand() very decisively rejects the 'l'
modifier, so it's hard to motivate adding that branch.)

*: The one exception was AVRAsmPrinter, which had an llvm_unreachable instead
of returning true. So this commit changes that, so that the AVR target keeps
emitting an error instead of crashing when passing a mem operand with a :l
modifier to it. All the other targets already don't crash on this.

Differential Revision: https://reviews.llvm.org/D114216
2021-11-19 09:19:53 -05:00
Simon Pilgrim 812e64ef0c [DAG] MatchRotate - support rotate-by-constant of illegal types
Patch to fix some of the regressions in D77804.

By folding to rotate/funnel-shift by constant amounts for illegal types, we prevent SimplifyDemandedBits from destroying the patterns prematurely, allowing us to use the rotate/funnel-shift legalization that was added in D112443.

Differential Revision: https://reviews.llvm.org/D113192
2021-11-19 11:12:04 +00:00
Kazu Hirata 7ca14f6044 [llvm] Use range-based for loops (NFC) 2021-11-18 09:09:52 -08:00
Eric Tang 9fe6b9e802 [TargetLowering][RISCV] Fixed a scalable vector issue when lowering [s|u]mul.overflow intrinsics
Fixed a vector type issue where getVectorNumElements() should be
replaced by getVectorElementCount() when lowering these intrinsics.

This is similar to D94149

Signed-off-by: Eric Tang <tangxingxin1008@gmail.com>

Reviewed By: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D109809
2021-11-18 10:16:08 +00:00
Craig Topper d78fdf111d [LegalizeTypes] Further limit expansion of CTTZ during type promotion.
Don't expand CTTZ if CTPOP or CTLZ is supported on the promoted type.
We have special handling for CTTZ expansion to use those ops with a
small conversion. The setup for that doesn't generate extra code or
large constants so we don't gain anything from expanding early and we
make CTTZ_ZERO_UNDEF codegen worse.
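
For reference, a scalar sketch of the kind of small conversion meant here, assuming the usual cttz-to-ctpop trick and illustrated with the compiler's popcount builtin (not the LegalizeTypes code itself):

```
#include <cstdint>
#include <cstdio>

// ~X & (X - 1) sets exactly one bit per trailing zero of X, so its
// population count is cttz(X); X == 0 yields the bit width (32).
static unsigned cttz_via_ctpop(uint32_t X) {
  return static_cast<unsigned>(__builtin_popcount(~X & (X - 1)));
}

int main() {
  for (uint32_t X : {1u, 8u, 0x00F00000u, 0x80000000u})
    std::printf("cttz(%#x) = %u\n", X, cttz_via_ctpop(X));   // 0, 3, 20, 31
}
```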

Follow up from post commit feedback on D112268. We don't seem to have
any in tree tests that care about this.
2021-11-17 15:27:29 -08:00
Nico Weber bf834b2629 [x86/asm] Let EmitMSInlineAsmStr() handle variants too
This is preparation for D113707, where I want to make `-masm=intel`
emit `asm inteldialect` instructions.

`{movq %rbx, %rax|mov rax, rbx}` is supposed to evaluate to the bit
between { and | for att and to the bit between | and } for intel.
Since intel will become `asm inteldialect`, which calls EmitMSInlineAsmStr(),
EmitMSInlineAsmStr() has to support variants as well.

(clang translates `{...|...}` to `$(...$|...$)`. I'm not sure why
it doesn't just send along only the first `...` or the second `...`
to LLVM, but given the notes in PR23933 let's not do a big
reorganization in this codepath.)

Differential Revision: https://reviews.llvm.org/D113932
2021-11-17 13:31:59 -05:00
Craig Topper 0274be28d7 [RISCV] Lower vector CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF by converting to FP and extracting the exponent.
If we have a large enough floating point type that can exactly
represent the integer value, we can convert the value to FP and
use the exponent to calculate the leading/trailing zeros.

The exponent will contain log2 of the value plus the exponent bias.
We can then remove the bias and convert from log2 to leading/trailing
zeros.

This doesn't work for zero since the exponent of zero is zero so we
can only do this for CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF. If we need
a value for zero we can use a vmseq and a vmerge to handle it.

We need to be careful to make sure the floating point type is legal.
If it isn't we'll continue using the integer expansion. We could split the vector
and concatenate the results but that needs some additional work and evaluation.
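
A scalar sketch of the idea for 32-bit elements (a plain standalone illustration, not the RISC-V lowering itself; it assumes IEEE-754 double and a nonzero input, matching the ZERO_UNDEF restriction):

```
#include <cstdint>
#include <cstdio>
#include <cstring>

static unsigned exponentOf(double D) {
  uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits));
  return static_cast<unsigned>((Bits >> 52) & 0x7FF) - 1023;   // remove the bias
}

static unsigned ctlz32_via_fp(uint32_t X) {          // X != 0
  // double represents 32-bit integers exactly, and its exponent is floor(log2(X)).
  return 31 - exponentOf(static_cast<double>(X));
}

static unsigned cttz32_via_fp(uint32_t X) {          // X != 0
  uint32_t LowBit = X & -X;                          // isolate the lowest set bit
  return exponentOf(static_cast<double>(LowBit));
}

int main() {
  std::printf("%u %u\n", ctlz32_via_fp(0x00F00000u), cttz32_via_fp(0x00F00000u)); // 8 20
}
```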

Differential Revision: https://reviews.llvm.org/D111904
2021-11-17 10:29:41 -08:00
Nico Weber 103cc914d6 [x86/asm] Make variants work when converting at&t inline asm input to intel asm output
`asm` always has AT&T-style input (`asm inteldialect` has Intel-style asm
input), so EmitGCCInlineAsmStr() always has to pick the same variant since it
cares about the input asm string, not the output asm string.

For PowerPC, that default variant is 1. For other targets, it's 0.

Without this, the included test case errors out with

    error: unknown use of instruction mnemonic without a size suffix
             mov rax, rbx

since it picks the intel branch and then tries to interpret it as AT&T
when selecting intel-style output with `-x86-asm-syntax=intel`.

Differential Revision: https://reviews.llvm.org/D113894
2021-11-17 13:23:18 -05:00
DianQK 1e9fa0b12a Fix the side effect of outlined function when the register is implicit use and implicit-def in the same instruction.
This is the diff associated with {D95267}, and we need to mark $x0 as live whether or not $x0 is dead.

The compiler also needs to mark register $x0 as live in for the following case.

```
$x1 = ADDXri $sp, 16, 0
BL @spam, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit killed $x1, implicit-def $sp, implicit-def $x0
```

This change fixes an issue where the wrong registers were used when -machine-outliner-reruns>0.
As an example:

```
lang=c
typedef struct {
    double v1;
    double v2;
} D16;

typedef struct {
    D16 v1;
    D16 v2;
} D32;

typedef long long LL8;
typedef struct {
    long long v1;
    long long v2;
} LL16;
typedef struct {
    LL16 v1;
    LL16 v2;
} LL32;

typedef struct {
    LL32 v1;
    LL32 v2;
} LL64;

LL8 needx0(LL8 v0, LL8 v1);

void bar(LL64 v1, LL32 v2, LL16 v3, LL32 v4, LL8 v5, D16 v6, D16 v7, D16 v8);

LL8 foo(LL8 v0, LL64 v1, LL32 v2, LL16 v3, LL32 v4, LL8 v5, D16 v6, D16 v7, D16 v8)
{
  LL8 result = needx0(v0, 0);
  bar(v1, v2, v3, v4, v5, v6, v7, v8);
  return result + 1;
}
```

As you can see from the `foo` function, we should not modify the value of `x0` until we call `needx0`.
This code is compiled to give the following instruction MIR code.

```
$sp = frame-setup SUBXri $sp, 256, 0
frame-setup STPDi killed $d13, killed $d12, $sp, 16
frame-setup STPDi killed $d11, killed $d10, $sp, 18
frame-setup STPDi killed $d9, killed $d8, $sp, 20

frame-setup STPXi killed $x26, killed $x25, $sp, 22
frame-setup STPXi killed $x24, killed $x23, $sp, 24
frame-setup STPXi killed $x22, killed $x21, $sp, 26
frame-setup STPXi killed $x20, killed $x19, $sp, 28
...
$x1 = MOVZXi 0, 0
BL @needx0, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit-def $sp, implicit-def $x0
...
```

Since there are some other instruction sequences that duplicate `foo`, after the first run of the Machine Outliner you will get:
```
$sp = frame-setup SUBXri $sp, 256, 0
frame-setup STPDi killed $d13, killed $d12, $sp, 16
frame-setup STPDi killed $d11, killed $d10, $sp, 18
frame-setup STPDi killed $d9, killed $d8, $sp, 20

$x7 = ORRXrs $xzr, $lr, 0
BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $x7, implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26
$lr = ORRXrs $xzr, $x7, 0
...
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp
...
```

In the first outlining run we outlined the following sequences:
```
frame-setup STPXi killed $x26, killed $x25, $sp, 22
frame-setup STPXi killed $x24, killed $x23, $sp, 24
frame-setup STPXi killed $x22, killed $x21, $sp, 26
frame-setup STPXi killed $x20, killed $x19, $sp, 28
```
and
```
$x1 = MOVZXi 0, 0
BL @needx0, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit-def $sp, implicit-def $x0
```

When we run the outliner again, we get:
```
$x0 = ORRXrs $xzr, $lr, 0 <---- here
BL @OUTLINED_FUNCTION_2_0, implicit-def $lr, implicit $sp, implicit-def $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $d8, implicit $d9, implicit $d10, implicit $d11, implicit $d12, implicit $d13, implicit $x0
$lr = ORRXrs $xzr, $x0, 0

$x7 = ORRXrs $xzr, $lr, 0
BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $x7, implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26
$lr = ORRXrs $xzr, $x7, 0
...
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp
```

When calling `OUTLINED_FUNCTION_2_0`, we used `x0` to save the `lr` register.
The reason for the above error appears to be that:
```
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp
```
should be:
```
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp, implicit $x0
```

When processing the same instruction with both `implicit-def $x0` and `implicit $x0` we should keep `implicit $x0`.
A reproducible demo is available at: [https://github.com/DianQK/reproduce_outlined_function_use_live_x0](https://github.com/DianQK/reproduce_outlined_function_use_live_x0).

Reviewed By: jinlin

Differential Revision: https://reviews.llvm.org/D112911
2021-11-17 09:44:10 -08:00
Mirko Brkusanin db6bc2ab51 [AMDGPU][GlobalISel] Fold G_FNEG above when users cannot fold mods
If possible, fold fneg into the instruction above it when its users cannot fold
mods and we know it will decrease the instruction count.
This follows the same logic as the SDAG combiner in choosing opportunities to combine.

Differential Revision: https://reviews.llvm.org/D112827
2021-11-17 14:25:13 +01:00
David Sherwood 8d77555b12 [Analysis] Ensure getTypeLegalizationCost returns a simple VT for TypeScalarizeScalableVector
When getTypeConversion returns TypeScalarizeScalableVector we were
sometimes returning a non-simple type from getTypeLegalizationCost.
However, many callers depend upon this being a simple type and will
crash if not. This patch changes getTypeLegalizationCost to ensure
that we always return a sensible simple VT. If the vector type
contains unusual integer types, e.g. <vscale x 2 x i3>, then we just
set the type to MVT::i64 as a reasonable default.

A test has been added here that demonstrates the vectoriser can
correctly calculate the cost of vectorising a "zext i3 to i64"
instruction with a VF=vscale x 1:

  Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll

Differential Revision: https://reviews.llvm.org/D113777
2021-11-17 13:11:58 +00:00
Simon Pilgrim 5fedbd5b18 [DAG] SimplifyDemandedVectorElts - zero_extend_vector_inreg(and(x,c)) -> and(x,c')
If we've only demanded the 0'th element, and it comes from a (one-use) AND, try to convert the zero_extend_vector_inreg into a mask and constant fold it with the AND.
2021-11-17 12:41:48 +00:00
Jay Foad 3264e95938 [CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress
Delegate updating of LiveIntervals to each target's
convertToThreeAddress implementation, instead of repairing LiveIntervals
after the fact in TwoAddressInstruction::convertInstTo3Addr.

Differential Revision: https://reviews.llvm.org/D113493
2021-11-17 10:16:47 +00:00
Eric Tang f7eb061a5f [SelectionDAG] Make WidenVecRes_SELECT work for scalable vectors
This change makes WidenVecRes_SELECT work for scalable vectors.

This patch is split from [D110319](https://reviews.llvm.org/D110319)

Signed-off-by: Eric Tang <tangxingxin1008@gmail.com>

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D110388
2021-11-17 08:55:11 +00:00
Aaron Puchert b20da5117f Don't add irrelevant items to queue in DwarfCompileUnit::createScopeChildrenDIE (NFC)
Instead of popping them and then immediately throwing them away, we can
just filter out globals and items in different scopes before adding them
to WorkList. Shouldn't change anything but keep the queue smaller.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D113864
2021-11-17 00:01:20 +01:00
Aaron Puchert 86b3100cde [DebugInfo] Use DbgEntityKind in DbgEntity interface (NFC)
It was being used occasionally already, and using it on the constructor
and getDbgEntityID has obvious type safety benefits.

Also use llvm_unreachable in the switch as usual, but since only these
two values are used in constructor calls I think it's still NFC.

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D113862
2021-11-17 00:01:20 +01:00
Mircea Trofin c6b9b702a0 [NFC][Regalloc] Factor out eviction decision from eviction attempt
This splits tryEvict into a const tryFindEvictionCandidate, which
attempts to find a candidate, and the actual eviction (should the former
be successful).

The newly introduced tryFindEvictionCandidate will move subsequently
into the RegAllocEvictionAdvisor.

RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html

Differential Revision: https://reviews.llvm.org/D113941
2021-11-16 10:50:23 -08:00
Kazu Hirata ee0133dc6d [llvm] Use range-for loops (NFC) 2021-11-16 09:01:56 -08:00
Frederik Gossen 3f3d4e8a15 Fix unused variable warning in LoadStoreOpt.cpp with (void) 2021-11-16 12:03:59 +01:00
Frederik Gossen 2bceb7c8da Revert "Fix unused variable in llvm/lib/CodeGen/GlobalISel/LoadStoreOpt.cpp"
This reverts commit 40a609aebe.
2021-11-16 12:00:17 +01:00
Frederik Gossen ecfe7a3404 Revert "Fix unused variable warning."
This reverts commit a062e2a8ca.
2021-11-16 11:59:34 +01:00
Frederik Gossen 9a6817b7ed Revert "Fix another unused variable error."
This reverts commit 5b84ae7c48.
2021-11-16 11:58:02 +01:00
Adrian Kuegel 5b84ae7c48 Fix another unused variable error. 2021-11-16 11:32:44 +01:00
Adrian Kuegel a062e2a8ca Fix unused variable warning. 2021-11-16 11:17:33 +01:00
Frederik Gossen 40a609aebe Fix unused variable in llvm/lib/CodeGen/GlobalISel/LoadStoreOpt.cpp 2021-11-16 11:05:18 +01:00
Amara Emerson dcd8728d83 Remove unnecessary <any> include. 2021-11-16 00:50:30 -08:00
Kazu Hirata 7f00806a6a [llvm] Use make_early_inc_range (NFC) 2021-11-15 21:28:46 -08:00
Amara Emerson dc84770d55 [GlobalISel] Add a store-merging optimization pass and enable for AArch64.
This is a first attempt at a constant value consecutive store merging pass,
a counterpart to the DAGCombiner's store merging optimization.

The high level goals of this pass:

* Have a simple and efficient algorithm. As close to linear time as we can get.
  Thus, prioritizing scalability of the algorithm over merging every corner case
  we can find. The DAGCombiner's store merging code has been the source of
  compile time and complexity issues in the past and I wanted to avoid that.
* Don't introduce any new data structures for ordering memory operations. In MIR,
  we don't have the concept of chains like we do in the DAG, and the instruction
  order is stricter than enforcing ordering with graph edges. Although I
  considered adding something similar, I couldn't justify the overhead.

The pass is currently split into 3 main parts. The main store merging code focuses
on identifying candidate stores and managing the candidate group that's under
consideration for merging. Analyzing addressing of stores is a potentially
complex part and for now there's just a basic implementation to identify easy
cases. Finally, the other main bit of complexity is the alias analysis, which
tries to follow the same logic as the DAG's AA.

Currently this implementation only supports merging of constant stores. Stores
of arbitrary variables are technically possible with a very small change, but
the DAG chooses not to do this. Doing so here makes most code worse since
there's extra overhead in merging values into wider registers.

On AArch64 -Os, this optimization results in very minor savings on CTMark.

Differential Revision: https://reviews.llvm.org/D109131
2021-11-15 21:10:39 -08:00
Fabian Wolff b484fa8289 [X86] Fix crash with inline asm using wrong register name
Fixes PR#48678. `X86TargetLowering::getRegForInlineAsmConstraint()` can adjust the register class to match the type, e.g. change `VR128X` to `VR256X` if the type needs 256 bits. However, the function currently returns the unadjusted register and the adjusted register class, e.g. `xmm15` and `VR256X`, which then causes an assertion failure later because the register class does not contain that register. This patch fixes this behavior.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D113834
2021-11-16 10:38:12 +08:00
Craig Topper 233def40f7 [DAGCombiner] Prevent unfoldMaskedMerge from creating an AND with two inverted inputs.
It's possible that the mask is already a NOT. At least if InstCombine
hasn't canonicalized the input. In that case we will form an ANDN with
X instead of with Y. So we don't need to worry about Y being a constant.

We might need to check that X isn't a constant instead, but we don't
have a test case for that yet.
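
For context, a brute-force check of the masked-merge identity this combine works with, over 4-bit values (a standalone sketch, not the DAGCombiner code): ((X ^ Y) & M) ^ Y selects bits of X where M is set and bits of Y elsewhere, i.e. (X & M) | (Y & ~M).

```
#include <cstdint>
#include <cstdio>

int main() {
  for (uint32_t X = 0; X < 16; ++X)
    for (uint32_t Y = 0; Y < 16; ++Y)
      for (uint32_t M = 0; M < 16; ++M)
        if ((((X ^ Y) & M) ^ Y) != ((X & M) | (Y & ~M))) {
          std::printf("mismatch\n");
          return 1;
        }
  std::printf("identity holds for all 4-bit X, Y, M\n");
}
```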

This fixes a size regression found when trying to enable this combine
for RISCV in D113937.

Differential Revision: https://reviews.llvm.org/D113948
2021-11-15 17:15:51 -08:00
Mircea Trofin 19e6b730ce [NFC][Regalloc] Factor types that would be used by the eviction advisor
This is in preparation for pulling the eviction decision-making into an
analysis pass, which would then allow swapping that decision making
process.

RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html

Differential Revision: https://reviews.llvm.org/D113929
2021-11-15 13:15:14 -08:00
Nico Weber b4e50e5228 [asm] Make EmitMSInlineAsmStr and EmitGCCInlineAsmStr more alike
https://reviews.llvm.org/D71677 copied a bunch of code from
EmitGCCInlineAsmStr() to EmitMSInlineAsmStr() but made a few small
(likely unintentional) changes. This makes these pieces look the same.

No behavior change.

(Why are these functions two copies? No great reason as far as I can tell.
https://reviews.llvm.org/rG1778831a3d1d24ab6545635f63da4d9c5f8f0ac7 did the
split; we might want to undo them at some point. But PR23933 suggests
that a bigger change is planned for this file in the future, so keeping
this incremental for now.)

Differential Revision: https://reviews.llvm.org/D113924
2021-11-15 15:43:01 -05:00
Nico Weber 0be836b7dd [asm] Convert AsmPrinter::PrintSpecial() to StringRef
No behavior change.

Differential Revision: https://reviews.llvm.org/D113911
2021-11-15 15:38:27 -05:00
Nico Weber 833393e021 [asm] Correctly handle special names in variants
There's really no reason why anyone should use these special names in a variant.
I noticed this while reading the code: all other writes to OS are guarded by
this conditional, and the behavior with the check seems more correct, so
let's add the check.

Differential Revision: https://reviews.llvm.org/D113909
2021-11-15 15:37:09 -05:00
Simon Pilgrim 7bac1985f4 [DAG] SimplifyVBinOp - add SDLoc() argument
Pass in SDLoc instead of (repeated) local creations in SimplifyVBinOp and scalarizeBinOpOfSplats
2021-11-15 10:43:56 +00:00
Simon Pilgrim 8658d20724 [DAG] SimplifyVBinOp - pull out repeated getValueType() call. NFC. 2021-11-15 10:43:55 +00:00
Jay Foad 4119da2f7c [MachineVerifier] Live interval for a subreg must have subranges
MachineVerifier verified the subranges of a live interval if
they existed, but did not complain if they did not exist.

This patch changes the verifier to complain if there are no
subranges in the live interval for a subreg operand (so long
as MachineRegisterInfo says we should be tracking subreg
liveness for that register). This matches the conditions for
LiveIntervalCalc to create subranges in the first place.

Differential Revision: https://reviews.llvm.org/D112556
2021-11-15 10:13:35 +00:00
Kyungwoo Lee 6747d44bda [DebugInfo] Fix end_sequence of debug_line in LTO Object
In an LTO build, the `end_sequence` in the debug_line table for each compile unit (CU) points to the end of the text section which merged all CUs. The `end_sequence` needs to point to the end of each CU's range instead. This bug often causes an invalid `debug_line` table in the final `.dSYM` binary for MachO after running `dsymutil`, which tries to compensate for an out-of-range address of `end_sequence`.
The fix is to sync the line table termination with the range operations that are already maintained in DwarfDebug. When the CU or section changes, nodebug functions appear, or the module is finished, the prior pending line table is terminated using the last range label. In the MC path, where no range is tracked, the old logic is conservatively used to end the line table using the section end symbol.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D108261
2021-11-14 20:19:47 -08:00
Kazu Hirata feb40a3a47 [llvm] Use range-based for loops with instructions (NFC) 2021-11-14 19:40:48 -08:00
Kazu Hirata d243cbf8ea [llvm] Use isa instead of dyn_cast (NFC) 2021-11-14 19:40:46 -08:00
Mircea Trofin a32c2c3808 [NFC] Use Optional<ProfileCount> to model invalid counts
ProfileCount could model invalid values, but a user had no indication
that the getCount method could return bogus data. Optional<ProfileCount>
addresses that, because the user must dereference the optional. In
addition, the patch removes concept duplication.

Differential Revision: https://reviews.llvm.org/D113839
2021-11-14 19:03:30 -08:00
Kazu Hirata 7379736774 [llvm] Use range-based for loops with User::operands (NFC) 2021-11-14 09:32:38 -08:00
Sanjay Patel 254c5246e9 [DAGCombiner] match inverted/swapped patterns for vselect of mask of signbit
This was noted as a follow-up to D113212 / D113426:
4fc1fc4005
7e30404c3b
11522cfcad

https://alive2.llvm.org/ce/z/e4o96b

The canonicalization rules for these IR patterns are complicated,
and we were not matching the expected forms in 2 out of the 3
cases. We can make codegen more robust by matching the swapped
forms (and that will also work if these patterns are created late).
2021-11-14 09:35:26 -05:00
David Green 355ee18c5d [TypePromotion] Extend TypePromotion::isSafeWrap
This modifies the preconditions of TypePromotion's isSafeWrap
method, to allow it to work with all constants from the ICmp.
Using the code:
  %a = add %x, C1
  %c = icmp ult %a, C2

According to Alive, we can prove that this is equivalent to
icmp ult (add zext(%x), sext(C1)), zext(C2) given
C1 <=s 0 and C1 >s C2.
https://alive2.llvm.org/ce/z/CECYZB
Which is similar to what is already present. We can also
prove icmp ult (add zext(%x), sext(C1)), sext(C2) given
C1 <=s 0 and C1 <=s C2.
https://alive2.llvm.org/ce/z/KKgyeL

The PrepareWrappingAdds method was removed, and the
constants are now altered to sext or zext directly as
required by the above methods.
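
As a quick illustration (not part of the patch), a brute-force check of the first equivalence for an i8 -> i16 promotion with example constants C1 = -5 and C2 = -10, which satisfy C1 <=s 0 and C1 >s C2:

```
#include <cstdint>
#include <cstdio>

int main() {
  const int8_t C1 = -5, C2 = -10;
  for (unsigned I = 0; I < 256; ++I) {
    uint8_t X = static_cast<uint8_t>(I);
    // icmp ult (add %x, C1), C2 in i8
    bool Narrow = static_cast<uint8_t>(X + C1) < static_cast<uint8_t>(C2);
    // icmp ult (add zext(%x), sext(C1)), zext(C2) in i16
    uint16_t Sum = static_cast<uint16_t>(X + static_cast<uint16_t>(static_cast<int16_t>(C1)));
    bool Wide = Sum < static_cast<uint16_t>(static_cast<uint8_t>(C2));
    if (Narrow != Wide) {
      std::printf("mismatch at x = %u\n", X);
      return 1;
    }
  }
  std::printf("equivalent for all 8-bit values of %%x\n");
}
```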

Differential Revision: https://reviews.llvm.org/D113678
2021-11-14 11:18:31 +00:00
Kristina Bessonova 5b4bfd8c24 [DwarfCompileUnit] getOrCreateCommonBlock(): check for existing entity first. NFCI
For global variables and common blocks there is no way to create entities
through getOrCreateContextDIE(), so no need to obtain the context first.

Differential Revision: https://reviews.llvm.org/D113651
2021-11-14 10:58:24 +02:00
Kristina Bessonova 90c5ab54a9 [DwarfCompileUnit] getOrCreateGlobalVariableDIE(): remove outdated comment. NFC 2021-11-14 10:56:54 +02:00
Craig Topper 82bc6a094e [X86] Promote f16 STRICT_FROUND to f32 and call libc.
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D113817
2021-11-12 21:37:03 -08:00
Kazu Hirata 99d5cbbd7e [CodeGen] Use SDNode::uses (NFC) 2021-11-12 07:33:29 -08:00
Markus Lavin 4e94e25c90 Fix minor deficiency in machine-sink.
Register uses that are MRI->isConstantPhysReg() should not inhibit
sinking transformation.

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D111531
2021-11-12 08:01:13 +01:00
Kazu Hirata 2ca45adf24 [CodeGen, Target] Use MachineRegisterInfo::use_operands (NFC) 2021-11-11 22:28:55 -08:00
Simon Pilgrim 010b09b0c5 [DAG] reassociateOpsCommutative - test getNode result directly. NFC
Matches the clean code style we use directly above
2021-11-11 18:45:50 +00:00
Sanjay Patel 11522cfcad [DAGCombiner] add fold for vselect based on mask of signbit, part 3
(Cond0 s> -1) ? N1 : 0 --> ~(Cond0 s>> BW-1) & N1

https://alive2.llvm.org/ce/z/mGCBrd
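
A scalar spot-check of the fold (per-lane for the vector form; a standalone sketch, assuming two's-complement arithmetic right shift for s>>):

```
#include <cstdint>
#include <cstdio>

int main() {
  const int32_t Samples[] = {0, 1, -1, 42, -42, INT32_MIN, INT32_MAX};
  for (int32_t Cond : Samples)
    for (int32_t N : Samples) {
      uint32_t Select = (Cond > -1) ? static_cast<uint32_t>(N) : 0u;
      // ~(Cond s>> 31) is all-ones when Cond is non-negative, zero otherwise.
      uint32_t Folded = ~static_cast<uint32_t>(Cond >> 31) & static_cast<uint32_t>(N);
      if (Select != Folded) {
        std::printf("mismatch\n");
        return 1;
      }
    }
  std::printf("fold holds on the samples\n");
}
```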

This was suggested as a potential enhancement in D113212 (also 7e30404c3b ).
There's an improvement for AArch that could be generalized ( X > -1 --> X >= 0 ).
For x86, we have a counter-acting fold for most cases that turns the shift+not
back into a setcc, so that needs a work-around to get more cases to use "pandn":
D113603

Note that this pattern (and a previous one) are not currently canonical forms
in IR:
https://alive2.llvm.org/ce/z/e4o96b

Adding swapped variants is left as a TODO item here, but is planned as
a near-term follow-up patch.

Differential Revision: https://reviews.llvm.org/D113426
2021-11-11 10:27:37 -05:00
Jay Foad 417add4d4e [CodeGen] Tweak whitespace in LiveInterval printing
When printing a LiveInterval, tweak the use of single and double spaces
to try to make it clearer that the valnos are associated with the
preceding range or subrange, not the following subrange.

Compare the output before and then after this patch:
%1 [32r,144r:0)  0@32r L000000000000000C [32r,144r:0)  0@32r L00000000000000F3 [32r,32d:0)  0@32r weight:0.000000e+00
%1 [32r,144r:0) 0@32r  L000000000000000C [32r,144r:0) 0@32r  L00000000000000F3 [32r,32d:0) 0@32r  weight:0.000000e+00

Differential Revision: https://reviews.llvm.org/D113671
2021-11-11 15:19:32 +00:00
Kazu Hirata ce227ce3b3 [CodeGen] Use MachineInstr::operands (NFC) 2021-11-11 07:10:30 -08:00
Jay Foad 491beae71d [TwoAddressInstruction] Update LiveIntervals after rewriting INSERT_SUBREG to COPY
Also add subranges to an existing live interval when introducing a new
subreg def.

Differential Revision: https://reviews.llvm.org/D113044
2021-11-11 12:24:59 +00:00
Jay Foad 6abbc3a420 [LiveIntervals] Update subranges in processTiedPairs
In TwoAddressInstructionPass::processTiedPairs when updating live
intervals after moving the last use of RegB back to the newly inserted
copy, update any affected subranges as well as the main range.

Differential Revision: https://reviews.llvm.org/D110411
2021-11-11 12:24:59 +00:00
Simon Pilgrim 82b74363a9 [DAG] reassociateOpsCommutative - peek through bitcasts to find constants
Now that FoldConstantArithmetic can fold bitcasted constants, we should peek through bitcasts of binop operands to try and find foldable constants
2021-11-11 12:00:22 +00:00
Simon Pilgrim 098ea29641 [DAG] FoldConstantArithmetic - fold intop(bitcast(buildvector(c1)),bitcast(buildvector(c1))) -> bitcast(intop(buildvector(c1'),buildvector(c2')))
Enable FoldConstantArithmetic to constant fold bitcasted constant build vectors. These have typically been bitcasted for type legalization purposes.

By extracting the raw constant bit data, performing the constant fold, and then casting the constant bit data back to the (legalized) type, we can perform constant folding on integer types after legalization.

This in particular helps 32-bit targets which need to handle vXi64 build vectors - during legalization the (unsupported) i64 elements are split to create a bitcasted v2Xi32 build vector.
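
A standalone sketch of that round trip (not the SelectionDAG code; little-endian element layout is assumed for readability): the i64 elements are recovered from the raw bits of the legalized v4i32 data, folded at the original width, and the result recast back.

```
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
  // Raw bits of a bitcast v4i32 build vector that legalization produced from
  // the v2i64 constants {0x0123456789ABCDEF, 1} and {0x11, -1}.
  uint32_t LegalA[4] = {0x89ABCDEFu, 0x01234567u, 0x00000001u, 0x00000000u};
  uint32_t LegalB[4] = {0x00000011u, 0x00000000u, 0xFFFFFFFFu, 0xFFFFFFFFu};

  uint64_t A[2], B[2], R[2];
  std::memcpy(A, LegalA, sizeof(A));        // recast the raw bits to i64 elements
  std::memcpy(B, LegalB, sizeof(B));
  for (int I = 0; I < 2; ++I)
    R[I] = A[I] + B[I];                     // fold at the original element width

  uint32_t Folded[4];
  std::memcpy(Folded, R, sizeof(Folded));   // recast the folded bits back to v4i32
  for (uint32_t W : Folded)
    std::printf("%#010x ", W);
  std::printf("\n");                        // 0x89abce00 0x01234567 0x00000000 0x00000000
}
```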

Addresses some regressions in D113192.

Differential Revision: https://reviews.llvm.org/D113564
2021-11-11 11:35:18 +00:00
Craig Topper 0963291991 [TypePromotion] Fix a hardcoded use of 32 as the size being promoted to.
At least I think that's what the 32 here is. Use RegisterBitWidth
instead.

While there replace zext with zextOrSelf to simplify the code.

Reviewed By: samparker, dmgreen

Differential Revision: https://reviews.llvm.org/D113495
2021-11-10 22:12:39 -08:00
Kazu Hirata 642a361b7e [llvm] Use make_early_inc_range (NFC) 2021-11-10 19:56:35 -08:00
Fraser Cormack b1d8d70b9d [SelectionDAG] Replace the Chain in LOAD->VP_LOAD widening
The introduction of this legalization, D111248, forgot to replace the
old chain with the new. This could manifest itself in the old
(illegally-typed) value remaining in the DAG, though the simple test
cases didn't catch this.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D113561
2021-11-10 17:49:12 +00:00
Simon Pilgrim 381d14775e [DAG] reassociateOpsCommutative - pull out repeated getOperand() calls. NFC. 2021-11-10 15:19:13 +00:00
Simon Pilgrim ed80761b50 [DAG] Split BuildVectorSDNode::getConstantRawBits into BuildVectorSDNode::recastRawBits helper. NFC.
NFC refactor of D113351, pulling out the APInt split/merge code from the BuildVectorSDNode bits extraction into a BuildVectorSDNode::recastRawBits helper. This is to allow us to reuse the code when we're packing constant folded APInt data back together.
2021-11-10 13:06:19 +00:00
Fraser Cormack 332318ffb6 [SelectionDAG] Widen scalable-vector loads/stores via VP_LOAD/VP_STORE
This patch fixes a compiler crash when widening scalable-vector loads
and stores which end up breaking down to element-wise store operations.
It does so by providing a way for targets with support for
vector-predicated loads and stores to use those instead. By widening the
operation but maintaining the original effective operation length via
the EVL, only the intended vector elements are loaded or stored.

This method should in theory be possible and even preferred for
fixed-length vector types, but all fixed-length types can be broken down
into their elements, and regardless I have observed regressions in the
generated code when doing so. I believe this is simply due to
VP_LOAD/VP_STORE not being up to par with LOAD/STORE in terms of
optimization. It does improve performance on smaller self-contained
examples, however, so the potential is there.

While the only target that benefits from this is RISCV, the legalization
is generic and so was placed centrally.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111248
2021-11-10 09:55:03 +00:00
Adrian Kuegel f0d997c472 Revert "[DebugInfo] Only create concrete DIEs of concrete functions"
This reverts commit f19471a249.
This leads to a crash. Still working on a reproducer to share.
2021-11-10 10:52:15 +01:00
Kazu Hirata ef2d0e0f20 [llvm] Use MachineBasicBlock::{successors,predecessors} (NFC) 2021-11-09 23:05:15 -08:00
Jessica Paquette 3eabcda814 [GlobalISel] Ensure that translateInvoke adds all successors for inlineasm
The existing code didn't add all necessary successors, which resulted in
disjoint basic blocks. These would end up not being legalized which, in the
best case, caused a fallback only in assert builds.

Here's an example:

https://godbolt.org/z/ndx15Enfj

We also end up getting weird codegen here as well.

Refactoring the code here allows us to correctly attach all successors. With
this patch, the above example gives correct codegen at -O0 with and without
asserts.

Also autogen the testcase to show that we add all the successors now.

Differential Revision: https://reviews.llvm.org/D113437
2021-11-09 16:20:34 -08:00
Arthur Eubanks 05963a3d66 Revert "[DebugInfo] Enforce implicit constraints on `distinct` MDNodes"
This reverts commit ee76525698.

Causes crashes, see comments in D104827.
2021-11-09 14:27:55 -08:00
Ilya Yanok 3c47c5ca13 [RegAllocFast] Fix nondeterminism in debuginfo generation
Changes from commit 1db137b185
added iteration over a hash map, which can result in non-deterministic
order. Fix that by using a SmallMapVector to preserve the order.

Differential Revision: https://reviews.llvm.org/D113468
2021-11-09 21:42:50 +01:00
Ellis Hoag f19471a249 [DebugInfo] Only create concrete DIEs of concrete functions
At the beginning of the module we can iterate through the functions to
see which SPs should have concrete DIEs. Then when we need to reference
a DIE for a SP we can decide if it's ok to create a concrete DIE or not.

Fixes
 * https://bugs.llvm.org/show_bug.cgi?id=52159
 * https://bugs.llvm.org/show_bug.cgi?id=30637

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D112337
2021-11-09 10:52:34 -08:00
Scott Linder ee76525698 [DebugInfo] Enforce implicit constraints on `distinct` MDNodes
Add UNIQUED and DISTINCT properties in Metadata.def and use them to
implement restrictions on the `distinct` property of MDNodes:

* DIExpression can currently be parsed from IR or read from bitcode
  as `distinct`, but this property is silently dropped when printing
  to IR. This causes accepted IR to fail to round-trip. As DIExpression
  appears inline at each use in the canonical form of IR, it cannot
  actually be `distinct` anyway, as there is no syntax to describe it.
* Similarly, DIArgList is conceptually always uniqued. It is currently
  restricted to only appearing in contexts where there is no syntax for
  `distinct`, but for consistency it is treated equivalently to
  DIExpression in this patch.
* DICompileUnit is already restricted to always being `distinct`, but
  along with adding general support for the inverse restriction I went
  ahead and described this in Metadata.def and updated the parser to be
  general. Future nodes which have this restriction can share this
  support.

The new UNIQUED property applies to DIExpression and DIArgList, and
forbids them to be `distinct`. It also implies they are canonically
printed inline at each use, rather than via MDNode ID.

The new DISTINCT property applies to DICompileUnit, and requires it to
be `distinct`.

A potential alternative change is to forbid the non-inline syntax for
DIExpression entirely, as is done with DIArgList implicitly by requiring
it appear in the context of a function. For example, we would forbid:

    !named = !{!0}
    !0 = !DIExpression()

Instead we would only accept the equivalent inlined version:

    !named = !{!DIExpression()}

This essentially removes the ability to create a `distinct` DIExpression
by construction, as there is no syntax for `distinct` inline. If this
patch is accepted as-is, the result would be that the non-canonical
version is accepted, but the following would be an error and produce a diagnostic:

    !named = !{!0}
    ; error: 'distinct' not allowed for !DIExpression()
    !0 = distinct !DIExpression()

Also update some documentation to consistently use the inline syntax for
DIExpression, and to describe the restrictions on `distinct` for nodes
where applicable.

Reviewed By: StephenTozer, t-tye

Differential Revision: https://reviews.llvm.org/D104827
2021-11-09 18:19:11 +00:00
Chih-Ping Chen cf0e32d197 [CodeView] Properly handle a DISubprogram in getScopeIndex.
Differential Revision: https://reviews.llvm.org/D113142
2021-11-09 13:18:07 -05:00
Kazu Hirata cba40c4ede [llvm] Use MachineBasicBlock::{successors,predecessors} (NFC) 2021-11-09 07:11:14 -08:00
Simon Pilgrim 58c01ef270 [SelectionDAG] Merge FoldConstantVectorArithmetic into FoldConstantArithmetic (PR36544)
This patch merges FoldConstantVectorArithmetic back into FoldConstantArithmetic.

Like FoldConstantVectorArithmetic we now handle vector ops with any operand count, but we currently still only handle binops for scalar types - this can be improved in future patches - in particular some common unary/trinary ops still have poor constant folding.

There's one change in functionality causing test changes - FoldConstantVectorArithmetic bails early if the build/splat vector isn't all constant (with some undefs) elements, but FoldConstantArithmetic doesn't - it instead attempts to fold the scalar nodes and bails if they fail to regenerate a constant/undef result, allowing some additional identity/undef patterns to be handled.

Differential Revision: https://reviews.llvm.org/D113300
2021-11-09 11:31:01 +00:00
Jay Foad 5c3c7adf3a [CodeGen] Fix assertion failure in TwoAddressInstructionPass::rescheduleMIBelowKill
This fixes an assertion failure with -early-live-intervals when trying
to update the live intervals for a debug instruction, which don't even
have slot indexes.

Differential Revision: https://reviews.llvm.org/D113116
2021-11-09 09:24:21 +00:00
Akira Hatanaka 1fe8993ad8 [ObjC][ARC] Replace uses of ObjC intrinsics that are arguments of
operand bundle "clang.arc.attachedcall" with ObjC runtime functions

The existing code only handles the case where the intrinsic being
rewritten is used as the called function pointer of a call/invoke.
2021-11-08 21:19:07 -08:00
Wouter van Oortmerssen 62eeb3e57e [WebAssembly] fix __stack_pointer being added to .debug_aranges
When emitting a reloc for the Wasm global __stack_pointer, it was inadvertently added to the symbols used for generating aranges, which caused some aranges to use it as the end symbol in a symbol diff, which caused a reloc for it to be emitted, which then caused an assert in `wasm64` since we have no 64-bit relocs for Wasm globals.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=52376
Differential Revision: https://reviews.llvm.org/D113438
2021-11-08 16:30:31 -08:00
Kazu Hirata 3c06920cd1 [llvm] Use make_early_inc_range (NFC) 2021-11-08 09:09:39 -08:00
Simon Pilgrim f059b04f7b [DAG] Add SelectionDAG::ComputeMinSignedBits helper
As suggested on D113371, this adds a wrapper to SelectionDAG::ComputeNumSignBits, similar to the llvm::ComputeMinSignedBits wrapper.

I've included some usage; it's not exhaustive, just the more obvious cases where the intent is clear.
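
A standalone illustration of the relationship behind the wrapper as described (not the LLVM implementation): the minimum number of signed bits is the bit width minus the number of known sign bits, plus one.

```
#include <cstdint>
#include <cstdio>

// Count how many of the high bits of X are copies of the sign bit.
static unsigned numSignBits(int32_t X) {
  uint32_t U = static_cast<uint32_t>(X);
  uint32_t Sign = U >> 31;
  unsigned N = 1;                                   // the sign bit itself
  while (N < 32 && ((U >> (31 - N)) & 1) == Sign)
    ++N;
  return N;
}

int main() {
  int32_t X = -11;                                  // ...1111'0101
  unsigned SignBits = numSignBits(X);               // 28
  std::printf("sign bits = %u, min signed bits = %u\n", SignBits, 32 - SignBits + 1); // 5
}
```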

Differential Revision: https://reviews.llvm.org/D113396
2021-11-08 14:12:45 +00:00
Simon Pilgrim f60d3ec0c7 [DAG] Add BuildVectorSDNode::getConstantRawBits helper
We have several places where we need to extract the raw bits data from a BUILD_VECTOR node, so consolidate this to a single helper function that handles Undefs and Integer/FP constants, including implicit truncation.

This should make it easier to extend D113202 to handle more constant folding of bitcasted constant data.

Differential Revision: https://reviews.llvm.org/D113351
2021-11-08 12:07:38 +00:00
Chen Zheng 50acbbe3cd [AsmPrinter][ORE] use correct opcode name
Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D113173
2021-11-08 01:51:24 +00:00
Benjamin Kramer 9b8b16457c Put implementation details into anonymous namespaces. NFCI. 2021-11-07 15:18:30 +01:00
Simon Pilgrim 0ff1edeeec [DAG] SimplifyVBinOp - replace FoldConstantVectorArithmetic with FoldConstantArithmetic
Currently FoldConstantArithmetic only handles binops, so replacing other uses of FoldConstantVectorArithmetic (in particular for SETCC nodes) still requires more work.
2021-11-07 12:11:46 +00:00
Kazu Hirata 843d1eda18 [llvm] Use llvm::reverse (NFC) 2021-11-06 19:31:18 -07:00
Sanjay Patel 39c4c7d391 [DAGCombiner] remove vselect fold that was accidentally added
This diff snuck into the unrelated:
025a2f73a3

It's a suggested follow-up for D113212, but I need to add test
coverage first.
2021-11-06 09:34:30 -04:00
Sanjay Patel 025a2f73a3 [InstCombine] add tests for umax with sub; NFC 2021-11-06 08:32:52 -04:00
Kazu Hirata 87e53a0ad8 [llvm] Use make_early_inc_range (NFC) 2021-11-05 19:39:07 -07:00
Jay Foad bdaa181007 [TwoAddressInstructionPass] Update existing physreg live intervals
In TwoAddressInstructionPass::processTiedPairs with
-early-live-intervals, update any preexisting physreg live intervals,
as well as virtreg live intervals. By default (without
-precompute-phys-liveness) physreg live intervals only exist for
registers that are live-in to some basic block.

Differential Revision: https://reviews.llvm.org/D113191
2021-11-05 21:20:30 +00:00
Sanjay Patel 7e30404c3b [DAGCombiner] add fold for vselect based on mask of signbit, part 2
This is the 'or' sibling for the fold added with:
D113212

https://alive2.llvm.org/ce/z/tgnp7K

Note that neither of these transforms is poison-safe,
but it does not seem to matter at this level. We have
had the scalar version of D113212 for a long time, so
this is just making optimizer behavior consistent.

We do not have the scalar version of *this* fold,
however, so that is another follow-up.
2021-11-05 15:02:12 -04:00
Michael Liao af2ae2cf42 [BranchRelaxation] Fix warning on unused variable. NFC. 2021-11-05 11:18:27 -04:00
Simon Pilgrim 9e6506299a [DAG] FoldConstantVectorArithmetic - remove SDNodeFlags argument
Another minor step towards merging FoldConstantVectorArithmetic into FoldConstantArithmetic.

We don't use SDNodeFlags in any constant folding inside DAG, so passing the Flags argument is a waste of time - an alternative would be to wire up FoldConstantArithmetic to take SDNodeFlags just-in-case we someday start using it, but we don't have any way to test it and I'd prefer to avoid dead code.

Differential Revision: https://reviews.llvm.org/D113276
2021-11-05 14:36:17 +00:00
Sanjay Patel 4fc1fc4005 [DAGCombiner] add fold for vselect based on mask of signbit
(X s< 0) ? Y : 0 --> (X s>> BW-1) & Y

We canonicalize to the icmp+select form in IR, and we already have this fold
for scalar select in SDAG, so I think it's an oversight that we don't have
the fold for vectors. It seems neutral for AArch64 and saves some instructions
on x86.

Whether we should also have the sibling folds for the inverse condition or
all-ones true value may depend on target-specific factors such as whether
there's an "and-not" instruction.

Differential Revision: https://reviews.llvm.org/D113212
2021-11-05 10:06:16 -04:00
Simon Pilgrim f2703c3c33 [DAG] FoldConstantArithmetic - rename NumOps -> NumElts. NFC.
NumOps represents the number of elements for vector constant folding; rename this to NumElts so that in future we can consistently use NumOps to represent the number of operands of the opcode.

Minor cleanup before trying to begin generalizing FoldConstantArithmetic to support opcodes other than binops.
2021-11-05 13:32:34 +00:00
Alfredo Dal'Ava Junior 1cb9f37a17 [FreeBSD] Do not mark __stack_chk_guard as dso_local
This symbol is defined in libc.so, so it is definitely not dso_local.
Marking it as such causes problems on some platforms (such as PowerPC).

Differential revision: https://reviews.llvm.org/D109090
2021-11-05 07:29:50 -05:00
Simon Pilgrim c1e7911c3b [DAG] FoldConstantArithmetic - fold bitlogic(bitcast(x),bitcast(y)) -> bitcast(bitlogic(x,y))
To constant fold bitwise logic ops where we've legalized constant build vectors to a different type (e.g. v2i64 -> v4i32), this patch adds a basic ability to peek through the bitcasts and perform the constant fold on the inner operands.
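
As a rough scalar sketch of why the fold is sound (illustrative only, not the patch's code): bitwise logic produces the same bits whether the lanes are viewed as one wide element or several narrow ones.

    #include <cstdint>
    // Treat {Lo, Hi} as two narrow lanes reinterpreted as a single wide lane.
    static uint64_t asI64(uint32_t Lo, uint32_t Hi) { return (uint64_t)Hi << 32 | Lo; }

    bool foldIsSound(uint32_t ALo, uint32_t AHi, uint32_t BLo, uint32_t BHi) {
      uint64_t WideAnd   = asI64(ALo, AHi) & asI64(BLo, BHi); // AND on the wide lane
      uint64_t NarrowAnd = asI64(ALo & BLo, AHi & BHi);       // AND on the narrow lanes
      return WideAnd == NarrowAnd;                            // always true
    }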

The MVE predicate v2i64 regressions will be addressed by future support for basic v2i64 type support.

One of the yak shaving fixes for D113192.

Differential Revision: https://reviews.llvm.org/D113202
2021-11-05 12:00:59 +00:00
Jay Foad 0321bd64e6 Revert "[TwoAddressInstructionPass] Update existing physreg live intervals"
This reverts commit ec0e1e88d2.

It was pushed by mistake.
2021-11-05 09:54:26 +00:00
Jay Foad ec0e1e88d2 [TwoAddressInstructionPass] Update existing physreg live intervals
In TwoAddressInstructionPass::processTiedPairs with
-early-live-intervals, update any preexisting physreg live intervals,
as well as virtreg live intervals. By default (without
-precompute-phys-liveness) physreg live intervals only exist for
registers that are live-in to some basic block.

Differential Revision: https://reviews.llvm.org/D113191
2021-11-05 09:10:24 +00:00
Mircea Trofin 34f4fe3a90 [NFC][Regalloc] Ensure Query::interferingVRegs is accurate.
To correctly use Query, one had to first call collectInterferingVRegs to
pre-cache the query result, then call interferingVRegs. Failing the
former, interferingVRegs could be stale. This did cause a bug which was
addressed in D98232, but the underlying usability issue of the Query API
wasn't.

This patch addresses the latter by making collectInterferingVRegs an
implementation detail, and having interferingVRegs play both roles. One
side-effect of this is that interferingVRegs is not const anymore.

Differential Revision: https://reviews.llvm.org/D112882
2021-11-02 18:26:54 -07:00
Chih-Ping Chen 2ed29d87ef [CodeView] Fortran debug info emission in Code View.
Differential Revision: https://reviews.llvm.org/D112826
2021-11-02 15:06:21 -04:00
Arthur Eubanks e2024d72fa Revert "[NFC] Remove LinkAll*.h"
This reverts commit fe364e5dc7.

Causes breakages, e.g. https://lab.llvm.org/buildbot/#/builders/188/builds/5266
2021-11-02 09:08:09 -07:00
Arthur Eubanks fe364e5dc7 [NFC] Remove LinkAll*.h
These were added to prevent functions from being removed by WPO.

But that doesn't make sense, correct WPO will not remove functions we actually use.

I noticed these because compiling cc1_main.cpp was pulling in random LLVM pass headers.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112971
2021-11-02 08:43:17 -07:00
Jay Foad be1a8f8834 [AMDGPU] Really preserve LiveVariables in SILowerControlFlow
https://bugs.llvm.org/show_bug.cgi?id=52204

Differential Revision: https://reviews.llvm.org/D112731
2021-11-02 15:03:37 +00:00
jacquesguan a39eadcf16 [DAGCombiner] Teach combineShiftToMULH to handle constant and const splat vector.
Fold (srl (mul (zext i32:$a to i64), i64:c), 32) -> (mulhu $a, trunc c),
if c can be truncated to i32 without loss.
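
A hedged C++ sketch of the scalar pattern this targets (names are illustrative): the shifted-down half of a widening unsigned multiply is exactly a "multiply high".

    #include <cstdint>
    uint32_t mulHighU32(uint32_t A, uint32_t C) {
      return (uint32_t)(((uint64_t)A * C) >> 32);  // srl(mul(zext A, C), 32) == mulhu A, C
    }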

Reviewed By: frasercrmck, craig.topper, RKSimon

Differential Revision: https://reviews.llvm.org/D108129
2021-11-02 12:04:23 +00:00
Simon Pilgrim 325031786e [SelectionDAG] Optimize expansion for rotates/funnel shifts
If the type of a funnel shift needs to be expanded, expand it to two funnel shifts instead of regular shifts. For constant shifts, this doesn't make much difference, but for variable shifts it allows a more optimal lowering.

Also use the optimized funnel shift lowering for rotates.

Alive2: https://alive2.llvm.org/ce/z/TvHDB- / https://alive2.llvm.org/ce/z/yzPept

(Branched from D108058 as getting this completed should help unlock some other WIP patches).

Original Patch: @efriedma (Eli Friedman)

Differential Revision: https://reviews.llvm.org/D112443
2021-11-02 11:38:25 +00:00
Simon Pilgrim 37e17f278f [DAG] MatchRotate - remove (redundant) legal type check.
Rely on the hasOperation() instead - as commented on D77804, the mid-term intention is to recognise rotate/funnel-by-constant pre-legalization to help avoid SimplifyDemandedBits regressions.
2021-11-02 11:24:50 +00:00
Kazu Hirata 6bdb61c58a [CodeGen] Use make_early_inc_range (NFC) 2021-11-01 22:38:49 -07:00
Jay Foad b8016b626e [CodeGen] Tweak coding style in LivePhysRegs::stepForward. NFC. 2021-11-01 16:01:24 +00:00
Jun Ma 1f9fa54984 [Taildup] Don't tail-duplicate loop header with multiple successors as its latches
When Taildup hits a loop with multiple latches like:
  //    1 -> 2 <-> 3                 |
  //          \  <-> 4               |
  //           \   <-> 5             |
  //            \---> rest           |
it may transform this loop into multiple loops by duplicating the loop header.
However, this change may have little benefit while making the CFG much more complex.
In some uncommon cases, it causes a large compile-time regression (reported by
@alexfh in D106056).

This patch disables tail duplication in such cases.

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D110613
2021-11-01 15:32:00 +08:00
Craig Topper ada5458521 [RISCV] Expand scalable vector bswap. Fix crash for bitreverse.
Fix LegalizeVectorOps to not try shuffle or unrolling expansions for
scalable vectors.

Differential Revision: https://reviews.llvm.org/D112236
2021-10-31 10:01:27 -07:00
Kazu Hirata 1a605f395f [CodeGen] Use make_early_inc_range (NFC) 2021-10-31 07:57:36 -07:00
Kazu Hirata 72710af233 [CodeGen, Target] Use MachineBasicBlock::terminators (NFC) 2021-10-31 07:57:34 -07:00
Kazu Hirata 4cc7c4724f [MachineCSE] Use make_early_inc_range (NFC) 2021-10-30 19:00:23 -07:00
Roman Lebedev 25043c8276
[NFCI] Introduce `ICmpInst::compare()` and use it where appropriate
As noted in https://reviews.llvm.org/D90924#inline-1076197
apparently this is a pretty common pattern,
let's not repeat it yet again, but have it in a common place.

There may be some more places where it could be used,
but these are the most obvious ones.
2021-10-30 17:50:06 +03:00
Christudasan Devadasan aa2d3b59ce GlobalISel/Utils: Use incoming regbank while constraining the superclasses
Register operands with superclasses can possibly have multiple regBanks
if they have different register types. The regBank ambiguity resolved
during regbankselect should be used to constrain the operand regclass
instead of obtaining one from the MCInstrDesc.

This is a prerequisite patch for D109300 that introduces allocatable AV_*
Superclasses for AMDGPU by combining both VGPRs and AGPRs and we want to
restrain the regclass to either A or V based on the incoming regbank.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D112323
2021-10-30 07:20:45 -04:00
Fraser Cormack 8314a04ede [SelectionDAG] Allow FindMemType to fail when widening loads & stores
This patch removes an internal failure found in FindMemType and "bubbles
it up" to the users of that method: GenWidenVectorLoads and
GenWidenVectorStores. FindMemType -- renamed findMemType -- now returns
an optional value, returning None if no such type is found.

Each of the aforementioned users now pre-calculates the list of types it
will use to widen the memory access. If the type breakdown is not
possible they will signal a failure, at which point the compiler will
crash as it does currently.

This patch is preparing the ground for alternative legalization
strategies for vector loads and stores, such as using vector-predication
versions of loads or stores.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112000
2021-10-29 18:27:31 +01:00
Neubauer, Sebastian c78640ee6a [TailDuplicator] Fix merging block with terminator
The TailDuplicator merged two blocks, even if the first one ended with
a terminator, resulting in invalid MIR, where a terminator is in the
middle of a block.

Abort merging if the first block ends with a terminator.

Differential Revision: https://reviews.llvm.org/D112226
2021-10-29 10:52:46 +02:00
Abinav Puthan Purayil db8d7b6e2d [DAGCombine][NFC] s/it's/its in the comment of hasNoInfs(). 2021-10-29 07:36:38 +05:30
Daniel Kiss d8075e8781 Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."
This is relanding commit da1d1a0869 .
This patch additionally addresses failures found in buildbots & post review comments.

ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup.
It will call the UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called
because the global state will be wrong due to the missing __cxa_end_cleanup call.

Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions

Reviewed By: logan

Differential Revision: https://reviews.llvm.org/D111703
2021-10-28 21:45:09 +02:00
Guozhi Wei 1e46dcb77b [TwoAddressInstructionPass] Put all new instructions into DistanceMap
In function convertInstTo3Addr, after converting a two address instruction into
three address instruction, only the last new instruction is inserted into
DistanceMap. This is wrong, DistanceMap should track all instructions from the
beginning of current MBB to the working instruction. When a two address
instruction is converted to three address instruction, multiple instructions may
be generated (usually an extra COPY is generated), all of them should be
inserted into DistanceMap.

Similarly when unfolding memory operand in function tryInstructionTransform
DistanceMap is not maintained correctly.

Differential Revision: https://reviews.llvm.org/D111857
2021-10-28 11:11:59 -07:00
Nicolai Hähnle b437aaa672 MachineDominators: Define MachineDomTree type alias
This is a (very) small move towards making the machine dominators more
aligned with the IR dominators:

* DominatorTree / MachineDomTree is the class holding the dominator tree
* DominatorTreeWrapperPass / MachineDominatorTree is the corresponding
  (machine) function pass

This alignment will be used by analyses that are designed as templates
that work with LLVM IR as well as Machine IR.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D112690
2021-10-28 22:30:35 +05:30
Daniel Kiss 66e03db814 Revert "Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume.""
This reverts commit b6420e575f.
2021-10-28 17:24:53 +02:00
Daniel Kiss b6420e575f Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."
This is relanding commit da1d1a0869 .
This patch additionally addresses failures found in buildbots & post review comments.

ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup.
It will call the UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called
because the global state will be wrong due to the missing __cxa_end_cleanup call.

Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions

Reviewed By: logan

Differential Revision: https://reviews.llvm.org/D111703
2021-10-28 16:49:19 +02:00
Neubauer, Sebastian 50d8d963e3 [GlobalISel] Simplify RegBankSelect
Save the instruction list of a block before selecting banks.
This makes it possible to cope with moved instructions, even if they are
reordered or split into multiple basic blocks.

Differential Revision: https://reviews.llvm.org/D111223
2021-10-28 10:30:55 +02:00
Michael Liao e6a4ba3aa6 [amdgpu] Handle the case where there is no scavenged register.
- When an unconditional branch is expanded into an indirect branch, if
  there is no scavenged register, an SGPR pair needs spilling to enable
  the destination PC calculation. In addition, before jumping into the
  destination, that clobbered SGPR pair needs restoring.
- As SGPR cannot be spilled to or restored from memory directly, the
  spilling/restoring of that SGPR pair reuses the regular SGPR spilling
  support but without spilling it into memory. As those spill and
  restore points are fully controlled, we only need to spill that SGPR
  into the temporary VGPR, which needs spilling into its emergency slot.
- The target-specific hook is revised to take additional restore block,
  where the restoring code is filled. After that, the relaxation will
  place that restore block directly before the destination block and
  insert an unconditional branch in any fall-through block into the
  destination block.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106449
2021-10-27 18:37:27 -04:00
Kerry McLaughlin f01fafdcd4 [SVE][CodeGen] Fix incorrect legalisation of zero-extended masked loads
PromoteIntRes_MLOAD always sets the extension type to `EXTLOAD`, which
results in a sign-extended load. If the type returned by getExtensionType()
for the load being promoted is something other than `NON_EXTLOAD`, we
should instead pass this to getMaskedLoad() as the extension type.

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D112320
2021-10-27 14:15:41 +01:00
Caroline Concatto 1137b7207d [SelectionDAG] Widening the result of INSERT_SUBVECTOR.
Widens the result and first input vector because they have the same size.
The subvector to be inserted is widened in the operand widening function.

Differential Revision: https://reviews.llvm.org/D112187
2021-10-27 13:52:25 +01:00
Daniel Kiss 894ddba1c9 Revert "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."
This reverts commit da1d1a0869.
2021-10-27 14:29:35 +02:00
Jay Foad b9e3af124b [LiveInterval] Add RemoveDeadValNo argument to removeSegment(iterator)
Add an optional bool RemoveDeadValNo argument to the
removeSegment(iterator) overload, for consistency with the other
overloads. This gives clients a way to remove dead valnos while also
getting an updated iterator returned (in the manner of vector::erase).

Use this to clean up some inefficient code in
LiveIntervals::repairOldRegInRange. NFC.

Differential Revision: https://reviews.llvm.org/D110560
2021-10-27 09:43:32 +01:00
Daniel Kiss da1d1a0869 [ARM] __cxa_end_cleanup should be called instead of _UnwindResume.
ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup.
It will call the UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called
because the global state will be wrong due to the missing __cxa_end_cleanup call.

Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions

Reviewed By: logan

Differential Revision: https://reviews.llvm.org/D111703
2021-10-27 10:40:00 +02:00
Kazu Hirata c3e698e2f5 [CodeGen, Hexagon] Use MachineBasicBlock::phis (NFC) 2021-10-26 09:01:29 -07:00
Craig Topper d51e3a2139 [LegalizeTypes][TargetLowering] Merge getShiftAmountTyForConstant into TargetLowering::getShiftAmountTy.
getShiftAmountTyForConstant is a special helper that changes the
shift amount to i32 if the type chosen by
TargetLowering::getShiftAmountTy can't represent all possible values.
This is needed to satisfy an assert in SelectionDAG::getNode.

It requires additional consideration to know when this helper should be used.
I'm not sure that we are always using it when we should.

This patch merges the getShiftAmountTyForConstant handling into
TargetLowering::getShiftAmountTy so we don't need to think about it
anymore.

Technically this may slightly increase compile times since the majority
of callers of getShiftAmountTy won't need this. Hopefully, this isn't
an issue in practice.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112469
2021-10-25 14:06:53 -07:00
Jeremy Morse 4136897bd4 [DebugInfo][InstrRef][NFC] Switch to using DenseMaps and similar
There are a few STL containers hanging around that can become DenseMaps,
SmallVectors and similar. This recovers a modest amount of compile time
performance.

While I'm here, adjust the bit layout of ValueIDNum: this was always
supposed to act like a value type, however it seems that clang doesn't
compile the comparison functions to act that way. Add a uint64_t to a
union that explicitly aliases the bitfields, so that we can compare the
whole value as a single integer.

Differential Revision: https://reviews.llvm.org/D112333
2021-10-25 18:07:17 +01:00
Jeremy Morse 97ddf49e43 [DebugInfo][InstrRef] Recover stack-slot tracking performance
This patch is like D111627 -- instead of calculating IDF for every location
on the stack, only do it for the smallest units of interference, and copy
the PHIs for those units to any aliases.

The test added runs placeMLocPHIs directly, and tests that:
 * A def of the lower 8 bits of a stack slot causes all aliasing regs to
   have PHIs placed,
 * It doesn't cause the equivalent location to x86's $ah, which isn't
   aliased, to have a PHI placed.

Differential Revision: https://reviews.llvm.org/D112324
2021-10-25 17:31:09 +01:00
Danila Malyutin 7b102fcc91 [CodeGen] Fix dependence breaking for tied operands
Differential Revision: https://reviews.llvm.org/D107582
2021-10-25 18:52:27 +03:00
Jeremy Morse ee3eee71e4 [DebugInfo][InstrRef] Track values fused into stack spills
During register allocation, some instructions can have stack spills fused
into them. It means that when vregs are allocated on the stack we can
convert:

    SETCCr %0
    DBG_VALUE %0

to

    SETCCm %stack.0
    DBG_VALUE %stack.0

Unfortunately instruction referencing finds this harder: a store to the
stack doesn't have a specific operand number, therefore we don't substitute
the old operand for a new operand, and the location is dropped. This patch
implements a solution: just recognise the memory operand attached to an
instruction with a Special Number (TM), and record a substitution between
the old value and the new one.

This patch adds substitution code to InlineSpiller to record such fused
spills, and tracking in InstrRefBasedLDV to recognise such values, and
produce the value numbers for them. Everything to do with the movement of
stack-defined values is already handled in InstrRefBasedLDV.

Differential Revision: https://reviews.llvm.org/D111317
2021-10-25 15:14:53 +01:00
Jeremy Morse 2eb96e1711 [DebugInfo][NFC] Avoid a use-after-free
This patch swaps two lines -- the CurSucc reference can be invalidated
by the call to DFS.push_back, therefore that should happen last. The
usual hat-tip to asan for catching this.

This patch also swaps an earlier call to ToAdd.insert and DFS.push_back,
where a stable iterator (from successors()) is being used. This isn't
strictly necessary, but is good for consistency and avoiding readers
asking themselves why the two code portions have a different order.
2021-10-25 14:16:30 +01:00
Sanjay Patel 6e46b66e2a [DAGCombiner] make matching bit-hack form of usubsat more flexible
(i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128

As suggested in D112085, we can substitute 'xor' with 'add'
in this pattern, and it is logically equivalent:
https://alive2.llvm.org/ce/z/eJtWWC

We canonicalize to 'xor' in IR, but SDAG does not do that
(and it probably should not - https://llvm.org/PR52267 ), so
it is possible to see either pattern in codegen. Note that
'sub' is another potential pattern, but that is
canonicalized to 'add' in DAGCombiner, so we don't need to
worry about that variation.

Differential Revision: https://reviews.llvm.org/D112377
2021-10-25 09:01:52 -04:00
Tim Northover f9089accba CodeGenPrep: remove all copies of GEP from list if there are duplicates.
Unfortunately ToT has changed enough from the revision where this actually
caused problems that the test no longer triggers an assertion failure.
2021-10-25 14:00:02 +01:00
Kazu Hirata 4bd46501c3 Use llvm::any_of and llvm::none_of (NFC) 2021-10-24 17:35:33 -07:00
Kazu Hirata 1c35973c77 [llvm] Call *(Set|Map)::erase directly (NFC)
We can erase an item in a set or map without checking its membership
first.
2021-10-24 09:32:59 -07:00
Kazu Hirata d8e4170b0a Ensure newlines at the end of files (NFC) 2021-10-23 08:45:29 -07:00
Kazu Hirata d14d7068b6 [llvm] Use StringRef::contains (NFC) 2021-10-23 08:45:27 -07:00
Jay Foad 2915889d74 [ScheduleDAGInstrs] Call adjustSchedDependency in more cases
This removes a condition and the corresponding FIXME comment, because
the Hexagon assertion it refers to has apparently been fixed, probably
by D76134.

NFCI. This just gives targets the opportunity to adjust latencies that
were set to 0 by the generic code because they involve "implicit pseudo"
operands.

Differential Revision: https://reviews.llvm.org/D112306
2021-10-22 20:03:29 +01:00
Jeremy Morse e7084ceab3 [DebugInfo][Instr] Track subregisters across stack spills/restores
Sometimes we generate code that writes to a subregister, then spills /
restores a super-register to the stack, for example:

    $eax = MOV32ri 0
    MOV64mr $rsp, 1, $noreg, 16, $noreg, $rax
    $rcx = MOV64rm $rsp, 1, $noreg, 8, $noreg

This patch takes a different approach: it adds another index to
MLocTracker that identifies a size/offset within a stack slot. A location
on the stack is then a pair of {FrameIndex, SlotNum}. Spilling and
restoring now involves pairing up the src/dest register numbers, and the
dest/src stack position to be transferred to/from. Location coverage
improves as a result; compile-time performance decreases, alas.

One limitation is that if a PHI occurs inside a stack slot:

    DBG_PHI %stack.0, 1

We don't know how large the resulting value is, and so might have
difficulty picking which value to use. DBG_PHI might need to be augmented
in the future with such a size.

Unit tests added ensure that spills and restores correctly transfer to
positions in the Location => Value map, and that different register classes
written to the stack will correctly clobber all other positions in the
stack slot.

Differential Revision: https://reviews.llvm.org/D112133
2021-10-22 19:20:55 +01:00
Craig Topper 93139a3c32 [LegalizeTypes] Only expand CTLZ/CTTZ/CTPOP during type promotion if the new type is legal.
We might be promoting a large non-power of 2 type and the new type
may need to be split. Once we split it we may have a ctlz/cttz/ctpop
instruction for the split type.

I'm also concerned that we may create large shifts with shift amounts
that are too small.
2021-10-22 11:02:35 -07:00
Simon Pilgrim a5f56342b0 [DAG] narrowExtractedVectorLoad - EXTRACT_SUBVECTOR indices are always constant
EXTRACT_SUBVECTOR indices are always constant, we don't need to check for ConstantSDNode, we should just use getConstantOperandVal which will assert for the constant.
2021-10-22 18:32:14 +01:00
Jeremy Morse d9eebe3cd7 [DebugInfo][InstrRef] Add unit tests for transfer-function building
This patch adds some unit tests for the machine-location transfer-function
building parts of InstrRefBasedLDV: i.e., test that if we feed some MIR
into the transfer-function building code, does it create the correct
transfer function.

There are a number of minor defects that get corrected in the process:
 * The unit test was selecting the x86 (i.e. 32 bit) backend rather than
   x86_64's 64 bit backend,
 * COPY instructions weren't actually having their subregister values
   correctly represented in the transfer function. Subregisters were being
   defined by the COPY, rather than taking the value in the source register.
 * SP aliases were at risk of being clobbered, if an SP subregister was
   clobbered.

Differential Revision: https://reviews.llvm.org/D112006
2021-10-22 18:29:03 +01:00
Craig Topper 04c184bba7 [TargetLowering] Simplify the interface of expandABS. NFC
Instead of returning a bool to indicate success and a separate
SDValue, return the SDValue and have the callers check if it is
null.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112331
2021-10-22 10:22:23 -07:00
Craig Topper 0766aef3f3 [LegalizeTypes][RISCV][PowerPC] Expand CTLZ/CTTZ/CTPOP instead of promoting if they'll be expanded later.
Expanding these requires multiple constants. If we promote during type
legalization when they'll end up getting expanded in LegalizeDAG, we'll
use larger constants. These constants may be harder to materialize.
For example, 64-bit constants on 64-bit RISCV are very expensive.

This is similar to what has already been done to BSWAP and BITREVERSE.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112268
2021-10-22 09:10:01 -07:00
Zarko Todorovski 0bd6a9f2d1 [clang/llvm] Inclusive language: replace segregate with separate 2021-10-22 09:59:35 -04:00
Craig Topper 996123e5e8 [TargetLowering] Simplify the interface for expandCTPOP/expandCTLZ/expandCTTZ.
There is no need to return a bool and have an SDValue output
parameter. Just return the SDValue and let the caller check if it
is null.

I have another patch to add more callers of these so I thought
I'd clean up the interface first.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112267
2021-10-21 15:35:28 -07:00
Craig Topper ff37b1105d [LegalizeVectorOps][X86] Don't defer BITREVERSE expansion to LegalizeDAG.
By expanding early it allows the shifts to be custom lowered in
LegalizeVectorOps. Then a DAG combine is able to run on them before
LegalizeDAG handles the BUILD_VECTORS for the masks used.

v16Xi8 shift lowering on X86 requires a mask to be applied to a v8i16
shift. The BITREVERSE expansion applied an AND mask before SHL ops and
after SRL ops. This was done to share the same mask constant for both shifts.
It looks like this patch allows DAG combine to remove the AND mask added
after v16i8 SHL by X86 lowering. This maintains the mask sharing that
BITREVERSE was trying to achieve. Prior to this patch it looks like
we kept the mask after the SHL instead which required an extra constant
pool or a PANDN to invert it.

This is dependent on D112248 because RISCV will end up scalarizing the BSWAP
portion of the BITREVERSE expansion if we don't disable BSWAP scalarization in
LegalizeVectorOps first.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112254
2021-10-21 15:23:23 -07:00
Craig Topper 458ed5fcc3 [TargetLowering][RISCV] Prevent scalarization of fixed vector bswap.
It's better to do the ands, shifts, ors in the vector domain than
to scalarize it and do those operations on each element.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112248
2021-10-21 14:34:01 -07:00
Yonghong Song f6811cec84 [DebugInfo] Support typedef with btf_decl_tag attributes
Clang patch ([1]) added support for btf_decl_tag attributes with typedef
types. This patch added llvm support including dwarf generation.
For example, for typedef
   typedef unsigned * __u __attribute__((btf_decl_tag("tag1")));
   __u u;
the following shows llvm-dwarfdump result:
   0x00000033:   DW_TAG_typedef
                   DW_AT_type      (0x00000048 "unsigned int *")
                   DW_AT_name      ("__u")
                   DW_AT_decl_file ("/home/yhs/work/tests/llvm/btf_tag/t.c")
                   DW_AT_decl_line (1)

   0x0000003e:     DW_TAG_LLVM_annotation
                     DW_AT_name    ("btf_decl_tag")
                     DW_AT_const_value     ("tag1")

   0x00000047:     NULL

  [1] https://reviews.llvm.org/D110127

Differential Revision: https://reviews.llvm.org/D110129
2021-10-21 08:42:58 -07:00
Sanjay Patel d2198771e9 [DAGCombiner] fold bit-hack form of usubsat
(i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128
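
A scalar C++ sketch of the equivalence (illustrative only; uint8_t stands in for i8):

    #include <cstdint>
    uint8_t bitHackForm(uint8_t X) {
      uint8_t SignMask = (uint8_t)((int8_t)X >> 7);  // 0xff when X >= 128, else 0
      return (uint8_t)(X ^ 0x80) & SignMask;         // (X ^ 128) & (X s>> 7)
    }
    uint8_t usubsatForm(uint8_t X) {
      return X >= 0x80 ? (uint8_t)(X - 0x80) : 0;    // usubsat X, 128
    }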

I haven't found a generalization of this identity:
https://alive2.llvm.org/ce/z/_sriEQ

Note: I was actually looking at the first form of the pattern in that link,
but that's part of a long chain of potential missed transforms in codegen
and IR... that I hope ends here!

The predicates for when this is profitable are a bit tricky. This version of
the patch excludes multi-use but includes custom lowering (as opposed to
legal only).

On x86 for example, we have custom lowering for some vector types, and that
uses umax and sub. So to enable that fold, we need add use checks to avoid
regressions. Even with legal-only lowering, we could see code with extra
reg move instructions for extra uses, so that constraint would have to be
eased very carefully to avoid penalties.

Differential Revision: https://reviews.llvm.org/D112085
2021-10-21 09:47:19 -04:00
Kerry McLaughlin 0d153df69e [SVE] Fix selection failure when splitting extended masked loads
When splitting a masked load, `GetDependentSplitDestVTs` is used to get the
MemVTs of the high and low parts. If the masked load is extended, this
may return VTs with different element types which are used to create the
high & low masked load instructions.
This patch changes `GetDependentSplitDestVTs` to ensure we return VTs with
the same element type.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D111996
2021-10-21 13:04:38 +01:00
Arthur Eubanks 6ea7437ca5 [SelectionDAG] Bail out of mergeTruncStores when not optimizing
With unoptimized code, we may see lots of stores and spend too much time in mergeTruncStores.

Fixes PR51827.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D111596
2021-10-20 16:58:22 -07:00
Jon Roelofs b046eb19b8 [AArch64][GlobalISel] combine (and (or x, c1), c2) => (and x, c2) iff c1 & c2 == 0
https://godbolt.org/z/h8ejrG4hb
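
A scalar sketch of the identity with made-up constants: when c1 & c2 == 0, every bit the OR can set is cleared again by the AND, so the OR is dead.

    #include <cstdint>
    uint32_t beforeCombine(uint32_t X) { return (X | 0xff00u) & 0x00ffu; }
    uint32_t afterCombine(uint32_t X)  { return X & 0x00ffu; }  // 0xff00 & 0x00ff == 0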

rdar://83597585

Differential Revision: https://reviews.llvm.org/D111856
2021-10-20 12:11:52 -07:00
Stanislav Mekhanoshin c80d8a8cea [AMDGPU] MachineLICM cannot hoist VALU
MachineLoop::isLoopInvariant() returns false for all VALU
because of the exec use. Check TII::isIgnorableUse() to
allow hoisting.

That unfortunately results in higher register consumption
since MachineLICM does not adequately estimate pressure.
Therefore I think it should only be enabled after D107677, even
though it does not depend on it.

Differential Revision: https://reviews.llvm.org/D107859
2021-10-20 11:47:24 -07:00
Itay Bookstein 08ed216000 [IR] Refactor GlobalIFunc to inherit from GlobalObject, Remove GlobalIndirectSymbol
As discussed in:
* https://reviews.llvm.org/D94166
* https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html

The GlobalIndirectSymbol class lost most of its meaning in
https://reviews.llvm.org/D109792, which disambiguated getBaseObject
(now getAliaseeObject) between GlobalIFunc and everything else.
In addition, as long as GlobalIFunc is not a GlobalObject and
getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee
is a GlobalIFunc cannot currently be modeled properly. Creating
aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition,
calling getAliaseeObject on a GlobalIFunc will currently return nullptr,
which is undesirable because it should return the object itself for
non-aliases.

This patch refactors the GlobalIFunc class to inherit directly from
GlobalObject, and removes GlobalIndirectSymbol (while inlining the
relevant parts into GlobalAlias and GlobalIFunc). This allows for
calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc
itself, making getAliaseeObject() more consistent and enabling
alias-to-ifunc to be properly modeled in the IR.

I exercised some judgement in the API clients of GlobalIndirectSymbol:
some were 'monomorphized' for GlobalAlias and GlobalIFunc, and
some remained shared (with the type adapted to become GlobalValue).

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D108872
2021-10-20 10:29:47 -07:00
Fraser Cormack eabf11f9ea [CodeGenPrepare] Avoid a scalable-vector crash in ctlz/cttz
This patch fixes a crash when despeculating ctlz/cttz intrinsics with
scalable-vector types. It is not safe to speculatively get the size of
the vector type in bits in case the vector type is not a fixed-length type. As
it happens this isn't required as vector types are skipped anyway.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112141
2021-10-20 16:45:55 +01:00
Craig Topper fe1f0de003 [RISCV][WebAssembly][TargetLowering] Allow expandCTLZ/expandCTTZ to rely on CTPOP expansion for vectors.
Our fallback expansion for CTLZ/CTTZ relies on CTPOP. If CTPOP
isn't legal or custom for a vector type we would scalarize the
CTLZ/CTTZ. This is different than CTPOP itself which would use a
vector expansion.

This patch teaches expandCTLZ/CTTZ to rely on the vector CTPOP
expansion instead of scalarizing. To do this I had to add additional
checks to make sure the operations used by CTPOP expansions are all
supported. Some of the operations were already needed for the CTLZ/CTTZ
expansion.
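
For reference, a scalar C++ sketch of the ctpop-based fallbacks in question (a sketch that assumes a ctpop expansion is already available; the point of the patch is to do the same thing lane-wise in the vector domain):

    #include <cstdint>
    unsigned popcnt32(uint32_t X);                // assume an existing ctpop expansion

    unsigned cttz32(uint32_t X) {
      return popcnt32(~X & (X - 1));              // count the zeros below the lowest set bit
    }
    unsigned ctlz32(uint32_t X) {
      X |= X >> 1; X |= X >> 2; X |= X >> 4;      // smear the highest set bit downwards
      X |= X >> 8; X |= X >> 16;
      return popcnt32(~X);                        // count the zeros above it
    }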

This is a huge improvement for RISCV, which doesn't have a scalar
ctlz or cttz in the base ISA.

For WebAssembly, I've added Custom lowering to keep the scalarizing
behavior. I've also extended the scalarizing to CTPOP.

Differential Revision: https://reviews.llvm.org/D111919
2021-10-20 07:46:41 -07:00
Jeremy Morse 89950ade21 [DebugInfo][InstrRef] Track a single variable at a time
Here's another performance patch for InstrRefBasedLDV: rather than
processing all variable values in a scope at a time, instead, process one
variable at a time. The benefits are twofold:
 * It's easier to reason about one variable at a time in your mind,
 * It improves performance, apparently from increased locality.

The downside is that the value-propagation code gets indented one level
further, plus there's some churn in the unit tests.

Differential Revision: https://reviews.llvm.org/D111799
2021-10-20 15:03:52 +01:00
Sander de Smalen be6c8dc765 [SelectionDAG] Fix getVectorSubVecPointer for scalable subvectors.
When inserting a scalable subvector into a scalable vector through
the stack, the index to store to needs to be scaled by vscale.
Before this patch, that didn't yet happen, so it would generate the
wrong offset, thus storing a subvector to the incorrect address
and overwriting the wrong lanes.

For some insert:
  nxv8f16 insert_subvector(nxv8f16 %vec, nxv2f16 %subvec, i64 2)

The offset was not scaled by vscale:
  orr     x8, x8, #0x4
  st1h    { z0.h }, p0, [sp]
  st1h    { z1.d }, p1, [x8]
  ld1h    { z0.h }, p0/z, [sp]

And is changed to:
  mov x8, sp
  st1h { z0.h }, p0, [sp]
  st1h { z1.d }, p1, [x8, #1, mul vl]
  ld1h { z0.h }, p0/z, [sp]

Differential Revision: https://reviews.llvm.org/D111633
2021-10-20 13:55:24 +01:00
Simon Pilgrim 71e39e3f18 [ADT] Add APInt::isNegatedPowerOf2() helper
Inspired by D111968, provide an isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos.
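
A sketch of the helper's intent (the real method lives on APInt; this free-function form is illustrative only):

    #include "llvm/ADT/APInt.h"
    static bool isNegatedPowerOf2(const llvm::APInt &V) {
      return (-V).isPowerOf2();   // e.g. -8 = ...11111000 qualifies; -6 and 0 do not
    }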

Differential Revision: https://reviews.llvm.org/D111998
2021-10-19 14:38:21 +01:00
Jeremy Morse 849b17949f [DebugInfo][InstrRef] Avoid un-necessary densemap copies and comparisons
This is purely a performance patch: InstrRefBasedLDV used to use three
DenseMaps to store variable values, two for long term storage and one as a
working set. This patch eliminates the working set, and updates the long
term storage in place, thus avoiding two DenseMap comparisons and two
DenseMap assignments, which can be expensive.

Differential Revision: https://reviews.llvm.org/D111716
2021-10-19 11:10:14 +01:00
Jeremy Morse cf033bb2d3 [DebugInfo][NFC] Zero-initialize a class field
This field gets assigned when the relevant object starts being used, but it
remains uninitialized beforehand. This risks introducing hard-to-detect
bugs if something changes, so zero-initialize the field.
2021-10-19 10:24:12 +01:00
Alexandros Lamprineas 04dc68710a [DebugInfo][ARM] Fix incorrect debug information for RWPI accessed globals
When compiling for the RWPI relocation model the debug information is wrong:

* the debug location is described as { DW_OP_addr Var }
  instead of { DW_OP_constNu Var DW_OP_bregX 0 DW_OP_plus }
* the relocation type is R_ARM_ABS32 instead of R_ARM_SBREL32

Differential Revision: https://reviews.llvm.org/D111404
2021-10-18 21:29:46 +01:00
Nikita Popov 54d868991a [ExpandMemCmp] Update CFG before DTU
The applyUpdates() API requires that the CFG is already updated,
so make sure to insert the new terminator first.
2021-10-18 21:49:47 +02:00
Jon Roelofs 1300677f97 [AArch64][GlobalISel] combine and + [la]sr => ubfx
https://godbolt.org/z/h8ejrG4hb
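
A scalar sketch of the shape being matched (the constants are made up): a right shift followed by a low-bit mask is a contiguous unsigned bitfield extract.

    #include <cstdint>
    uint32_t extractField(uint32_t X) {
      return (X >> 3) & 0x1f;     // matches UBFX X, #3, #5 (lsb = 3, width = 5)
    }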

rdar://83597585

Differential Revision: https://reviews.llvm.org/D111839
2021-10-18 10:33:01 -07:00
Kazu Hirata 8568ca789e Use llvm::erase_if (NFC) 2021-10-18 09:33:42 -07:00
Sanjay Patel 2a3cc4d461 [Analysis] add utility function for unary shuffle mask creation
This is NFC-intended for the callers. Posting in case there are
other potential users that I missed.
I would also use this from VectorCombine in a patch for:
https://llvm.org/PR52178 ( D111901 )

Differential Revision: https://reviews.llvm.org/D111891
2021-10-18 09:00:39 -04:00
Jeremy Morse ea970661dc Fix signed/unsigned comparison after b5426ced71
gcc11 warns that this counter causes a signed/unsigned comparison when it's
later compared with a SmallVector::difference_type. gcc appears to be
correct; clang does not warn one way or the other.
2021-10-18 10:28:52 +01:00
Jay Foad 012248b0bc Remove the verifyAfter mechanism that was replaced by D111397
Differential Revision: https://reviews.llvm.org/D111872
2021-10-18 10:26:46 +01:00
Jay Foad 36deb9a670 Add new MachineFunction property FailsVerification
TargetPassConfig::addPass takes a "bool verifyAfter" argument which lets
you skip machine verification after a particular pass. Unfortunately
this is used in generic code in TargetPassConfig itself to skip
verification after a generic pass, only because some previous target-
specific pass damaged the MIR on that specific target. This is bad
because problems in one target cause lack of verification for all
targets.

This patch replaces that mechanism with a new MachineFunction property
called "FailsVerification" which can be set by (usually target-specific)
passes that are known to introduce problems. Later passes can reset it
again if they are known to clean up the previous problems.

Differential Revision: https://reviews.llvm.org/D111397
2021-10-18 10:26:46 +01:00
Fraser Cormack 3d850d03ae [SelectionDAG] Fix illegal widening of scalable-vector loads
The process of widening simple vector loads attempts to use a load of a
wider vector type if the original load is sufficiently aligned to avoid
memory faults.

However this optimization is only legal when performed on fixed-length
vector types. For scalable vector types this is invalid (unless vscale
happens to be 1).

This patch does increase the likelihood of compiler crashes (from
`FindMemType` failing to find a suitable type) but this now better
matches how widening non-simple loads, insufficiently-aligned loads, and
scalable-vector stores are handled.

Patches will be introduced later by which loads and stores can be
widened on targets with support for masked or predicated operations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111885
2021-10-18 10:00:00 +01:00
Bing1 Yu f383c53311 [MachineSink] Compile time improvement for large testcases which have many kill flags
We did an experiment and observed a dramatic decrease in the compilation time spent on clearing kill flags.
Before:
Number of BasicBlocks:33357
Number of Instructions:162067
Number of Cleared Kill Flags:32869
Time of handling kill flags(ms):1.607509e+05

After:
Number of BasicBlocks:33357
Number of Instructions:162067
Number of Cleared Kill Flags:32869
Time of handling kill flags(ms):3.987371e+03

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D111688
2021-10-18 15:44:07 +08:00
Mingming Liu cfd155c41b [SelectionDAG] Fix typo in option help
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D111867
2021-10-15 11:27:40 -07:00
Ellis Hoag aa80034ab9 [DebugInfo] retainedTypes should not have subprograms
After D80369, the retainedTypes in CU's should not have any subprograms
so we should not handle that case when emitting debug info.

Differential Revision: https://reviews.llvm.org/D111593
2021-10-15 12:42:25 -04:00
Dávid Bolvanský 6678db00e6 [X86] Enable promotion of i16 popcnt (PR52056)
Solves https://bugs.llvm.org/show_bug.cgi?id=52056

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111507
2021-10-15 15:41:37 +02:00
Kazu Hirata 81e9c90686 [llvm] Use llvm::is_contained (NFC) 2021-10-14 22:44:09 -07:00
Jeremy Morse b5426ced71 [DebugInfo][InstrRef] Place variable-values PHI using LLVM utilities
This patch is very similar to D110173 / a3936a6c19, but for variable
values rather than machine values. This is for the second instr-ref
problem, calculating the correct variable value on entry to each block.
The previous lattice-based implementation was broken; we now use LLVM's
existing PHI placement utilities to work out where values need to merge,
then eliminate unnecessary ones through value propagation.

Most of the deletions here happen in vlocJoin: it was trying to pick a
location for PHIs to happen in, badly, leading to an infinite loop in the
MIR test added, where it would repeatedly switch between register
locations. The new approach is simpler: either PHIs can be eliminated, or
they can't, and the location of the value is a different problem.

Various bits and pieces move to the header so that they can be tested in
the unit tests. The DbgValue class grows a "VPHI" kind to represent
variable value PHIS that haven't been eliminated yet.

Differential Revision: https://reviews.llvm.org/D110630
2021-10-14 14:43:43 +01:00
Simon Pilgrim 88487662f7 [Codegen] TargetLowering::getCanonicalIndexType - early out scaled MVT::i8 indices. NFCI.
Avoids unused assignment scan-build warning.
2021-10-14 13:08:40 +01:00
Jeremy Morse e3e1da20d4 Follow up to a3936a6c19, correctly select LiveDebugValues implementation
Some functions get opted out of instruction referencing if they're being
compiled with no optimisations, however the LiveDebugValues pass picks one
implementation and then sticks with it through the rest of compilation.
This leads to a segfault if we encounter a function that doesn't use
instr-ref (because it's optnone, for example), but we've already decided
to use InstrRefBasedLDV which expects to be passed a DomTree.

Solution: keep both implementations around in the pass, and pick whichever
one is appropriate to the current function.
2021-10-14 11:28:53 +01:00
Jeremy Morse fbf269c71e [DebugInfo][InstrRef] Only calculate IDF for reg units
In D110173 we start using the existing LLVM IDF calculator to place PHIs as
we reconstruct an SSA form of the machine-code program. Sadly that's slower
than the old (but broken) way; this patch attempts to recover some of that
performance.

The key observation: every time we def a register, we also have to def its
register units. If we def'd $rax, in the current implementation we
independently calculate PHI locations for {al, ah, ax, eax, hax, rax}, and
they will all have the same PHI positions. Instead of doing that, we can
calculate the PHI positions for {al, ah} and place PHIs for any aliasing
registers in the same positions. Any def of a super-register has to def
the unit, and vice versa, so this is sound. It cuts down the SSA placement
we need to do significantly.

This doesn't work for stack slots, or registers we only ever read, so place
PHIs normally for those. LiveDebugValues chooses to ignore writes to SP at
calls, and now has to ignore writes to SP register units too.

Differential Revision: https://reviews.llvm.org/D111627
2021-10-13 16:08:18 +01:00
Jeremy Morse e845ca2ff1 Follow up a3936a6c19 to work around an old compiler bug
Old versions of gcc want template specialisations to happen within the
namespace where the template lives; this is still present in gcc 5.1, which
we officially support, so it has to be worked around.
2021-10-13 13:27:25 +01:00
Jeremy Morse a3936a6c19 [DebugInfo][InstrRef] Use PHI placement utilities for machine locations
InstrRefBasedLDV used to try and determine which values are in which
registers using a lattice approach; however this is hard to understand, and
broken in various ways. This patch replaces that approach with a standard
SSA approach using existing LLVM utilities. PHIs are placed at dominance
frontiers; value propagation then eliminates unnecessary PHIs.

This patch also adds a bunch of unit tests that should cover many of the
weirder forms of control flow.

Differential Revision: https://reviews.llvm.org/D110173
2021-10-13 12:49:04 +01:00
Kerry McLaughlin 1a2e90199f [SVE][CodeGen] Add patterns for ADD/SUB + element count
This patch adds patterns to match the following with INC/DEC:
 - @llvm.aarch64.sve.cnt[b|h|w|d] intrinsics + ADD/SUB
 - vscale + ADD/SUB

For some implementations of SVE, INC/DEC VL is not as cheap as ADD/SUB and
so this behaviour is guarded by the "use-scalar-inc-vl" feature flag, which for SVE
is off by default. There are no known issues with SVE2, so this feature is
enabled by default when targeting SVE2.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D111441
2021-10-13 11:36:15 +01:00
Heejin Ahn 9261ee32dc [WebAssembly] Make EH work with dynamic linking
This makes Wasm EH work with dynamic linking. So far we were only able
to handle destructors, which do not use any tags or LSDA info.

1. This uses `TargetExternalSymbol` for `GCC_except_tableN` symbols,
   which points to the address of per-function LSDA info. It is more
   convenient to use than `MCSymbol` because it can take additional
   target flags.

2. When lowering `wasm_lsda` intrinsic, if PIC is enabled, make the
   symbol relative to `__memory_base` and generate the `add` node. If
   PIC is disabled, continue to use the absolute address.

3. Make tag symbols (`__cpp_exception` and `__c_longjmp`) undefined in
   the backend, because it is hard to make it work with dynamic
   linking's loading order. Instead, we make all tag symbols undefined
   in the LLVM backend and import it from JS.

4. Add support for undefined tags to the linker.

Companion patches:
- https://github.com/WebAssembly/binaryen/pull/4223
- https://github.com/emscripten-core/emscripten/pull/15266

Reviewed By: sbc100

Differential Revision: https://reviews.llvm.org/D111388
2021-10-12 23:28:27 -07:00
Amara Emerson 5abce56edb [GlobalISel] Add support for constant vector folding of binops in CSEMIRBuilder.
Differential Revision: https://reviews.llvm.org/D111524
2021-10-12 11:31:22 -07:00
Hongtao Yu 098a0d8fbc [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3.
This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.

Not exactly like DbgIntrinsics, the PseudoProbe intrinsic has other attributes (such as mayread, maywrite, mayhaveSideEffect) that can block optimizations. The issues fixed are:
- Flipped default param of getFirstNonPHIOrDbg API to skip pseudo probes
- Unblocked CSE by avoiding pseudo probe from clobbering memory SSA
- Unblocked induction variable simplification
- Allow empty loop deletion by treating the probe intrinsic as droppable
- Some refactoring.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D110847
2021-10-12 09:44:12 -07:00
Jeremy Morse d9fa186a5c Scatter NDEBUG to fix after 838b4a533e
These "dump" methods call into MachineOperand::dump, which doesn't exist
with NDEBUG, thus we croak. Disable LiveDebugValues dump methods when
NDEBUG is turned on to avoid this.
2021-10-12 17:13:15 +01:00
Jay Foad f7ee21aa32 [TwoAddressInstruction] Remove ad hoc machine verification
With the -early-live-intervals command line flag,
TwoAddressInstructionPass::runOnMachineFunction would call
MachineFunction::verify before returning to check the live intervals.
But there was not much benefit to doing this since -verify-machineinstrs
and LLVM_ENABLE_EXPENSIVE_CHECKS provide a more general way of
scheduling machine verification after every pass.

Also it caused problems on targets like Lanai which are marked as "not
machine verifier clean", since verification would fail for known
target-specific problems which have nothing to do with LiveIntervals.

Differential Revision: https://reviews.llvm.org/D111618
2021-10-12 16:09:18 +01:00
Jeremy Morse 838b4a533e [DebugInfo][NFC] Move LiveDebugValues class to header
This patch shifts the InstrRefBasedLDV class declaration to a header.
Partially because it's already massive, but mostly so that I can start
writing some unit tests for it. This patch also adds the boilerplate for
said unit tests.

Differential Revision: https://reviews.llvm.org/D110165
2021-10-12 16:07:26 +01:00
Yonghong Song 325d000765 [NFC][Attr] rename attribute btf_tag to btf_decl_tag
Per discussion in https://reviews.llvm.org/D111199,
the existing btf_tag attribute will be renamed to
btf_decl_tag. This patch mostly updated the Bitcode and
DebugInfo test cases with new attribute name.

Differential Revision: https://reviews.llvm.org/D111591
2021-10-11 20:57:31 -07:00
Amara Emerson 53ebfa7c5d [AArch64][GlobalISel] Fix combiner assertion in matchConstantOp().
We shouldn't call APInt::getSExtValue() on a >64b value.
2021-10-11 15:55:13 -07:00
Guozhi Wei 6599961c17 [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation
This patch contains following enhancements to SrcRegMap and DstRegMap:

  1 In findOnlyInterestingUse, check not only whether Reg is used as a two-address
    operand, but also whether it can become one after commutation.

  2 If a physical register is clobbered, remove SrcRegMap entries that are
    mapped to it.

  3 In processTiedPairs, when creating a new COPY instruction, add a SrcRegMap
    entry only when the COPY instruction is coalescable (i.e. the COPY source is
    killed).

With these enhancements isProfitableToCommute can make better commute decisions,
and as a result more register copies are removed.

Differential Revision: https://reviews.llvm.org/D108731
2021-10-11 15:28:31 -07:00
Roman Lebedev 684cbae89a
[KnownBits] Introduce `countMaxActiveBits()` and use it in a few places 2021-10-11 23:36:06 +03:00
Jay Foad edfdce2627 [PHIElimination] Fix accounting for undef uses when updating LiveVariables
PHI elimination updates LiveVariables info as described here:

    // We only need to update the LiveVariables kill of SrcReg if this was the
    // last PHI use of SrcReg to be lowered on this CFG edge and it is not live
    // out of the predecessor. We can also ignore undef sources.

Unfortunately if the last use also happened to be an undef use then it
would fail to update the LiveVariables at all. Fix this by not counting
undef uses in the VRegPHIUse map.

Thanks to Mikael Holmén for the test case!

Differential Revision: https://reviews.llvm.org/D111552
2021-10-11 20:22:47 +01:00
Amara Emerson f95d9c95bb [GlobalISel] Fix the stores of truncates -> wide store combine for non-evenly dividing type sizes.
If the wide store we'd generate is not a multiple of the memory type of the
narrow stores (e.g. s48 and s32), we'd assert. Fix that.
2021-10-09 21:18:20 -07:00
Dávid Bolvanský 943b304848 Fixed some errors detected by PVS Studio 2021-10-09 17:27:41 +02:00
Dávid Bolvanský 3649fb14d1 Fixed some errors detected by PVS Studio 2021-10-09 17:20:04 +02:00
Arthur Eubanks a0a4935182 Make more places that use alignment use uint64_t
Followup to D110451.
2021-10-08 16:35:19 -07:00
Reid Kleckner 89b57061f7 Move TargetRegistry.(h|cpp) from Support to MC
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.

This allows us to ensure that Support doesn't have includes from MC/*.

Differential Revision: https://reviews.llvm.org/D111454
2021-10-08 14:51:48 -07:00
Amara Emerson 17b89f9daa [GlobalISel] Improve G_UMHULH -> LSHR combine to accept non-uniform constant vectors. 2021-10-08 11:25:26 -07:00
Craig Topper a9700653ab [RegisterScavenging] Use a Twine in a call to report_fatal_error instead of going from std::string to c_str. NFC
The std::string was built on the line above. Might as well just
build it as a Twine in the call.
2021-10-08 11:04:08 -07:00
Bradley Smith 7c68d4b8ff Revert "[SelectionDAG] Remove PromoteIntOp_EXTRACT_SUBVECTOR."
This reverts commit 3e8d2008f7.

The code removed in this commit is actually required for extracting
fixed types from illegal scalable types, hence this commit causes
assertion failures in such extracts.
2021-10-08 14:53:26 +00:00
Mirko Brkusanin d20840c937 [GlobalISel] Combine for eliminating redundant operand negations
Differential Revision: https://reviews.llvm.org/D111319
2021-10-08 14:29:22 +02:00
Amara Emerson 72ce310bf0 [GlobalISel][IRTranslator] Fix a use-after-free bug when translating trap-func-name traps.
This was using MachineFunction::createExternalSymbolName() before, which seems
reasonable, but in fact this is freed before the asm emitter which tries to access
the function name string. Switching it to use the string returned by the attribute
seems to fix the problem.
2021-10-07 23:51:37 -07:00
Amara Emerson 08b3c0d995 [GlobalISel] Combine G_UMULH x, (1 << c)) -> x >> (bitwidth - c)
In order to not generate an unnecessary G_CTLZ, I extended the constant folder
in the CSEMIRBuilder to handle G_CTLZ. I also added some extra handling of
vector constants too. It seems we don't have any support for doing constant
folding of vector constants, so the tests show some other useless G_SUB
instructions too.
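
A scalar C++ sketch of the identity (illustrative only; it assumes 0 < C < 64): the high half of multiplying by 1 << C is just the bits shifted out the top.

    #include <cstdint>
    uint64_t umulhByPow2(uint64_t X, unsigned C) {
      // high 64 bits of (X * (1 << C)) == X >> (64 - C), for 0 < C < 64
      return X >> (64 - C);
    }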

Differential Revision: https://reviews.llvm.org/D111036
2021-10-07 23:51:37 -07:00
Jay Foad b84d9d299e [TargetPassConfig] Remove an obsolete FIXME comment
The "coloring with register" functionality it refers to was removed ten
years ago in svn r144481 "Remove the -color-ss-with-regs option".
2021-10-08 07:34:25 +01:00
Itay Bookstein faa0e2ae76 [SelectionDAG] Fix shift libcall ABI mismatch in shift-amount argument
The shift libcalls have a shift amount parameter of MVT::i32, but
sometimes ExpandIntRes_Shift may be called with a node whose
second operand is a type that is larger than that. This leads to
an ABI mismatch, and for example causes a spurious zeroing of
a register in RV32 for 64-bit shifts. Note that at present regular
shift instructions already have their shift amount operand adapted
at SelectionDAGBuilder::visitShift time, and funnelled shifts bypass that.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110508
2021-10-08 09:57:57 +08:00
Wang, Pengfei c236883b6b [X86] Optimize fdiv with reciprocal instructions for half type
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110557
2021-10-08 09:41:13 +08:00
Adrian Prantl 9f93f2bfbd Do not emit prologue_end for line 0 locs if there is a non-zero loc present
This change fixes a bug where the compiler generates a prologue_end
for line 0 locs. That is because line 0 is not associated with any
source location, so there should not be a prologue_end at a location
that doesn't correspond to a source location.

There were some LLVM tests that were explicitly checking for line 0
prologue_end's as well; since I believe that to be incorrect, I had to
change those tests as well.

Patch by Shubham Rastogi!

Differential Revision: https://reviews.llvm.org/D110740
2021-10-07 13:54:28 -07:00
Jay Foad 097339b1ca [TargetPassConfig] Enable machine verification after miscellaneous passes
In a couple of places machine verification was disabled for no apparent
reason, probably just because an "addPass(..., false)" line was cut and
pasted from elsewhere.

After this patch the only remaining place where machine verification is
disabled in the generic TargetPassConfig code, is after addPreEmitPass.
2021-10-07 21:24:50 +01:00
Jay Foad 27c57e791a [TwoAddressInstruction] Enable machine verification after this pass
Differential Revision: https://reviews.llvm.org/D111007
2021-10-07 20:04:51 +01:00
Jay Foad 3ff0a5747d [PHIElimination] Enable machine verification after this pass
Differential Revision: https://reviews.llvm.org/D111006
2021-10-07 20:04:51 +01:00
Jay Foad 3c9dfba189 [PHIElimination] Account for INLINEASM_BR when inserting kills
When PHIElimination adds kills after lowering PHIs to COPYs it knows
that some instructions after the inserted COPY might use the same
SrcReg, but it was only looking at the terminator instructions at the
end of the block, not at other instructions like INLINEASM_BR that can
appear after the COPY insertion point.

Since we have already called findPHICopyInsertPoint, which knows about
INLINEASM_BR, we might as well reuse the insertion point that it
calculated when looking for instructions that might use SrcReg.

This fixes a machine verification failure if you force machine
verification to run after PHIElimination (currently it is disabled for
other reasons) when running
test/CodeGen/X86/callbr-asm-phi-placement.ll.

Differential Revision: https://reviews.llvm.org/D110834
2021-10-07 20:04:39 +01:00
Amara Emerson 8bfc0e06dc [GlobalISel] Port the udiv -> mul by constant combine.
This is a straight port from the equivalent DAG combine.
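
A small standalone sketch of the transformation being ported, using the
textbook magic constant for an unsigned 32-bit divide by 10 (plain C++, not
GISel code):

```
#include <cassert>
#include <cstdint>

// x / 10 becomes a multiply-high by 0xCCCCCCCD followed by a shift of 3.
static uint32_t div10(uint32_t x) {
  uint64_t prod = static_cast<uint64_t>(x) * 0xCCCCCCCDu;
  return static_cast<uint32_t>(prod >> 32) >> 3;
}

int main() {
  const uint32_t tests[] = {0u, 9u, 10u, 99u, 123456789u, 0xFFFFFFFFu};
  for (uint32_t x : tests)
    assert(div10(x) == x / 10);
}
```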

Differential Revision: https://reviews.llvm.org/D110890
2021-10-07 11:37:17 -07:00
Jay Foad 548b01c7a6 [MIRParser] Add support for IsInlineAsmBrIndirectTarget
Print this basic block flag as inlineasm-br-indirect-target and parse
it. This allows you to write MIR test cases for INLINEASM_BR. The test
case I added is one that I wanted to precommit anyway for D110834.

Differential Revision: https://reviews.llvm.org/D111291
2021-10-07 19:08:01 +01:00
Bradley Smith 5be266db7a [AArch64][SVE] Improve VECTOR_SPLICE codegen for VL > 128-bit
Differential Revision: https://reviews.llvm.org/D111135
2021-10-07 15:28:55 +00:00
Jack Andersen bd4dad87f4 [MachineInstr] Move MIParser's DBG_VALUE RegState::Debug invariant into MachineInstr::addOperand
Based on the reasoning of D53903, register operands of DBG_VALUE are
invariably treated as RegState::Debug operands. This change enforces
this invariant as part of MachineInstr::addOperand so that all passes
emit this flag consistently.

RegState::Debug is inconsistently set on DBG_VALUE registers throughout
LLVM. This runs the risk of a filtering iterator like
MachineRegisterInfo::reg_nodbg_iterator processing these operands
erroneously when they are not parsed from MIR sources.

This issue was observed in the development of the llvm-mos fork which
adds a backend that relies on physical register operands much more than
existing targets. Physical RegUnit 0 has the same numeric encoding as
$noreg (indicating an undef for DBG_VALUE). Allowing debug operands into
the machine scheduler correlates $noreg with RegUnit 0 (i.e. a collision
of register numbers with different zero semantics). Eventually, this
causes an assert where DBG_VALUE instructions are prohibited from
participating in live register ranges.

Reviewed By: MatzeB, StephenTozer

Differential Revision: https://reviews.llvm.org/D110105
2021-10-07 16:08:52 +01:00
Carl Ritson b5d6ad20e1 [MachineCopyPropagation] Handle propagation of undef copies
When propagating undefined copies the undef flag must also be
propagated.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D111219
2021-10-07 20:34:27 +09:00
Jay Foad df2d4bc4cb [TwoAddressInstruction] Fix ReplacedAllUntiedUses in processTiedPairs
Fix the calculation of ReplacedAllUntiedUses when any of the tied defs
are early-clobber. The effect of this is to fix the placement of kill
flags on an instruction like this (from @f2 in
 test/CodeGen/SystemZ/asm-18.ll):

  INLINEASM &"stepb $1, $2" [attdialect], $0:[regdef-ec:GRH32Bit], def early-clobber %3:grh32bit, $1:[reguse tiedto:$0], killed %4:grh32bit(tied-def 3), $2:[reguse:GRH32Bit], %4:grh32bit

After TwoAddressInstruction without this patch:

  %3:grh32bit = COPY killed %4:grh32bit
  INLINEASM &"stepb $1, $2" [attdialect], $0:[regdef-ec:GRH32Bit], def early-clobber %3:grh32bit, $1:[reguse tiedto:$0], %3:grh32bit(tied-def 3), $2:[reguse:GRH32Bit], %4:grh32bit

Note that the COPY kills %4, even though there is a later use of %4 in
the INLINEASM. This fails machine verification if you force it to run
after TwoAddressInstruction (currently it is disabled for other
reasons).

After TwoAddressInstruction with this patch:

  %3:grh32bit = COPY %4:grh32bit
  INLINEASM &"stepb $1, $2" [attdialect], $0:[regdef-ec:GRH32Bit], def early-clobber %3:grh32bit, $1:[reguse tiedto:$0], %3:grh32bit(tied-def 3), $2:[reguse:GRH32Bit], %4:grh32bit

Differential Revision: https://reviews.llvm.org/D110848
2021-10-07 10:10:11 +01:00
Mikael Holmen 9bf5d91361 [GlobalISel] Silence gcc warning about unused variable 2021-10-07 07:18:04 +02:00
Itay Bookstein 40ec1c0f16 [IR][NFC] Rename getBaseObject to getAliaseeObject
To better reflect the meaning of the now-disambiguated {GlobalValue,
GlobalAlias}::getBaseObject after breaking off GlobalIFunc::getResolverFunction
(D109792), the function is renamed to getAliaseeObject.
2021-10-06 19:33:10 -07:00
David Blaikie f6a561c4d6 DebugInfo: Use clang's preferred names for integer types
This reverts c7f16ab3e3 / r109694 - which
suggested this was done to improve consistency with the gdb test suite.
Possible that at the time GCC did not canonicalize integer types, and so
matching types was important for cross-compiler validity, or that it was
only a case of over-constrained test cases that printed out/tested the
exact names of integer types.

In any case neither issue seems to exist today based on my limited
testing - both gdb and lldb canonicalize integer types (in a way that
happens to match Clang's preferred naming, incidentally) and so never
print the original text name produced in the DWARF by GCC or Clang.

This canonicalization appears to be in `integer_types_same_name_p` for
GDB and in `TypeSystemClang::GetBasicTypeEnumeration` for lldb.

(I tested this with one translation unit defining 3 variables - `long`,
`long (*)()`, and `int (*)()`, and another translation unit that had
main, and a function that took `long (*)()` as a parameter - then
compiled them with mismatched compilers (either GCC+Clang, or
Clang+(Clang with this patch applied)) and no matter the combination,
despite the debug info for one CU naming the type "long int" and the
other naming it "long", both debuggers printed out the name as "long"
and were able to correctly perform overload resolution and pass the
`long int (*)()` variable to the `long (*)()` function parameter)

Did find one hiccup, identified by the lldb test suite - that CodeView
was relying on these names to map them to builtin types in that format.
So added some handling for that in LLVM. (these could be split out into
separate patches, but seems small enough to not warrant it - will do
that if there ends up needing any reverting/revisiting)

Differential Revision: https://reviews.llvm.org/D110455
2021-10-06 16:02:34 -07:00
Arthur Eubanks 05392466f0 Reland [IR] Increase max alignment to 4GB
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.

This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.

The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.
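
A back-of-the-envelope sketch of why the old 5-bit field tops out at 1GB,
assuming the usual log2(align) + 1 encoding with 0 meaning "no alignment":

```
#include <cstdint>

// Five bits encode at most 31, i.e. log2(align) <= 30, i.e. align <= 2^30
// bytes = 1GB. An 8-bit field easily covers log2(align) = 32 (4GB) and more.
constexpr uint64_t maxAlignForFieldBits(unsigned Bits) {
  unsigned MaxEncoded = (1u << Bits) - 1; // largest encodable value
  return uint64_t(1) << (MaxEncoded - 1); // decode: 2^(encoded - 1)
}

static_assert(maxAlignForFieldBits(5) == (uint64_t(1) << 30), "5 bits -> 1GB");
```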

Updating clang's max allowed alignment will come in a future patch.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D110451
2021-10-06 13:29:23 -07:00
Arthur Eubanks 569346f274 Revert "Reland [IR] Increase max alignment to 4GB"
This reverts commit 8d64314ffe.
2021-10-06 11:38:11 -07:00
Arthur Eubanks 8d64314ffe Reland [IR] Increase max alignment to 4GB
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.

This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.

The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.

Updating clang's max allowed alignment will come in a future patch.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D110451
2021-10-06 11:03:51 -07:00
Arthur Eubanks 72cf8b6044 Revert "[IR] Increase max alignment to 4GB"
This reverts commit df84c1fe78.

Breaks some bots
2021-10-06 10:21:35 -07:00
Arthur Eubanks df84c1fe78 [IR] Increase max alignment to 4GB
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.

This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.

The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.

Updating clang's max allowed alignment will come in a future patch.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D110451
2021-10-06 09:54:14 -07:00
Amara Emerson 79d13bf22c Revert "Revert "[GlobalISel][IRTranslator] Emit trap intrinsic for "unreachable"""
This reverts commit d95cd81141.

Re-land the original patch now that the bug this exposed in selection has been
fixed by 6bc64e24c3
2021-10-06 04:16:19 -07:00
Simon Pilgrim 21661607ca [llvm] Replace report_fatal_error(std::string) uses with report_fatal_error(Twine)
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.
2021-10-06 12:04:30 +01:00
David Sherwood 37edb7d3e2 [SVE] Fix incorrect DAG combines when extracting fixed-width from scalable vectors
We were previously silently generating incorrect code when extracting a
fixed-width vector from a scalable vector. This is worse than crashing,
since the user will have no indication that this is currently unsupported
behaviour. I have fixed the code to only perform DAG combines when safe
to do so, i.e. the input and output vectors are both fixed-width or
both scalable.

Test added here:

  CodeGen/AArch64/sve-extract-scalable-vector.ll

Differential revision: https://reviews.llvm.org/D110624
2021-10-06 09:27:44 +01:00
Amara Emerson 6bc64e24c3 [GlobalISel] Clear unreachable blocks' contents after selection.
If these blocks are unreachable, then we can discard all of the instructions.
However, keep the block around because it may have an address taken or the
block may have a stale reference from a PHI somewhere. Instead of finding
those PHIs and fixing them up, just leave the block empty.

Differential Revision: https://reviews.llvm.org/D111201
2021-10-05 23:06:22 -07:00
Simon Pilgrim 2e5daac217 [llvm] Update report_fatal_error calls from raw_string_ostream to use Twine(OS.str())
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.

We can use the raw_string_ostream::str() method to perform the implicit flush() and return a reference to the std::string container that we can then wrap inside Twine().
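
A sketch of the resulting call-site pattern (the function and message below
are illustrative, not from this commit):

```
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Twine.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"
#include <string>

void reportUnsupported(llvm::StringRef Name) {
  std::string Msg;
  llvm::raw_string_ostream OS(Msg);
  OS << "unsupported operation: " << Name;
  // str() flushes and returns the underlying std::string, which Twine wraps
  // without needing the report_fatal_error(const std::string&) overload.
  llvm::report_fatal_error(llvm::Twine(OS.str()));
}
```
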
2021-10-05 18:42:12 +01:00
Joe Nash 8f55fdf26c [MacroFusion] Expose useful static methods. NFC.
hasLessThanNumFused and fuseInstructionPair are useful for
DAG mutations similar to MacroFusion, but which cannot use
MacroFusion as a whole (such as fusing non-dependent instructions).

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D111070

Change-Id: I3a5d56aba0471d45ef64cebb9b724030e2eae2f3
2021-10-05 11:51:48 -04:00
Amara Emerson de5b16d8ca Revert "Revert "Revert "[GlobalISel][IRTranslator] Emit trap intrinsic for "unreachable""""
This reverts commit c93bc508ee.

Seems to break a different thing now.
2021-10-05 08:25:13 -07:00
Jay Foad f65458df32 [PHIElimination] Update LiveVariables after handling an unspillable terminator
Update the LiveVariables analysis after the special handling for
unspillable terminators which was added in D91358. This is just enough
to fix some "Block should not be in AliveBlocks" / "Block missing from
AliveBlocks" errors in the codegen test suite when machine verification
is forced to run after PHIElimination (currently it is disabled).

Differential Revision: https://reviews.llvm.org/D110939
2021-10-05 14:25:53 +01:00
Jeremy Morse e265644b32 [DebugInfo][InstrRef] Track all of DBG_PHIs operands
An important part of the instruction referencing solution is that we
identify all the registers that values move between before we then compute
an SSA-like function from the machine code, and from the variable
intrinsics. DBG_PHIs weren't causing all the subregisters of their operands
to be tracked; this patch forces that to happen.

The practical implication was that not enough space was allocated for
storing values when analysing the function -- asan will crash on the
attached test case with an unpatched compiler. Non-asan llc's will produce
a DBG_VALUE $noreg, where it should be $dil.

Differential Revision: https://reviews.llvm.org/D109064
2021-10-05 14:01:26 +01:00
Mirko Brkusanin 40e00063bc [GlobalISel] Combine fabs(fneg(x)) to fabs(x)
Differential Revision: https://reviews.llvm.org/D110943
2021-10-05 13:43:39 +02:00
Bjorn Pettersson 8ed0e6b2cf [SelectionDAG] Replace error prone index check in BaseIndexOffset::computeAliasing
Deriving NoAlias based on having the same index in two BaseIndexOffset
expressions seemed weird (and as shown in the added unittest the
correctness of doing so depended on undocumented pre-conditions that
the user of BaseIndexOffset::computeAliasing would need to take care
of).

This patch removes the code that derived NoAlias based on indices
being the same. As compensation, to avoid regressions/diffs in
various lit tests, we also add a new check. The new check derives
NoAlias in case the two base pointers are based on two different
GlobalValue:s (neither of them being a GlobalAlias).

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D110256
2021-10-05 12:15:55 +02:00
Bjorn Pettersson 1896fb2cff [SelectionDAG] Assume that a GlobalAlias may alias other global values
This fixes a bug detected in DAGCombiner when using global alias
variables. Here is an example:
  @foo = global i16 0, align 1
  @aliasFoo = alias i16, i16 * @foo
  define i16 @bar() {
    ...
    store i16 7, i16 * @foo, align 1
    store i16 8, i16 * @aliasFoo, align 1
    ...
  }

BaseIndexOffset::computeAliasing would incorrectly derive NoAlias
for the two accesses in the example above, resulting in DAGCombiner
miscompiles.

This patch fixes the problem by a defensive approach letting
BaseIndexOffset::computeAliasing return false, i.e. that the aliasing
couldn't be determined, when comparing two global values and at least
one is a GlobalAlias. In the future we might improve this with a
deeper analysis to look at the aliasee for the GlobalAlias etc. But
that is a bit more complicated considering that we could have
'local_unnamed_addr' and situations with several 'alias' variables.

Fixes PR51878.

Differential Revision: https://reviews.llvm.org/D110064
2021-10-05 12:15:55 +02:00
Jay Foad 0a031f5c88 [GlobalISel] Simplify narrowScalarMul. NFC.
Remove some redundancy because the source and result types of any
multiply are always the same.
2021-10-05 10:53:12 +01:00
Jay Foad 0bd4365445 [LiveIntervals] Fix verification of early-clobbered segments
Enable verification of live intervals immediately after computing them
(when -early-live-intervals is used) and fix a problem that that
provokes: currently the verifier insists that a segment that ends at an
early-clobber slot must be followed by another segment starting at the
same slot. But before TwoAddressInstruction runs, the equivalent
condition is: a segment that ends at an early-clobber slot must have its
last use tied to an early-clobber def. That condition is harder to check
here, so for now just disable this check until tied operands have been
rewritten.

Differential Revision: https://reviews.llvm.org/D111065
2021-10-05 08:17:56 +01:00
Amara Emerson cfef1803dd [GlobalISel] Port over the SelectionDAG stack protector codegen feature.
This is a port of the feature that allows the StackProtector pass to omit
checking code for stack canary checks, and rely on SelectionDAG to do it at a
later stage. The reasoning behind this seems to be to prevent the IR checking
instructions from hindering tail-call optimizations during codegen.

Here we allow GlobalISel to also use that scheme. Doing so requires that we
do some analysis using some factored-out code to determine where to generate
code for the epilogs.

Not every case is handled in this patch since we don't have support for all
targets that exercise different stack protector schemes.

Differential Revision: https://reviews.llvm.org/D98200
2021-10-04 21:33:44 -07:00
Amara Emerson c93bc508ee Revert "Revert "[GlobalISel][IRTranslator] Emit trap intrinsic for "unreachable"""
This reverts commit d95cd81141.

The selector sometimes leaves unreachable blocks unselected because it uses a
postorder traversal for the block ordering.

With the trap intrinsics now being emitted, these blocks are no longer empty and
the unselected G_INTRINSIC instructions survive past selection. To fix this,
keep track of which blocks are selected and later delete any blocks that weren't
selected.
2021-10-04 18:10:28 -07:00
Amara Emerson d95cd81141 Revert "[GlobalISel][IRTranslator] Emit trap intrinsic for "unreachable""
This reverts commit 019041bec3.

It broke some bots.
2021-10-04 15:44:52 -07:00
Jeremy Morse e2b838dd91 [DebugInfo][InstrRef] Accept landingpad block arguments
This patch makes instruction-referencing accept an additional scenario
where values can be read from physical registers at the start of blocks. As
far as I was aware, this only happened:
 * With arguments in the entry block,
 * With constant physical registers,

To which this patch adds a third case:
 * With exception-handling landing-pad blocks

In the attached test: the operand of the dbg.value traces back to the
"landingpad" instruction, which becomes some copies from physregs. Right
now, that's deemed unacceptable, and the assertion fires. The fix is to
just accept this scenario; this is a case where the value in question is
defined by a register and a position, not by an instruction that defines
it. Reading it with a DBG_PHI is the correct behaviour, there isn't a
non-copy instruction that we can refer to.

Differential Revision: https://reviews.llvm.org/D109005
2021-10-04 23:03:02 +01:00
Amara Emerson 8bde5e58c0 Delay outgoing register assignments to last.
The delayed stack protector feature which is currently used for SDAG (and thus
allows for more commonly generating tail calls) depends on being able to extract
the tail call into a separate return block. To do this it also has to extract
the vreg->physreg copies that set up the call's arguments, since if it doesn't
then the call inst ends up using undefined physregs in its new spliced block.

SelectionDAG implementations can do this because they delay emitting register
copies until *after* the stack arguments are set up. GISel however just
processes and emits the arguments in IR order, so stack arguments always end up
last, and thus this breaks the code that looks for any register arg copies that
precede the call instruction.

This patch adds a thunk argument to the assignValueToReg() and custom assignment
hooks. For outgoing arguments, register assignments use this return param to
return a thunk that does the actual generating of the copies. We collect these
until all the outgoing stack assignments have been done and then execute them,
so that the copies (and perhaps some artifacts like G_SEXTs) are placed after
any stores.
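
A minimal sketch of the delayed-copy idea using plain std::function rather
than the real call-lowering types (all names here are illustrative):

```
#include <functional>
#include <vector>

struct OutgoingArgAssigner {
  std::vector<std::function<void()>> DelayedCopies;

  void assignToReg(int VReg, int PhysReg) {
    // Don't emit the copy yet; remember how to emit it later.
    DelayedCopies.push_back([this, VReg, PhysReg] { emitCopy(PhysReg, VReg); });
  }
  void assignToStack(int VReg, int Offset) { emitStore(VReg, Offset); }

  void finalize() {
    // Run the thunks only after every stack assignment has been emitted,
    // so the register copies land after the stores.
    for (auto &Emit : DelayedCopies)
      Emit();
    DelayedCopies.clear();
  }

  void emitCopy(int /*PhysReg*/, int /*VReg*/) {}
  void emitStore(int /*VReg*/, int /*Offset*/) {}
};
```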

Differential Revision: https://reviews.llvm.org/D110610
2021-10-04 12:33:20 -07:00
Jay Foad 24688f8fdf Revert "[GlobalISel] Support vectors in LegalizerHelper::narrowScalarMul"
This reverts commit 90da0b9a5a.

It was causing an LLVM_ENABLE_EXPENSIVE_CHECKS buildbot failure.
2021-10-04 20:26:30 +01:00
Amara Emerson dafcbfdaa0 [GlobalISel] Widen G_EXTRACT_VECTOR_ELT using anyext instead of sext.
G_SEXT seems to be unnecessary here, anyext will do.

Differential Revision: https://reviews.llvm.org/D110469
2021-10-04 12:19:19 -07:00
Jay Foad 90da0b9a5a [GlobalISel] Support vectors in LegalizerHelper::narrowScalarMul
Also remove some redundancy because the source and result
types of any multiply are always the same.

Differential Revision: https://reviews.llvm.org/D110926
2021-10-04 19:33:38 +01:00
Amara Emerson 019041bec3 [GlobalISel][IRTranslator] Emit trap intrinsic for "unreachable"
We were previously just ignoring unreachable, but targets like Darwin want to
keep unreachable instructions as traps.

Differential Revision: https://reviews.llvm.org/D110603
2021-10-04 11:02:29 -07:00
Jay Foad a9bceb2b05 [APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.

Differential Revision: https://reviews.llvm.org/D110807
2021-10-04 08:57:44 +01:00
Kazu Hirata d34cd75d89 [Analysis, CodeGen] Migrate from arg_operands to args (NFC)
Note that arg_operands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-10-03 08:22:20 -07:00
Takafumi Arakaki e8806d7486 Re-apply the fix on DwarfEHPrepare and add a test
This patch re-introduces the fix in the commit https://github.com/llvm/llvm-project/commit/66b0cebf7f736 by @yrnkrn

> In DwarfEHPrepare, after all passes are run, RewindFunction may be a dangling
>
> pointer to a dead function. To make sure it's valid, doFinalization nullptrs
> RewindFunction just like the constructor and so it will be found on next run.
>
> llvm-svn: 217737

It seems that the fix was not migrated to `DwarfEHPrepareLegacyPass`.

This patch also updates `llvm/test/CodeGen/X86/dwarf-eh-prepare.ll` to include `-run-twice` to exercise the cleanup. Without this patch `llvm-lit -v llvm/test/CodeGen/X86/dwarf-eh-prepare.ll` fails with

```
-- Testing: 1 tests, 1 workers --
FAIL: LLVM :: CodeGen/X86/dwarf-eh-prepare.ll (1 of 1)
******************** TEST 'LLVM :: CodeGen/X86/dwarf-eh-prepare.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /home/arakaki/build/llvm-project/main/bin/opt -mtriple=x86_64-linux-gnu -dwarfehprepare -simplifycfg-require-and-preserve-domtree=1 -run-twice < /home/arakaki/repos/watch/llvm-project/llvm/test/CodeGen/X86/dwarf-eh-prepare.ll -S | /home/arakaki/build/llvm-project/main/bin/FileCheck /home/arakaki/repos/watch/llvm-project/llvm/test/CodeGen/X86/dwarf-eh-prepare.ll
--
Exit Code: 2

Command Output (stderr):
--
Referencing function in another module!
  call void @_Unwind_Resume(i8* %ehptr) #1
; ModuleID = '<stdin>'
void (i8*)* @_Unwind_Resume
; ModuleID = '<stdin>'
in function simple_cleanup_catch
LLVM ERROR: Broken function found, compilation aborted!
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /home/arakaki/build/llvm-project/main/bin/opt -mtriple=x86_64-linux-gnu -dwarfehprepare -simplifycfg-require-and-preserve-domtree=1 -run-twice -S
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'Module Verifier' on function '@simple_cleanup_catch'
 #0 0x000056121b570a2c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/Unix/Signals.inc:569:0
 #1 0x000056121b56eb64 llvm::sys::RunSignalHandlers() /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/Signals.cpp:97:0
 #2 0x000056121b56f28e SignalHandler(int) /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/Unix/Signals.inc:397:0
 #3 0x00007fc7e9b22980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)
 #4 0x00007fc7e87d3fb7 raise /build/glibc-S7xCS9/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #5 0x00007fc7e87d5921 abort /build/glibc-S7xCS9/glibc-2.27/stdlib/abort.c:81:0
 #6 0x000056121b4e1386 llvm::raw_svector_ostream::raw_svector_ostream(llvm::SmallVectorImpl<char>&) /home/arakaki/repos/watch/llvm-project/llvm/include/llvm/Support/raw_ostream.h:674:0
 #7 0x000056121b4e1386 llvm::report_fatal_error(llvm::Twine const&, bool) /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/ErrorHandling.cpp:114:0
 #8 0x000056121b4e1528 (/home/arakaki/build/llvm-project/main/bin/opt+0x29e3528)
 #9 0x000056121adfd03f llvm::raw_ostream::operator<<(llvm::StringRef) /home/arakaki/repos/watch/llvm-project/llvm/include/llvm/Support/raw_ostream.h:218:0
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/arakaki/build/llvm-project/main/bin/FileCheck /home/arakaki/repos/watch/llvm-project/llvm/test/CodeGen/X86/dwarf-eh-prepare.ll

--

********************
********************
Failed Tests (1):
  LLVM :: CodeGen/X86/dwarf-eh-prepare.ll

Testing Time: 0.22s
  Failed: 1
```

Reviewed By: loladiro

Differential Revision: https://reviews.llvm.org/D110979
2021-10-02 21:50:35 -04:00
Simon Pilgrim df672f66b6 [DAG] scalarizeExtractedVectorLoad - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit extracted loads to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.

I've also cleaned up the alignment calculation code - if we have a constant extraction index then the alignment can be based on an offset from the original vector load alignment, but for non-constant indices we should assume the worst (single element alignment only).
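
A standalone sketch of that alignment rule (not the DAG code; the helper and
parameter names are made up):

```
#include <algorithm>
#include <cstdint>

// With a known element index the extracted load keeps whatever alignment the
// byte offset preserves; with an unknown index only the element's own size
// can be assumed as the alignment.
static uint64_t extractedLoadAlign(uint64_t VecLoadAlign, uint64_t EltBytes,
                                   int64_t ConstIdx /* -1 if unknown */) {
  if (ConstIdx < 0)
    return std::min(VecLoadAlign, EltBytes); // worst case: one element
  uint64_t Offset = EltBytes * static_cast<uint64_t>(ConstIdx);
  // align(base + offset) = min(align(base), largest power of two dividing
  // the offset); a zero offset keeps the full alignment.
  uint64_t OffsetAlign = Offset ? (Offset & -Offset) : VecLoadAlign;
  return std::min(VecLoadAlign, OffsetAlign);
}
```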

Differential Revision: https://reviews.llvm.org/D110486
2021-10-01 21:07:34 +01:00
Jay Foad dff3454bda [TwoAddressInstruction] Tweak constraining of tied operands
In collectTiedOperands, when handling an undef use that is tied to a
def, constrain the dst reg with the actual register class of the src
reg, instead of with the register class from the instructions's
MCInstrDesc. This makes a difference in some AMDGPU test cases like
this, before:

  %16:sgpr_96 = INSERT_SUBREG undef %15:sgpr_96_with_sub0_sub1(tied-def 0), killed %11:sreg_64_xexec, %subreg.sub0_sub1

After, without this patch:

  undef %16.sub0_sub1:sgpr_96 = COPY killed %11:sreg_64_xexec

This fails machine verification if you force it to run after
TwoAddressInstruction (currently it is disabled) with:

*** Bad machine code: Invalid register class for subregister index ***
- function:    s_load_constant_v3i32_align4
- basic block: %bb.0  (0xa011a88)
- instruction: undef %16.sub0_sub1:sgpr_96 = COPY killed %11:sreg_64_xexec
- operand 0:   undef %16.sub0_sub1:sgpr_96
Register class SGPR_96 does not fully support subreg index 4

After, with this patch:

  undef %16.sub0_sub1:sgpr_96_with_sub0_sub1 = COPY killed %11:sreg_64_xexec

See also svn r159120 which introduced the code to handle tied undef
uses.

Differential Revision: https://reviews.llvm.org/D110944
2021-10-01 20:57:58 +01:00
Jay Foad 31c92d515d [MachineLoopInfo] Enable machine verification after this pass
Enabling this does not show any problems in check-llvm in an
LLVM_ENABLE_EXPENSIVE_CHECKS build.

Differential Revision: https://reviews.llvm.org/D110703
2021-10-01 18:15:57 +01:00
Jay Foad 04787239c9 [LiveVariables] Skip verification of kills inside bundles
LiveVariables does not examine the contents of bundles, so
MachineVerifier should not expect it to know about kill flags on
operands of instructions inside a bundle.

With this fix we can enable machine verification after running the
LiveVariables analysis. Doing this does not show any problems in
check-llvm in an LLVM_ENABLE_EXPENSIVE_CHECKS build.

Differential Revision: https://reviews.llvm.org/D110700
2021-10-01 18:15:57 +01:00
Jay Foad 08d41f75d9 [UnreachableMachineBlockElim] Enable machine verification after this pass
Enabling this does not show any problems in check-llvm in an
LLVM_ENABLE_EXPENSIVE_CHECKS build.

Differential Revision: https://reviews.llvm.org/D110697
2021-10-01 18:15:57 +01:00
Jay Foad 2bfe777a45 [ProcessImplicitDefs] Enable machine verification after this pass
Enabling this does not show any problems in check-llvm in an
LLVM_ENABLE_EXPENSIVE_CHECKS build.

Differential Revision: https://reviews.llvm.org/D110695
2021-10-01 18:15:56 +01:00
Jay Foad fd8e99700d [DetectDeadLanes] Enable machine verification after this pass
Machine verification after DetectDeadLanes has been disabled since the
pass was first added in D18427, but I guess this was just due to copy-
and-paste. Enabling it does not show any problems in check-llvm in an
LLVM_ENABLE_EXPENSIVE_CHECKS build.

Differential Revision: https://reviews.llvm.org/D110689
2021-10-01 18:15:56 +01:00
Marcelo Juchem dfb213c2df Fix ambiguous overload build failure
Building LLVM (llvmorg-14-init) under Debian sid using the latest gcc (Debian
10.3.0-9) 10.3.0 fails due to ambiguous overloads of operators == and !=:

/root/src/llvm/src/llvm/tools/obj2yaml/elf2yaml.cpp:212:22:
error: ambiguous overload for 'operator!='
(operand types are 'llvm::ELFYAML::ELF_SHF' and 'int')

/root/src/llvm/src/llvm/tools/obj2yaml/elf2yaml.cpp:204:32:
error: ambiguous overload for 'operator!='
(operand types are 'const llvm::yaml::Hex64' and 'int')

/root/src/llvm/src/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp:629:35:
error: ambiguous overload for 'operator=='
(operand types are 'const uint64_t' {aka 'const long unsigned int'} and
'llvm::Register')

Reviewed by: StephenTozer, jmorse, Higuoxing

Differential Revision: https://reviews.llvm.org/D109534
2021-10-01 14:19:57 +01:00
Sander de Smalen b62e6f19d7 [SelectionDAG] Handle promotion + widening in getCopyToPartsVector
Some vectors require both widening and promotion for their legalization.
This case is not yet handled in getCopyToPartsVector and falls back
on scalarizing by default. Because scalable vectors can't easily be
scalarised, we need to implement this in two separate stages:
1. Widen the vector.
2. Promote the vector.

As part of this patch, PromoteIntRes_CONCAT_VECTORS also needed to be
made scalable aware. Instead of falling back on scalarizing the vector
(fixed-width only), each sub-part of the CONCAT vector is promoted,
and the operation is performed on the type with the widest element type,
finally truncating the result to the promoted result type.

Differential Revision: https://reviews.llvm.org/D110646
2021-10-01 08:19:47 +01:00
Christopher Tetreault 3077bc90de [NFC] Restore magic and magicu to a globally visible location
While these functions are only used in one location upstream,
they have been reused in multiple downstreams. Restore this file to
a globally visible location (outside of APInt.h) to eliminate
downstream breakage and enable potential future reuse.

Additionally, this patch renames types and cleans up
clang-tidy issues.
2021-09-30 17:43:12 -07:00
Amara Emerson ca8316b704 [GlobalISel] Extend CombinerHelper::matchConstantOp() to match constant splat vectors.
This allows the "x op 0 -> x" fold to optimize vector constant RHSs.

Differential Revision: https://reviews.llvm.org/D110802
2021-09-30 14:31:25 -07:00
Amara Emerson 80f4bb5c61 [GlobalISel] Extend G_SELECT of known condition combine to vectors.
Adds a new utility function: isConstantOrConstantSplatVector().

Differential Revision: https://reviews.llvm.org/D110786
2021-09-30 12:16:44 -07:00
Kazu Hirata f631173d80 [llvm] Migrate from arg_operands to args (NFC)
Note that arg_operands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-09-30 08:51:21 -07:00
Brock Wyma bafd8b1add [CodeView] Recognize Fortran95 as Fortran instead of MASM
Map Fortran95 sources to Fortran so the CodeView language is not emitted as
MASM.

Differential Revision: https://reviews.llvm.org/D110330
2021-09-30 09:27:05 -04:00
Jay Foad 156d7d2df7 [LiveIntervals] Remove unused subreg ranges in repairIntervalsInRange
If the old instructions mentioned a subreg that the new instructions do
not, remove the subrange for that subreg.

For example, in TwoAddressInstructionPass::eliminateRegSequence, if a
use operand in the REG_SEQUENCE has the undef flag then we don't
generate a copy for it so after the elimination there should be no live
interval at all for the corresponding subreg of the def.

This is a small step towards switching TwoAddressInstructionPass over
from LiveVariables to LiveIntervals. Currently this path is only tested
if you explicitly enable -early-live-intervals.

Differential Revision: https://reviews.llvm.org/D110542
2021-09-30 09:15:10 +01:00
Sander de Smalen 6709b193ea [SelectionDAG] Make WidenVecRes_EXTRACT_SUBVECTOR work for scalable vectors.
The legalizer handles this by breaking up an EXTRACT_SUBVECTOR into
smaller parts, and combines those together, padding the result with
UNDEF vectors, e.g.

  nxv6i64 extract_subvector(nxv12i64, 6)
  <->
  nxv8i64 concat(
    nxv2i64 extract_subvector(nxv16i64, 6)
    nxv2i64 extract_subvector(nxv16i64, 8)
    nxv2i64 extract_subvector(nxv16i64, 10)
    nxv2i64 undef)

Reviewed By: frasercrmck, david-arm

Differential Revision: https://reviews.llvm.org/D110253
2021-09-29 11:33:45 +01:00
Jay Foad 27179b39f9 [RemoveRedundantDebugValues] Enable machine verification after this pass
Machine verification after RemoveRedundantDebugValues has been disabled
since the pass was first added in D105279, but I guess this was just due
to copy-and-paste. Enabling it does not show any problems in check-llvm
in an LLVM_ENABLE_EXPENSIVE_CHECKS build.

Differential Revision: https://reviews.llvm.org/D110688
2021-09-29 10:44:35 +01:00
Itay Bookstein 7255ce30e4 [SelectionDAG] Fix incorrect condition for shift amount truncation
Comment says:
  // If the operand is larger than the shift count type but the shift
  // count type has enough bits to represent any shift value ...

It clearly talks about the shifted operand, not the shift-amount operand,
but the comparison is performed against Log2_32_Ceil(Op2.getValueSizeInBits())
where Op2 is the shift amount operand. This comparison also doesn't make
sense in the context of the previous one (ShiftsSize > Op2Size) because
Op2Size == Op2.getValueSizeInBits(). Fix to use Op1.
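
A standalone sketch of the rule the fix restores (names are illustrative;
this is not the SelectionDAGBuilder code):

```
#include <cassert>

// The shift amount may be narrowed to the target's shift-count type whenever
// that type has enough bits for any meaningful shift of the shifted value,
// i.e. ceil(log2(bit width of Op1)) bits -- a property of Op1, not Op2.
static bool canTruncateShiftAmount(unsigned Op1Bits, unsigned CountTypeBits) {
  unsigned Needed = 0;
  while ((1u << Needed) < Op1Bits) // ceil(log2(Op1Bits))
    ++Needed;
  return Needed <= CountTypeBits;
}

int main() {
  assert(canTruncateShiftAmount(64, 8));  // shifts 0..63 fit in 6 bits
  assert(!canTruncateShiftAmount(64, 2)); // 2 bits cannot express 0..63
}
```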

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110509
2021-09-28 17:52:30 -07:00
Jessica Paquette 15a24e1fdb [GlobalISel] Combine mulo x, 2 -> addo x, x
Similar to what SDAG does when it sees a smulo/umulo against 2
(see: `DAGCombiner::visitMULO`)
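
A quick standalone check of the identity (plain C++ using the GCC/Clang
overflow builtins, not GISel code):

```
#include <cassert>
#include <cstdint>

int main() {
  const uint32_t tests[] = {0u, 1u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu};
  for (uint32_t x : tests) {
    uint32_t M, A;
    bool MulOv = __builtin_mul_overflow(x, 2u, &M);
    bool AddOv = __builtin_add_overflow(x, x, &A);
    assert(MulOv == AddOv && M == A); // x * 2 overflows exactly when x + x does
  }
}
```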

This pattern is fairly common in Swift code AFAICT.

Here's an example extracted from a Swift testcase:

https://godbolt.org/z/6cT8Mesx7

Differential Revision: https://reviews.llvm.org/D110662
2021-09-28 16:59:43 -07:00
Arthur Eubanks aa53785f23 Reland [clang] Rework dontcall attributes
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).

One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.

Previous revisions didn't properly declare the new dependencies.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D110364
2021-09-28 15:31:30 -07:00
Shoaib Meenai f9b3c18e74 [CodeGen] Fix wrapping personality symbol on ARM
The ARM backend was explicitly setting global binding on the personality
symbol. This was added without any comment in a7ec2dcefd, which
introduced EHABI support (back in 2011). None of the other backends do
anything equivalent, as far as I can tell.

This causes problems when attempting to wrap the personality symbol.
Wrapped symbols are marked as weak inside LTO to inhibit IPO (see
https://reviews.llvm.org/D33621). When we wrap the personality symbol,
it initially gets weak binding, and then the ARM backend attempts to
change the binding to global, which causes an error in MC because of
attempting to change the binding of a symbol from non-global to global
(the error was added in https://reviews.llvm.org/D90108).

Simply drop the ARM backend's explicit global binding setting to fix
this. This matches all the other backends, and a large internal
application successfully linked and ran with this change, so it
shouldn't cause any problems. Test via LLD, since wrapping is required
to exhibit the issue.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D110609
2021-09-28 15:01:05 -07:00
Arthur Eubanks 7833d20f1f Revert "[clang] Rework dontcall attributes"
This reverts commit 2943071e2e.

Breaks bots
2021-09-28 14:49:27 -07:00
Arthur Eubanks 2943071e2e [clang] Rework dontcall attributes
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).

One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D110364
2021-09-28 14:21:10 -07:00
Jay Foad 845b93e692 [LiveIntervals] Fix another asan debug build failure
Call RemoveMachineInstrFromMaps before erasing instrs.
repairIntervalsInRange will do this for you after erasing the
instruction, but it's not safe to rely on it because assertions in
SlotIndexes::removeMachineInstrFromMaps refer to fields in the erased
instruction.

This fixes asan buildbot failures caused by D110335.
2021-09-28 11:09:38 +01:00
“bhkumarn” 62eeacce17 [DebugInfo] Emit DW_TAG_namelist and DW_TAG_namelist_item
This patch emits DW_TAG_namelist and DW_TAG_namelist_item for Fortran
namelist variables. DICompositeType is extended to support this Fortran
feature.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D108553
2021-09-28 14:40:58 +05:30
Jay Foad 20c0280733 [LiveIntervals] Repair subreg ranges in processTiedPairs
In TwoAddressInstructionPass::processTiedPairs, update subranges of the
live interval for RegB as well as the main range.

This is a small step towards switching TwoAddressInstructionPass over
from LiveVariables to LiveIntervals. Currently this path is only tested
if you explicitly enable -early-live-intervals.

Differential Revision: https://reviews.llvm.org/D110526
2021-09-28 08:10:16 +01:00
Jay Foad b2b1a8b833 [LiveIntervals] Improve repair after convertToThreeAddress
After TwoAddressInstructionPass calls
TargetInstrInfo::convertToThreeAddress, improve the LiveIntervals repair
to cope with convertToThreeAddress creating more than one new
instruction.

This mostly seems to benefit X86. For example in
test/CodeGen/X86/zext-trunc.ll it converts:

  %4:gr32 = ADD32rr %3:gr32(tied-def 0), %2:gr32, implicit-def dead $eflags

to:

  undef %6.sub_32bit:gr64 = COPY %3:gr32
  undef %7.sub_32bit:gr64_nosp = COPY %2:gr32
  %4:gr32 = LEA64_32r killed %6:gr64, 1, killed %7:gr64_nosp, 0, $noreg

Differential Revision: https://reviews.llvm.org/D110335
2021-09-28 08:10:08 +01:00
Xiang1 Zhang ebe9944a34 [ISel] Legalized arithmetic.fence.f128 for 32-bits target
Reviewed By: Craig Topper, Wang Pengfei

Differential Revision: https://reviews.llvm.org/D110467
2021-09-28 10:27:25 +08:00
Fraser Cormack e2b46e336b [DAGCombiner][VP] Fold zero-length or false-masked VP ops
This patch adds a generic DAGCombine for vector-predicated (VP) nodes.
Those for which we can determine that no vector element is active can be
replaced by either undef or, for reductions, the start value.

This is tested rather trivially at the IR level, where it's possible
that we want to teach instcombine to perform this optimization.

However, we can also see the zero-evl case arise during SelectionDAG
legalization, when wide VP operations can be split into two and the
upper operation emerges as trivially false.

It's possible that we could perform this optimization "proactively"
(both on legal vectors and before splitting) and reduce the width of an
operation and insert it into a larger undef vector:

```
v8i32 vp_add x, y, mask, 4
->
v8i32 insert_subvector (v8i32 undef), (v4i32 vp_add xsub, ysub, mask, 4), i32 0
```

This is somewhat analogous to similar vector narrow/widening
optimizations, but it's unclear at this point whether that's beneficial
to do this for VP ops for any/all targets.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D109148
2021-09-27 11:30:09 +01:00
Simon Pilgrim 18c8ed5416 [DAG] ReduceLoadOpStoreWidth - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit store narrowing to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory access by checking allowsMisalignedMemoryAccesses as a fallback.
2021-09-25 18:35:57 +01:00
Simon Pilgrim 6bd5b1b1ce [DAG] combineShiftToMULH - move getValueType() inside assert. NFCI.
Avoids an unnecessary (void).
2021-09-25 11:56:35 +01:00
David Blaikie 5cb210862b DebugInfo: Use the signedness of the underlying enum when encoding enum non-type-template-parameters
This improves the accuracy of the debug info and improves round tripping
through -gsimple-template-names.
2021-09-24 17:02:55 -07:00
Jay Foad ac51ad24a7 [LiveIntervals] Fix asan debug build failures
Call RemoveMachineInstrFromMaps before erasing instrs.
repairIntervalsInRange will do this for you after erasing the
instruction, but it's not safe to rely on it because assertions in
SlotIndexes::removeMachineInstrFromMaps refer to fields in the erased
instruction.

This fixes asan buildbot failures caused by D110328.
2021-09-24 19:14:57 +01:00
Stanislav Mekhanoshin 08d7eec06e Revert "Allow rematerialization of virtual reg uses"
Reverted due to two distinct performance regression reports.

This reverts commit 92c1fd19ab.
2021-09-24 10:26:11 -07:00
Jay Foad e4e95f14f1 [LiveIntervals] Repair live intervals that gain subranges
In repairIntervalsInRange, if the new instructions refer to subregs but
the old instructions did not, make sure any existing live interval for
the superreg is updated to have subranges. Also skip repairing any range
that we have recalculated from scratch, partly for efficiency but also
to avoids some cases that repairOldRegInRange can't handle.

The existing test/CodeGen/AMDGPU/twoaddr-regsequence.mir provides some
test coverage for this change: when TwoAddressInstructionPass converts
REG_SEQUENCE into subreg copies, the live intervals will now get
subranges and MachineVerifier will verify that the subranges are
correct. Unfortunately MachineVerifier does not complain if the
subranges are not present, so the test also passed before this patch.

This patch also fixes ~800 of the ~1500 failures in the whole CodeGen
lit test suite when -early-live-intervals is forced on.

Differential Revision: https://reviews.llvm.org/D110328
2021-09-24 11:58:08 +01:00
Jay Foad 7863cc6c1c [LiveIntervals] Fix repairOldRegInRange for simple def cases
The fix applied in D23303 "LiveIntervalAnalysis: fix a crash in repairOldRegInRange"
was over-zealous. It would bail out when the end of the range to be
repaired was in the middle of the first segment of the live range of
Reg, which was always the case when the range contained a single def of
Reg.

This patch fixes it as suggested by Matthias Braun in post-commit review
on the original patch, and tests it by adding -early-live-intervals to
a selection of existing lit tests that now pass.

(Note that D23303 was originally applied to fix a crash in
SILoadStoreOptimizer, but that is now moot since D23814 updated
SILoadStoreOptimizer to run before scheduling so it no longer has to
update live intervals.)

Differential Revision: https://reviews.llvm.org/D110238

Unrevert with some changes to the tests:
- Add -verify-machineinstrs to check for remaining problems in live
  interval support in TwoAddressInstructionPass.
- Drop test/CodeGen/AMDGPU/extract-load-i1.ll since it suffers from
  some of those remaining problems.
2021-09-24 11:44:49 +01:00
Amara Emerson 9f773b17c2 [GlobalISel][IRTranslator] Fix crash during bit-test switch optimization with odd types.
Odd switch case types cause a crash in the conversion to MVT. Instead use a pointer sized
scalar type which is what SDAG does in these cases.
2021-09-24 00:19:27 -07:00
Matt Arsenault 2875d3d484 RegAllocGreedy: Remove an unhelpful auto, and don't use a reference 2021-09-23 17:25:25 -04:00
Jay Foad deb2ca566a Revert "[LiveIntervals] Fix repairOldRegInRange for simple def cases"
This reverts commit 8229cb7412.

It was failing on buildbots with expensive checks enabled.
2021-09-23 17:55:05 +01:00
Jay Foad 8229cb7412 [LiveIntervals] Fix repairOldRegInRange for simple def cases
The fix applied in D23303 "LiveIntervalAnalysis: fix a crash in repairOldRegInRange"
was over-zealous. It would bail out when the end of the range to be
repaired was in the middle of the first segment of the live range of
Reg, which was always the case when the range contained a single def of
Reg.

This patch fixes it as suggested by Matthias Braun in post-commit review
on the original patch, and tests it by adding -early-live-intervals to
a selection of existing lit tests that now pass.

(Note that D23303 was originally applied to fix a crash in
SILoadStoreOptimizer, but that is now moot since D23814 updated
SILoadStoreOptimizer to run before scheduling so it no longer has to
update live intervals.)

Differential Revision: https://reviews.llvm.org/D110238
2021-09-23 17:16:14 +01:00
Craig Topper d5c67bba62 [RegAlloc] Cast uint8_t to unsigned before printing it.
raw_ostream interprets uint8_t as wanting to print a character
with that ASCII value. In this case the uint8_t is an integer
that we want to print.
2021-09-23 08:49:44 -07:00
Simon Pilgrim 2a5936faf0 [CodeGen] ProcessSDDbgValues - use const-ref value in for-range loop. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-23 12:23:46 +01:00
Simon Pilgrim 5cabe4d9d3 [CodeGen] RegisterCoalescer::buildVRegToDbgValueMap - use const-ref value in for-range loop. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-23 12:23:45 +01:00
Fraser Cormack e7c879a69d [RISCV][VP] Add support for VP_REDUCE_* operations
This patch adds codegen support for lowering the vector-predicated
reduction intrinsics to RVV instructions. The process is similar to that
of the other reduction intrinsics, save for the fact that every VP
reduction has a start value. We reuse the existing custom "VL" nodes,
adding extra patterns where required to handle non-true masks.

To support these nodes, the `RISCVISD::VECREDUCE_*_VL` nodes have been
given an explicit "merge" operand. This is to faciliate the VP
reductions, where we must be careful to ensure that even if no operation
is performed (when VL=0) we still produce the start value. The RVV
reductions don't update the destination register under these conditions,
so we tie the splatted start value to the output register.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107657
2021-09-23 11:11:05 +01:00
Jay Foad 6cef28ed2d [TII] Remove the MFI argument to convertToThreeAddress. NFC.
This simplifies the API and addresses a FIXME in
TwoAddressInstructionPass::convertInstTo3Addr.

Differential Revision: https://reviews.llvm.org/D110229
2021-09-23 08:58:46 +01:00
Bjorn Pettersson c3ae8ecb52 [DAGCombiner] Rename isAlias as mayAlias. NFC
Differential Revision: https://reviews.llvm.org/D110062
2021-09-23 09:54:42 +02:00
Freddy Ye 13207a21a6 [NFC] Remove redundant setOperationAction.
[FROUND,FROUNDEVEN][f32, f64, f128] are set Expand twice.

Differential Revision: https://reviews.llvm.org/D110302
2021-09-23 10:28:21 +08:00
David Green c49611f909 Mark CFG as preserved in TypePromotion and InterleaveAccess passes
Neither of these passes modify the CFG, allowing us to preserve DomTree
and LoopInfo across them by using setPreservesCFG.
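
For reference, a minimal sketch of how a legacy pass typically declares this
(a fragment, not the exact code of either pass):

```
// In the pass's getAnalysisUsage override:
void getAnalysisUsage(AnalysisUsage &AU) const override {
  AU.setPreservesCFG(); // CFG untouched, so DominatorTree/LoopInfo stay valid
  // ... plus whatever analyses the pass already required ...
}
```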

Differential Revision: https://reviews.llvm.org/D110161
2021-09-22 18:58:00 +01:00
Daniil Fukalov 1a7b7d7ba2 [NFCI][CodeGen, AArch64] Fix inconsistent TargetCostKind types.
The pass uses different cost kinds to estimate "old" and "interleaved" costs:
the default cost kind for all targets overriding `getInterleavedMemoryOpCost()` is
`TCK_SizeAndLatency`. Although at the moment the estimated `TCK_Latency` costs are
equal to `TCK_SizeAndLatency` (so the change is NFC), this may change in future.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110100
2021-09-22 20:15:17 +03:00
Hongtao Yu d9b511d8e8 [CSSPGO] Set PseudoProbeInserter as a default pass.
Currently PseudoProbeInserter is a pass conditioned on a target switch. It works well with a single clang invocation. It doesn't work so well when the backend is called separately (i.e., through the linker or llc), where the user always has to pass -pseudo-probe-for-profiling explicitly. I'm making the pass a default pass that requires no command line arg to trigger, but it will actually run only when the CU comes with `llvm.pseudo_probe_desc` metadata.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D110209
2021-09-22 09:09:48 -07:00
Kazu Hirata 3c557cd7f9 [CodeGen] Remove redundant declaration MIRCanonicalizerID (NFC)
Note that MIRCanonicalizerID is declared in
llvm/include/llvm/CodeGen/Passes.h, which MIRCanonicalizerPass.cpp
includes.

Identified with readability-redundant-declaration.
2021-09-22 08:58:27 -07:00
Sander de Smalen 3e8d2008f7 [SelectionDAG] Remove PromoteIntOp_EXTRACT_SUBVECTOR.
This code seems untested and is likely obsolete, because this case
should already be handled by the code that legalizes the result type
of EXTRACT_SUBVECTOR.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110061
2021-09-22 14:23:35 +01:00
Sander de Smalen d5681f1d68 [SelectionDAG] Add PromoteIntOp_INSERT_SUBVECTOR.
This is required to codegen something like:
  <vscale x 8 x i16> @llvm.experimental.vector.insert(<vscale x 8 x i16> %vec,
                                                      <vscale x 2 x i16> %subvec,
                                                      i64 %idx)
where the output vector is legal, but the input vector needs promoting.

It implements this by performing the whole operation on the promoted type,
and then truncating the result.

Reviewed By: david-arm, craig.topper

Differential Revision: https://reviews.llvm.org/D110059
2021-09-22 13:32:36 +01:00
Sander de Smalen 4ca1fbe361 [SelectionDAG] Make WidenVecRes_Convert work for scalable vectors.
Most of the code wasn't yet scalable-safe, although conceptually most of
it just works for scalable vectors. This change
makes the algorithm work on ElementCount, where appropriate,
and leaves the fixed-width only code to use `getFixedNumElements`.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D110058
2021-09-22 10:58:38 +01:00
Arthur Eubanks e42234383e Make DiagnosticInfoResourceLimit's limit param required
And always print it.

This makes some LLVM diagnostics match up better with Clang's diagnostics.

Updated some AMDGPU uses of DiagnosticInfoResourceLimit and now we print
better diagnostics for those.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D110204
2021-09-21 15:27:58 -07:00
Craig Topper aeb63d464f [RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for and/or/xor.
This requires a minor change to CodeGenPrepare to ensure that
shouldSinkOperands will be called for And.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D110106
2021-09-21 10:07:29 -07:00
Michael Liao 5fb3ae525f [SelectionDAG] Re-calculate scoped AA metadata when merging stores.
Reviewed By: jeroen.dobbelaere

Differential Revision: https://reviews.llvm.org/D102821
2021-09-21 11:41:17 -04:00
Aleksandr Bezzubikov 624e4d087e [GlobalISel] Support ConstantAsMetadata in IRTranslator
When using instructions which have a MetadataAsValue argument
(e.g. some target-specific intrinsics) MD canonicalization strips
internal MDNodes with a single ConstantAsMetadata child. That
prevented IRTranslator from properly translating such calls.
2021-09-21 11:24:56 -04:00
Simon Pilgrim 20b58855e0 [CodeGen] SelectionDAGBuilder - Use const-ref iterator in for-range loops. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-21 13:01:08 +01:00
Simon Pilgrim 0f83456cf5 [CodeGen] SDDbgValue::getSDNodes() - use const-ref to avoid unnecessary copies. NFCI.
Reported by MSVC static analyzer.
2021-09-21 13:01:08 +01:00
Petar Avramovic 8bc7185668 GlobalISel/Utils: Refactor constant splat match functions
Add a generic helper function that matches a constant splat. It has an option
to match a constant splat with undef (some elements can be undef but not all).
Add a util function and matcher for G_FCONSTANT splat.

Differential Revision: https://reviews.llvm.org/D104410
2021-09-21 12:09:35 +02:00
Amara Emerson 7091a7f781 [GlobalISel][Legalizer] Don't use eraseFromParentAndMarkDBGValuesForRemoval() for some artifacts.
For artifacts excluding G_TRUNC/G_SEXT, which have IR counterparts, we don't
seem to have debug users of defs. However, in the legalizer we're always calling
MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval() which is expensive.
In some rare cases, this contributes significantly to unreasonably long compile
times when we have lots of artifact combiner activity.

To verify this, I added asserts to that function when it actually replaced a debug
use operand with undef for these artifacts. On CTMark with both -O0 and -Os and
debug info enabled, I didn't see a single case where it triggered.

In my measurements I saw around a 0.5% geomean compile-time improvement on -g -O0
for AArch64 with this change.

Differential Revision: https://reviews.llvm.org/D109750
2021-09-20 23:34:42 -07:00
Amara Emerson f9d69a0ab0 [GlobalISel] Implement support for the "trap-func-name" attribute.
This attribute calls a function instead of emitting a trap instruction.

Differential Revision: https://reviews.llvm.org/D110098
2021-09-20 14:32:01 -07:00
Petar Avramovic e4c46ddd91 [GlobalISel] Improve elimination of dead instructions in legalizer
Add eraseInstr(s) utility functions. Before deleting an instruction, they
collect its use instructions; after the deletion, they delete any use
instructions that became trivially dead.
This patch clears all dead instructions in existing legalizer mir tests.

Differential Revision: https://reviews.llvm.org/D109154
2021-09-20 13:00:58 +02:00
Kazu Hirata 84b07c9b3a [llvm] Use pop_back_val (NFC) 2021-09-19 13:44:23 -07:00
Kazu Hirata 48719e3b18 [CodeGen] Use make_early_inc_range (NFC) 2021-09-18 09:29:24 -07:00
Kazu Hirata e2febc2ed4 [llvm] Use drop_begin (NFC) 2021-09-17 09:16:40 -07:00
Simon Pilgrim 4af7643470 [CodeGen] LiveDebug - Use const-ref iterator in for-range loop. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-17 14:04:54 +01:00
Simon Pilgrim 9e70d4e5f2 [AsmPrinter] DebugLocEntry::dump() - Use const-ref iterator in for-range loop. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-17 12:11:54 +01:00
Petar Avramovic d477a7c2e7 GlobalISel/Utils: Refactor integer/float constant match functions
Rework getConstantVRegValWithLookThrough in order to make it clear whether we
are matching an integer/float constant only or any constant (default).
Add helper functions that get the DefVReg and APInt/APFloat from a constant instr:
getIConstantVRegValWithLookThrough: integer constant, only G_CONSTANT
getFConstantVRegValWithLookThrough: float constant, only G_FCONSTANT
getAnyConstantVRegValWithLookThrough: either G_CONSTANT or G_FCONSTANT

Rename getConstantVRegVal and getConstantVRegSExtVal to getIConstantVRegVal
and getIConstantVRegSExtVal. These now only match G_CONSTANT as described
in comment.

Relevant matchers now return both DefVReg and APInt/APFloat.

Replace existing uses of getConstantVRegValWithLookThrough and
getConstantVRegVal with new helper functions. Any constant match is
only required in:
ConstantFoldBinOp: for constant argument that was bit-cast of float to int
getAArch64VectorSplat: AArch64::G_DUP operands can be any constant
amdgpu select for G_BUILD_VECTOR_TRUNC: operands can be any constant

In other places use integer only constant match.

Differential Revision: https://reviews.llvm.org/D104409
2021-09-17 11:22:13 +02:00
Nikita Popov 0fc624f029 [IR] Return AAMDNodes from Instruction::getMetadata() (NFC)
getMetadata() currently uses a weird API where it populates a
structure passed to it, and optionally merges into it. Instead,
we can return the AAMDNodes and provide a separate merge() API.
This makes usages more compact.

Differential Revision: https://reviews.llvm.org/D109852
2021-09-16 21:06:57 +02:00
Kazu Hirata cfc7402419 [llvm] Use drop_begin (NFC) 2021-09-16 08:46:26 -07:00
Doug Gregor a773db7d76 Add a command-line flag to control the Swift extended async frame info.
Introduce a new command-line flag `-swift-async-fp={auto|always|never}`
that controls how code generation sets the Swift extended async frame
info bit. There are three possibilities:

* `auto`: determine how to set the bit based on the deployment target, either
statically or dynamically via `swift_async_extendedFramePointerFlags`.
* `always`: the default; always set the bit statically, regardless of deployment
target.
* `never`: never set the bit, regardless of deployment target.

Patch by Doug Gregor <dgregor@apple.com>

Reviewed By: doug.gregor

Differential Revision: https://reviews.llvm.org/D109392
2021-09-16 06:57:45 -07:00
Konstantin Schwarz d2e66d7fa4 [GlobalISel] Add a combine for and(load , mask) -> zextload
This only handles simple masks, not shifted masks, for now.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D109357
2021-09-16 10:42:46 +02:00
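A hedged, source-level sketch of the `and(load, mask)` pattern (assuming a simple low-bits mask such as 0xFF):
```
/* Hypothetical example: only the low byte of the loaded value is live after
   the AND, so the combine can turn the load + mask into a zero-extending
   8-bit load (G_ZEXTLOAD). */
unsigned load_low_byte(const unsigned *p) {
  return *p & 0xFF;
}
```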
Sam Parker c98a8a09b5 [HardwareLoops] Loop guard intrinsic to recognise zext
If a loop count was initially represented by a 32-bit unsigned int in C
then the hardware-loop pass can recognise the loop guard and insert
the llvm.test.set.loop.iterations intrinsic. If it was instead an
unsigned short/char then clang inserts a zext instruction to expand
the loop count to an i32. This patch adds the necessary pattern
matching to enable the use of llvm.test.set.loop.iterations in those
cases.

Patch by: sherwin-dc

Differential Revision: https://reviews.llvm.org/D109631
2021-09-16 08:33:16 +01:00
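A small C sketch of the case described above (illustrative only): the loop count is held in an `unsigned short`, so clang zero-extends it to i32 before the guard.
```
/* Illustrative only: with a 16-bit count clang emits a zext of `n` to i32,
   which the hardware-loops pass now looks through when matching the guard. */
void scale(int *a, unsigned short n) {
  for (unsigned short i = 0; i < n; ++i)
    a[i] *= 2;
}
```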
Alok Kumar Sharma a5b72abc9e [DebugInfo] Enhance DIImportedEntity to accept children entities
A new field `elements` is added to '!DIImportedEntity', representing the
list of aliased entities.
This is needed to dump optimized debugging information where all names
in a module are imported, but a few names are imported with overriding
aliases.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D109343
2021-09-16 10:41:55 +05:30
Ahmed Bougacha 94a2f9cdb6 [GlobalISel] Fix CombinerHelper::isPredecessor for same def/use MI.
The doc comment for isPredecessor says:
  Returns true if \p DefMI precedes \p UseMI or they are the same
  instruction.
And dominates relies on that behavior for its own:
  Returns true if \p DefMI dominates \p UseMI. By definition an
  instruction dominates itself.

Make both statements correct by fixing isPredecessor.
Found by inspection.
2021-09-15 16:45:27 -07:00
Matt Arsenault 87c00878d3 SplitKit: Remove decade old live interval hack
This was trying to fixup broken live intervals coming out of the
coalescer. The verifier is more complete now and no tests seem to fail
without this.
2021-09-15 17:35:59 -04:00
Amara Emerson 5ec1845cad [AArch64][GlobalISel] Add a new reassociation for G_PTR_ADDs.
G_PTR_ADD (G_PTR_ADD X, C), Y) -> (G_PTR_ADD (G_PTR_ADD(X, Y), C)

Improves CTMark -Os on AArch64:

Program            before after  diff
           sqlite3 286932 287024  0.0%
                kc 432512 432508 -0.0%
             SPASS 412788 412764 -0.0%
    pairlocalalign 249460 249416 -0.0%
            bullet 475740 475512 -0.0%
    7zip-benchmark 568864 568356 -0.1%
  consumer-typeset 419088 418648 -0.1%
        tramp3d-v4 367628 367224 -0.1%
          clamscan 383184 382732 -0.1%
            lencod 430028 429284 -0.2%
Geomean difference               -0.1%

Differential Revision: https://reviews.llvm.org/D109528
2021-09-14 23:57:41 -07:00
Matt Arsenault 54d755a034 DAG: Fix incorrect folding of fmul -1 to fneg
The fmul is a canonicalizing operation, and fneg is not, so this would
break denormals that need flushing and also would not quiet signaling
NaNs. Fold to fsub instead, which is also canonicalizing.
2021-09-14 21:25:02 -04:00
Matt Arsenault 4a36e96c3f RegAllocGreedy: Account for reserved registers in num regs heuristic
This simple heuristic uses the estimated live range length combined
with the number of registers in the class to switch which heuristic to
use. This was taking the raw number of registers in the class, even
though not all of them may be available. AMDGPU heavily relies on
dynamically reserved numbers of registers based on user attributes to
satisfy occupancy constraints, so the raw number is highly misleading.

There are still a few problems here. In the original testcase that
made me notice this, the live range size is incorrect after the
scheduler rearranges instructions, since the instructions don't have
the original InstrDist offsets. Additionally, I think it would be more
appropriate to use the number of disjointly allocatable registers in
the class. For the AMDGPU register tuples, there are a large number of
registers in each tuple class, but only a small fraction can actually
be allocated at the same time since they all overlap with each
other. It seems we do not have a query that corresponds to the number
of independently allocatable registers. Relatedly, I'm still debugging
some allocation failures where overlapping tuples seem to not be
handled correctly.

The test changes are mostly noise. There are a handful of x86 tests
that look like regressions with an additional spill, and a handful
that now avoid a spill. The worst looking regression is likely
test/Thumb2/mve-vld4.ll which introduces a few additional
spills. test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll
shows a massive improvement by completely eliminating a large number
of spills inside a loop.
2021-09-14 21:00:29 -04:00
Bjorn Pettersson cd2bff1ef1 [StackColoring] Fix a debug invariance problem
Ignore dbg instructions when collecting stack slot markers. This is
to make sure the coloring is invariant regarding the presence of dbg
instructions (even in cases when the dbg instructions might be
badly placed in the input).

Differential Revision: https://reviews.llvm.org/D109758
2021-09-14 19:21:56 +02:00
vnalamot 726b5d3416 [RegScavenger][NFC] Refer to the already initialized local variable for spill slot index
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D109501
2021-09-13 21:55:33 +05:30
Simon Pilgrim 9db20822f7 [APInt] Add APIntOps::ScaleBitMask helper
APInt is used to describe a bit mask in a variety of value tracking and demanded bits/elts functions.

When traversing through dst/src operands, we have a number of places where these masks need to widened/narrowed to translate through bitcasts, reductions etc. to a different type.

This patch adds an APIntOps::ScaleBitMask common helper, adds unit test coverage, and updates a number of cases to use the helper instead of their own implementation.

This came up on D109065 where we currently have to add yet another implementation of the same code.

Differential Revision: https://reviews.llvm.org/D109683
2021-09-13 16:27:12 +01:00
vnalamot 0fc3ebb70a [SelectionDAG][NFC] Fix typo in VerifyDAGDiverence() function name
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D109674
2021-09-13 20:48:04 +05:30
David Truby 915e9e76bf [llvm][sve] Lowering for VLS masked extending loads
This extends the custom lowering for extending loads on
fixed length vectors in SVE to support masked extending loads.

The existing tests for correct behaviour of masked extending loads
exhibit bad code generation due to the legalisation of i1 vectors.
They have been left as-is and new tests have been added that do not
exhibit this behaviour.

Differential Revision: https://reviews.llvm.org/D108200
2021-09-13 11:13:25 +01:00
Nikita Popov 4189e5fe12 [CGP] Support opaque pointers in address mode fold
Rather than inspecting the pointer element type, use the access
type of the load/store/atomicrmw/cmpxchg.

In the process of doing this, simplify the logic by storing the
address + type in MemoryUses, rather than an Instruction + Operand
pair (which was then used to fetch the address).
2021-09-12 17:43:37 +02:00
Kazu Hirata c9fca53af1 [CodeGen, Target] Use pred_empty and succ_empty (NFC) 2021-09-10 11:11:31 -07:00
Nikita Popov 14afbe9448 [CallLowering] Support opaque pointers
Always use the byval/inalloca/preallocated type (which is required
nowadays), don't fall back on the pointer element type.

This requires adding Function::getParamPreallocatedType() to
mirror the CallBase API, so that the templated code can work with
both.
2021-09-10 18:32:12 +02:00
Sander de Smalen ec7d8d5069 [SelectionDAG] PromoteIntRes_EXTRACT_SUBVECTOR for scalable vectors (widening).
This patch implements legalization of EXTRACT_SUBVECTOR for the case
where the result needs promoting, and the input type requires widening.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109509
2021-09-10 13:29:26 +01:00
Sander de Smalen 801a745dd2 [SelectionDAG] PromoteIntRes_EXTRACT_SUBVECTOR for scalable vectors.
This patch implements legalization of EXTRACT_SUBVECTOR for the case
where the result needs promoting, and the input type is either legal
or requires splitting.

The idea is that the operation is broken down into simpler steps,
by first extracting a smaller subvector until the input vector
becomes legal or requires promotion.

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D109313
2021-09-10 13:29:26 +01:00
Zequan Wu 12f80c0bbd [DebugInfo] Emit DW_AT_inline under -g1/-gmlt
Differential Revision: https://reviews.llvm.org/D109554
2021-09-09 18:59:50 -07:00
Craig Topper 9af8f1b18e [SelectionDAG] Add isZero/isAllOnes methods to ConstantSDNode.
Soft deprecate isNullValue/isAllOnesValue and update in-tree
callers. This matches the changes to the APInt interface from
D109483.

Reviewed By: lattner

Differential Revision: https://reviews.llvm.org/D109535
2021-09-09 13:28:30 -07:00
Craig Topper 517728fe1e [SelectionDAG] Use DAG.getNOT to further simplify some code. NFC
Followup to D109483
2021-09-09 10:53:39 -07:00
Nick Desaulniers e69d402088 [NFC] rename member of BitTestBlock and JumpTableHeader
Follow up to suggestions in D109103 via hans:
  I think UnreachableDefault (or UnreachableFallthrough) would be a
  better name now, since it doesn't just omit the range check, it also
  omits the last bit test.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D109455
2021-09-09 10:43:00 -07:00
Chris Lattner d51da74889 [CodeGen] Use DAG.getAllOnesConstant where possible to simplify code. NFC. 2021-09-09 10:22:51 -07:00
Chris Lattner 735f46715d [APInt] Normalize naming on keep constructors / predicate methods.
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`.  This achieves two things:

1) This starts standardizing predicates across the LLVM codebase,
   following (in this case) ConstantInt.  The word "Value" doesn't
   convey anything of merit, and is missing in some of the other things.

2) Calling an integer "null" doesn't make any sense.  The original sin
   here is mine and I've regretted it for years.  This moves us to calling
   it "zero" instead, which is correct!

APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go.  As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.

Included in this patch are changes to a bunch of the codebase, but there are
more.  We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.

Differential Revision: https://reviews.llvm.org/D109483
2021-09-09 09:50:24 -07:00
Chris Lattner 9e46dd965a [APInt.h] Reduce the APInt header file interface a bit. NFC
This moves one mid-size function out of line, inlines the
trivial tcAnd/tcOr/tcXor/tcComplement methods into their only
caller, and moves the magic/umagic functions into SelectionDAG
since they are implementation details of its algorithm. This
also removes the unit tests for magic, but these are already
tested in the divide lowering logic for various targets.

This also upgrades some C style comments to C++.

Differential Revision: https://reviews.llvm.org/D109476
2021-09-08 18:17:07 -07:00
Amara Emerson eae44c8a86 [GlobalISel] Implement merging of stores of truncates.
This is a port of a combine which matches a pattern where a wide type scalar
value is stored by several narrow stores. It folds it into a single store or
a BSWAP and a store if the target supports it.

Assuming little endian target:
 i8 *p = ...
 i32 val = ...
 p[0] = (val >> 0) & 0xFF;
 p[1] = (val >> 8) & 0xFF;
 p[2] = (val >> 16) & 0xFF;
 p[3] = (val >> 24) & 0xFF;
=>
 *((i32)p) = val;

On CTMark AArch64 -Os this results in a good amount of savings:

Program            before        after       diff
             SPASS 412792       412788       -0.0%
                kc 432528       432512       -0.0%
            lencod 430112       430096       -0.0%
  consumer-typeset 419156       419128       -0.0%
            bullet 475840       475752       -0.0%
        tramp3d-v4 367760       367628       -0.0%
          clamscan 383388       383204       -0.0%
    pairlocalalign 249764       249476       -0.1%
    7zip-benchmark 570100       568860       -0.2%
           sqlite3 287628       286920       -0.2%
Geomean difference                           -0.1%

Differential Revision: https://reviews.llvm.org/D109419
2021-09-08 17:06:33 -07:00
Nick Desaulniers 4331f19d8b [ISEL][BitTestBlock] omit additional bit test when default destination is unreachable
Otherwise we end up with an extra conditional jump, following by an
unconditional jump off the end of a function. ie.

  bb.0:
    BT32rr ..
    JCC_1 %bb.4 ...
  bb.1:
    BT32rr ..
    JCC_1 %bb.2 ...
    JMP_1 %bb.3
  bb.2:
    ...
  bb.3.unreachable:
  bb.4:
    ...

  Should be equivalent to:
  bb.0:
    BT32rr ..
    JCC_1 %bb.4 ...
    JMP_1 %bb.2
  bb.1:
  bb.2:
    ...
  bb.3.unreachable:
  bb.4:
    ...

This can occur since at the higher level IR (Instruction) SwitchInsts
are required to have BBs for default destinations, even when it can be
deduced that such BBs are unreachable.

For most programs, this isn't an issue, just wasted instructions since the
unreachable has been statically proven.

The x86_64 Linux kernel when built with CONFIG_LTO_CLANG_THIN=y fails to
boot though once D106056 is re-applied.  D106056 makes it more likely
that correlation-propagation (CVP) can deduce that the default case of
SwitchInsts are unreachable. The x86_64 kernel uses a binary post
processor called objtool, which emits this warning:

vmlinux.o: warning: objtool: cfg80211_edmg_chandef_valid()+0x169: can't
find jump dest instruction at .text.cfg80211_edmg_chandef_valid+0x17b

I haven't debugged precisely why this causes a failure at boot time, but
fixing this very obvious jump off the end of the function fixes the
warning and boot problem.

Link: https://bugs.llvm.org/show_bug.cgi?id=50080
Fixes: https://github.com/ClangBuiltLinux/linux/issues/679
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1440

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D109103
2021-09-08 11:03:47 -07:00
David Green d8d24c64fe [DAG] Fix GT -> GE condition when creating SetCC
79845ed6df folded some setcc(ashr) conditions to setcc, but got
the condition for NE incorrect, using GT where it should be using GE.
2021-09-08 12:41:51 +01:00
Evgeny Leviant 93b09a2a5d [LiveDebugValues] Handle spills of indirect debug values correctly
When handling a register spill for an indirect debug value, the LiveDebugValues pass doesn't add
a DW_OP_deref operator, which may in some cases cause the debugger to return the value's address
instead of the value while the machine register holding that address is spilled.

Differential revision: https://reviews.llvm.org/D109142
2021-09-08 14:06:08 +03:00
Fraser Cormack 2c5568a6a9 [LegalizeTypes][VP] Add promotion support for binary VP ops
This patch extends the preliminary support for vector-predicated (VP)
operation legalization to include promotion of illegal integer vector
types.

Integer promotion of binary VP operations is relatively simple and
piggy-backs on the non-VP logic, but passes the two extra mask and VP
operands through to the promoted operation.

Tests have been added to the RISC-V target to cover the basic scenarios
for integer promotion for both fixed- and scalable-vector types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108288
2021-09-08 10:22:57 +01:00
Peter Smith 5e71839f77 [MC] Add MCSubtargetInfo to MCAlignFragment
In preparation for passing the MCSubtargetInfo (STI) through to writeNops
so that it can use the STI in operation at the time, we need to record the
STI in operation when an MCAlignFragment may write nops as padding. The
STI is currently unused; a further patch will pass it through to
writeNops.

There are many places that can create an MCAlignFragment; in most cases
we can find out the STI in operation at the time. In a few places this
isn't possible as we are in initialisation or finalisation, or are
emitting constant pools. When possible I've tried to find the most
appropriate existing fragment to obtain the STI from; when none is
available, use the per-module STI.

For constant pools we don't actually need to use EmitCodeAlign as the
constant pools are data anyway so falling through into it via an
executable NOP is no better than falling through into data padding.

This is a prerequisite for D45962 which uses the STI to emit the
appropriate NOP for the STI. Which can differ per fragment.

Note that this involves an interface change to InitSections. It is now
called initSections and requires a SubtargetInfo as a parameter.

Differential Revision: https://reviews.llvm.org/D45961
2021-09-07 15:46:19 +01:00
Mirko Brkusanin 6c4b634da6 [AMDGPU][GlobalISel] Legalize G_MUL for non-standard types
Legalizing G_MUL for non-standard types (like i33) generated an error. Use
minScalar and maxScalar instead of clampScalar. Also use a new rule: instead
of widening to the next power of 2, widen to the next multiple of the passed
argument (32 in this case), so instead of widening i65 to i128, we widen it
to i96.

Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D109228
2021-09-07 16:33:24 +02:00
Mirko Brkusanin 5263bf583a [AMDGPU][GlobalISel] Legalization of G_ROTL and G_ROTR
Add implementation for the legalization of G_ROTL and G_ROTR machine
instructions. They are very similar to funnel shift instructions; the only
difference is that funnel shifts have three operands, whereas rotate instructions
have two: the register being rotated and the number of shifts. The
legalization of G_ROTL/G_ROTR is just lowering
them into funnel shift instructions if they are legal.

Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D105347
2021-09-07 16:33:24 +02:00
Mirko Brkusanin 36527cbe02 [AMDGPU][GlobalISel] Legalize memcpy family of intrinsics
Legalize G_MEMCPY, G_MEMMOVE, G_MEMSET and G_MEMCPY_INLINE.

Corresponding intrinsics are replaced by a loop that uses loads/stores in
AMDGPULowerIntrinsics pass unless their length is a constant lower than
MemIntrinsicExpandSizeThresholdOpt (default 1024). Any G_MEM* instruction that
reaches the legalizer should have a constant length argument and should be expanded
into appropriate number of loads + stores.

Differential Revision: https://reviews.llvm.org/D108357
2021-09-07 12:24:07 +02:00
Fraser Cormack a823bdf3ab [RISCV][VP] Custom lower VP_STORE and VP_LOAD
This patch adds support for the vector-predicated `VP_STORE` and
`VP_LOAD` nodes. We do this in the same way we lower `MSTORE` and
`MLOAD`: to regular load/store instructions via intrinsics.

One necessary change was made to `SelectionDAGLegalize` so that
`VP_STORE` nodes' operation actions are taken from the stored "value"
operands, in the same vein as `STORE` or `MSTORE`.

Reviewed By: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D108999
2021-09-07 10:53:25 +01:00
Fraser Cormack f4dee8cb82 [RISCV][VP] Custom lower VP_SCATTER and VP_GATHER
This patch adds support for the `VP_SCATTER` and `VP_GATHER` nodes by
lowering them to RVV's `vsox`/`vlux` instructions, respectively. This
process is almost identical to the existing `MSCATTER`/`MGATHER` support.

One extra change was made to `SelectionDAGLegalize` so that
`VP_SCATTER`'s operation action is derived from its stored "value"
operand rather than its return type (which is always the chain).

Reviewed By: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D108987
2021-09-07 10:43:07 +01:00
Sanjay Patel e1e4bf174b [DAGCombine] Prevent the transform of combine for multi-use operand
The test is based on a miscompile example in:
https://llvm.org/PR51321

Differential Revision: https://reviews.llvm.org/D107692
2021-09-06 15:30:32 -04:00
Jonas Paulsson 118997d8e9 [SelectionDAGBuilder] Bugfix in visitInlineAsm()
In case of a virtual register tied to a phys-def, the register class needs to
be computed. Make sure that this also works with fast regalloc by
using TLI.getRegClassFor() whenever possible, and make only the case of
'Untyped' use getMinimalPhysRegClass().

Fixes https://bugs.llvm.org/show_bug.cgi?id=51699.

Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D109291
2021-09-06 17:46:31 +02:00
David Green 1b83aaaefa [DAG] Remove oneuse check in select_cc setgt X, -1, C, ~C fold
This appears to produce better code, even if the condition may need to
be replicated.
2021-09-05 16:18:31 +01:00
David Green 8523fb96a6 [DAG] Fold select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C
Given a select_cc producing a constant and an inversion of the constant
for a comparison more than zero, we can produce an xor with ashr
instead, which produces smaller code. The ashr either sets all bits or
clear all bits depending on if the value is negative. This is then xor'd
with the constant to optionally negate the value.
https://alive2.llvm.org/ce/z/DTFaBZ

This includes a one-use check on the Cmp, which seems to make things a
little worse and will be removed in a followup.

Differential Revision: https://reviews.llvm.org/D109149
2021-09-05 16:04:01 +01:00
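A source-level sketch of the fold (my example; assumes a 32-bit int and an arithmetic right shift of negative values):
```
/* select_cc setgt X, -1, C, ~C ... */
int select_form(int x) { return x > -1 ? 5 : ~5; }

/* ... becomes xor (ashr X, 31), C: the shift yields 0 or all-ones, which
   XOR'ed with 5 gives 5 or ~5.  Assumes >> on int is arithmetic. */
int folded_form(int x) { return (x >> 31) ^ 5; }
```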
David Green 79845ed6df [DAG] Fold setcc eq with ashr to compare to zero.
Pulled out of D109149, this folds set_cc seteq (ashr X, BW-1), -1 ->
set_cc setlt X, 0 to prevent some regressions later on when folding
select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C

Differential Revision: https://reviews.llvm.org/D109214
2021-09-05 14:06:47 +01:00
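In source terms the fold amounts to the following (illustrative, again assuming a 32-bit int with arithmetic shifts):
```
/* seteq (ashr X, 31), -1: the shifted value is all-ones exactly when X < 0 */
int sign_check(int x)        { return (x >> 31) == -1; }

/* ... folds to setlt X, 0 */
int sign_check_folded(int x) { return x < 0; }
```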
Fangrui Song e03c8d309a [AsmPrinter] Remove unneeded MCSubtargetInfo temporary after D14346. NFC
The temporary object was used as a workaround when the target parser may
change STI. D14346 made the MCSubtargetInfo argument to
createMCAsmParser const, so we no longer need the temporary object.
2021-09-04 10:50:10 -07:00
Konstantin Schwarz 90d5298759 [GlobalISel] Add convenience constructors to MemDesc
This allows constructing a MemDesc from a MachineMemOperand, a pattern that starts to show up more frequently.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D109161
2021-09-03 12:52:18 +02:00
Chen Zheng 34badc409c Revert "[HardwareLoops] Change order of SCEV expression construction for InitLoopCount."
This causes https://bugs.llvm.org/show_bug.cgi?id=51714 and
is not a right patch according to comments in D91724

This reverts commit 42eaf4fe0a.
2021-09-03 02:55:43 +00:00
Jessica Paquette 844d8e0337 [GlobalISel] Combine icmp eq/ne x, 0/1 -> x when x == 0 or 1
This adds the following combines:

```
x = ... 0 or 1
c = icmp eq x, 1

->

c = x
```

and

```
x = ... 0 or 1
c = icmp ne x, 0

->

c = x
```

When the target's true value for the relevant types is 1.

This showed up in the following situation:

https://godbolt.org/z/M5jKexWTW

SDAG currently supports the `ne` case, but not the `eq` case. This can probably
be further generalized, but I don't feel like thinking that hard right now.

This gives some minor code size improvements across the board on CTMark at
-Os for AArch64. (0.1% for 7zip and pairlocalalign in particular.)

Differential Revision: https://reviews.llvm.org/D109130
2021-09-02 15:05:31 -07:00
Heejin Ahn 28780e59f6 [WebAssembly] Add Wasm SjLj support
This add support for SjLj using Wasm exception handling instructions:
https://github.com/WebAssembly/exception-handling/blob/master/proposals/exception-handling/Exceptions.md

This does not yet support the mixed use of EH and SjLj within a
function. It will be added in a follow-up CL.

This currently passes all SjLj Emscripten tests for wasm0/1/2/3/s,
except for the below:
- `test_longjmp_standalone`: Uses Node
- `test_dlfcn_longjmp`: Uses NodeRAWFS
- `test_longjmp_throw`: Mixes EH and SjLj
- `test_exceptions_longjmp1`: Mixes EH and SjLj
- `test_exceptions_longjmp2`: Mixes EH and SjLj
- `test_exceptions_longjmp3`: Mixes EH and SjLj

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D108960
2021-09-02 10:51:02 -07:00
David Green 9cb8f4d1ad [ARM] Add a tail-predication loop predicate register
The semantics of tail predication loops means that the value of LR as an
instruction is executed determines the predicate. In other words:

mov r3, #3
DLSTP lr, r3        // Start tail predication, lr==3
VADD.s32 q0, q1, q2 // Lanes 0,1 and 2 are updated in q0.
mov lr, #1
VADD.s32 q0, q1, q2 // Only first lane is updated.

This means that the value of lr cannot be spilled and re-used in tail
predication regions without potentially altering the behaviour of the
program. More lanes than required could be stored, for example, and in
the case of a gather those lanes might not have been set up, leading to
alignment exceptions.

This patch adds a new lr predicate operand to MVE instructions in order
to keep a reference to the lr that they use as a tail predicate. It will
usually hold the zeroreg meaning not predicated, being set to the LR phi
value in the MVETPAndVPTOptimisationsPass. This will prevent it from
being spilled anywhere that it needs to be used.

A lot of tests needed updating.

Differential Revision: https://reviews.llvm.org/D107638
2021-09-02 13:42:58 +01:00
Roman Lebedev 3f1f08f0ed
Revert @llvm.isnan intrinsic patchset.
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)

TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in a broken state meanwhile.

While the end result of discussion may lead back to the current design,
it may also not lead to the current design.

Therefore i take it upon myself
to revert the tree back to last known good state.

This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
2021-09-02 13:53:56 +03:00
Fraser Cormack ef78f2106c [LegalizeTypes][VP] Add splitting support for binary VP ops
This patch extends D107904's introduction of vector-predicated (VP)
operation legalization to include vector splitting.

When the result of a binary VP operation needs splitting, all of its
operands are split in kind. The two operands and the mask are split as
usual, and the vector-length parameter EVL is "split" such that the low
and high halves each execute the correct number of elements.

Tests have been added to the RISC-V target to show splitting several
scenarios for fixed- and scalable-vector types. Without support for
`umax` (e.g. in the `B` extension) the generated code starts to branch.
Ideally a cost model would prevent their insertion in the first place.

Through these tests many opportunities for better codegen can be seen:
combining known-undef VP operations and for constant-folding operations
on `ISD::VSCALE`, to name but a few.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D107957
2021-09-02 10:15:53 +01:00
Abinav Puthan Purayil 0baace5379 [DAGCombine] Add node level checks for fp-contract and fp-ninf in visitFMULForFMADistributiveCombine().
Differential Revision: https://reviews.llvm.org/D107551
2021-09-02 11:33:14 +05:30
Roman Lebedev f5753125f0
[Codegen][TLI][X86] SimplifyMultipleUseDemandedBits(): 0'th vec subreg widening is free, try to perform it earlier
I believe the profitability reasoning here is correct:
the "sub"reg is already located within the 0'th subreg of the wider reg,
so if we have subvector insertion at index 0 into undef,
then it's always free to do.

After this, D109065 finally avoids the regression in D108382.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D109074
2021-09-02 00:54:05 +03:00
Arthur Eubanks 52e6d70c40 [NFC] Use newly introduced *AtIndex methods
Introduced in D108788. These are clearer.
2021-09-01 11:18:41 -07:00
Fraser Cormack 85fd44d7fe [SelectionDAG][NFC] Fix typo in assertion message
s/Uexpected/Unexpected.
2021-09-01 08:55:06 +01:00
Yonghong Song 89424a829f [DWARF] Support new TAG DW_TAG_LLVM_annotation
A new LLVM specific TAG DW_TAG_LLVM_annotation is added.
The name is suggested by Paul Robinson ([1]).
Currently, this tag is used to output __attribute__((btf_tag("string")))
annotations in dwarf. The following is an example for a global
variable with two btf_tag attributes:
  0x0000002a:   DW_TAG_variable
                  DW_AT_name      ("g1")
                  DW_AT_type      (0x00000052 "int")
                  DW_AT_external  (true)
                  DW_AT_decl_file ("/tmp/home/yhs/work/tests/llvm/btf_tag/t.c")
                  DW_AT_decl_line (8)
                  DW_AT_location  (DW_OP_addr 0x0)

  0x0000003f:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_tag")
                    DW_AT_const_value     ("tag1")

  0x00000048:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_tag")
                    DW_AT_const_value     ("tag2")

  0x00000051:     NULL

In the future, DW_TAG_LLVM_annotation may encode other type
of non-string const value.

 [1] https://lists.llvm.org/pipermail/llvm-dev/2021-June/151250.html

Differential Revision: https://reviews.llvm.org/D106621
2021-08-31 19:22:17 -07:00
Stanislav Mekhanoshin d170945bb2 [RegAlloc] Immediately delete dead instructions with live uses
When RA eliminated a dead def it can either immediately delete
the instruction itself or replace it with KILL to defer the
actual removal. If this instruction has a virtual register use
killing the register it will shrink the LI of the use. However,
if the LI covers the instruction and extends beyond it the
shrink will not happen. In fact it is impossible to shrink
such a use because the KILL is still using it.

If the LI of the use is later split at the KILL and the
KILL itself is eliminated after that point, the new live segment
ends up at an invalid slot index.

This extremely rare condition was hit after D106408 which has
enabled rematerialization of such instructions. The replacement
with KILL is only done for rematerialized defs which became dead
and such rematerialization did not generally happen before.

The patch deletes an instruction immediately if it is a result
of rematerialization and has such a use. An alternative would be
to prohibit a split at a KILL instruction, but it looks like it
is better to split a live range rather than keep a killed
instruction just in case it can be rematerialized further.

Fixes PR51655.

Differential Revision: https://reviews.llvm.org/D108951
2021-08-31 13:46:00 -07:00
Jessica Paquette 94d3ff09cf [GlobalISel] Don't use G_FPTOSI in G_ISNAN legalization
As noted in the comments in D108227, using G_FPTOSI produces wrong results for
G_ISNAN. Drop the G_FPTOSI and perform the operation on integer types.

Elsewhere in LLVM, a bitcast would be the appropriate choice (as it is in SDAG).
GlobalISel does not distinguish between integer and FP types, so a bitcast would
be meaningless here.
2021-08-31 10:26:42 -07:00
Hussain Kadhem 524ded7d01 [VP] implementation of sdag support for VP memory intrinsics
Followup to D99355: SDAG support for vector-predicated load/store/gather/scatter.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D105871
2021-08-31 17:01:50 +02:00
Nemanja Ivanovic 84d4ed1761 Revert "[DebugInfo] Emit DW_TAG_namelist and DW_TAG_namelist_item"
This reverts commit 0a6fad754e.
It caused failures on a number of PowerPC bots.
2021-08-31 09:24:50 -05:00
Craig Topper 201f6446da [LegalizeTypes][X86] Improve ExpandIntRes_FP_TO_SINT/ExpandIntRes_FP_TO_UINT when input is SoftPromoteHalf.
Instead of splitting off the fp16 to float conversion and generating
a libcall, we should split the operation into fp16 to float and float
to integer operations. This will allow the float to integer conversion
to go through any custom handling the target has. If the target doesn't
have custom handling then we should come back to ExpandIntRes_FP_TO_SINT/
ExpandIntRes_FP_TO_UINT automatically to create the libcall.

This avoids generating libcalls on 32-bit X86. These library functions may
not exist in 32-bit libgcc. At least for LLVM, we never generate them when
hardware floating point instructions are available.

Differential Revision: https://reviews.llvm.org/D108933
2021-08-30 13:12:59 -07:00
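A hypothetical source-level example of the conversion being split (not from the patch): the fp16-to-int cast is decomposed into an fp16-to-float step and a float-to-int step, so the latter can use the target's own lowering instead of a libcall.
```
/* Hypothetical example: with soft-promoted half, this cast is now split into
   fp16 -> float and float -> int rather than expanded to a single libcall. */
int half_to_int(_Float16 h) {
  return (int)h;
}
```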
Bjorn Pettersson 789f01283d [SelectionDAG] Fix miscompile bugs related to smul.fix.sat with scale zero
When expanding a SMULFIXSAT ISD node (usually originating from
a smul.fix.sat intrinsic) we've applied some optimizations for
the special case when the scale is zero. The idea has been that
it would be cheaper to use an SMULO instruction (if legal) to
perform the multiplication and at the same time detect any overflow.
And in case of overflow we could use some SELECTs to replace the
result with the saturated min/max value. The only tricky part
is to know if we overflowed on the min or max value, i.e. if the
product is positive or negative. Unfortunately the implementation
has been incorrect as it has looked at the product returned by the
SMULO to determine the sign of the product. In case of overflow that
product is truncated and won't give us the correct sign bit.

This patch is adding an extra XOR of the multiplication operands,
which is used to determine the sign of the non truncated product.

This patch fixes PR51677.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D108938
2021-08-30 22:08:26 +02:00
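A minimal C sketch of the corrected overflow-sign logic (my own illustration, not the actual expansion): the truncated product cannot be trusted for the sign, but `a ^ b` can.
```
#include <stdint.h>

/* Sketch of smul.fix.sat with scale 0: on overflow, saturate toward the sign
   of the full product, recovered from a ^ b rather than from the truncated
   product returned by the overflowing multiply. */
int32_t smul_sat_scale0(int32_t a, int32_t b) {
  int32_t prod;
  if (!__builtin_mul_overflow(a, b, &prod))
    return prod;                                  /* exact result fits */
  return ((a ^ b) < 0) ? INT32_MIN : INT32_MAX;   /* overflow: pick min or max */
}
```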
Chih-Ping Chen 070090cfa5 [DebugInfo] Remove the restriction on the size of DIStringType
in DebugHandlerBase::isUnsignedDIType.

Differential Revision: https://reviews.llvm.org/D108559
2021-08-30 15:36:54 -04:00
Nikita Popov 0529e2e018 [InstrInfo] Use 64-bit immediates for analyzeCompare() (NFCI)
The backend generally uses 64-bit immediates (e.g. what
MachineOperand::getImm() returns), so use that for analyzeCompare()
and optimizeCompareInst() as well. This avoids truncation for
targets that support immediates larger than 32 bits. In particular, we
can avoid the bugprone value normalization hack in the AArch64
target.

This is a followup to D108076.

Differential Revision: https://reviews.llvm.org/D108875
2021-08-30 19:46:04 +02:00
Hongtao Yu f39256e3a5 [CSSPGO] Avoid repeatedly computing md5 hash code for pseudo probe inline contexts.
MD5 hashing is expensive. Use a hash map to look up already computed GUIDs for dwarf names. This gave a 2% build time improvement on a large internal application.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D108722
2021-08-30 10:11:47 -07:00
Kazu Hirata c50faffb4e [llvm] Remove redundant calls to str() and c_str() (NFC)
Identified with readability-redundant-string-cstr.
2021-08-30 09:05:05 -07:00
Craig Topper 705d005781 [DAGCombiner][RISCV] Don't use vector types in DAGCombiner::tryStoreMergeOfLoads if we need a rotate.
The check for whether a rotate is possible occurs before the
memory legality checks for the integer type. So it's possible we
decide we can use a rotate, but then fail the legality checks. If
that happens we should not fall back to a vector type. This triggers
an assertion in the rotate handling when it finds a vector type
instead of an integer type.

In theory we could use a shufflevector in place of the rotate, but
right now I'd just like to fix the crash.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D108839
2021-08-30 08:47:15 -07:00
Djordje Todorovic 86f5288eae [LiveDebugValues] Cleanup Transfers when removing Entry Value
If we encounter a new debug value, describing the same parameter,
we should stop tracking the parameter's Entry Value. At that point,
in some cases, the Transfer which uses the parameter's Entry Value,
is already emitted. Thanks to the RemoveRedundantDebugValues pass,
many problems with incorrect instruction order and number of DBG_VALUEs
are fixed. However, we still cannot rely on the rule that each new
debug value is set by the previous non-debug instruction in Machine
Basic Block.

When new parameter debug value triggers removal of Backup Entry Value
for the same parameter, do the cleanup of Transfers emitted from Backup
Entry Values. Get the Transfer Instruction which created the new debug
value and search for debug values already emitted from the to-be-deleted
Backup Entry Value and attached to the Transfer Instruction. If found,
delete the Transfer and remove "primary" Entry Value Var Loc from
OpenRanges.

This patch fixes PR47628.

Patch by Nikola Tesic.

Differential revision: https://reviews.llvm.org/D106856
2021-08-30 14:00:41 +02:00
Simon Pilgrim 7c25a32840 Fix MSVC "signed/unsigned mismatch" comparison warning. NFCI. 2021-08-30 12:11:09 +01:00
“bhkumarn” 0a6fad754e [DebugInfo] Emit DW_TAG_namelist and DW_TAG_namelist_item
This patch emits DW_TAG_namelist and DW_TAG_namelist_item for fortran
namelist variables. DICompositeType is extended to support this fortran
feature.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D108553
2021-08-30 13:40:39 +05:30
Matt Arsenault 1494298b51 GlobalISel: Remove check for empty functions as these are invalid IR 2021-08-27 09:27:06 -04:00
Carl Ritson 5d9de3ea18 [DAGCombine] Allow FMA combine with both FMA and FMAD
Without this change only the preferred fusion opcode is tested
when attempting to combine FMA operations.
If both FMA and FMAD are available then FMA ops formed prior to
legalization will not be merged post legalization as FMAD becomes
the preferred fusion opcode.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108619
2021-08-27 19:49:35 +09:00
Matt Arsenault 3fdcd9bb13 GlobalISel: Add CallBase to CallLoweringInfo
The DAG version has this, and is necessary for call lowering to take
advantage of any attributes at the call site.
2021-08-26 21:09:11 -04:00
Craig Topper 8bb24289f3 [SelectionDAG] Optimize bitreverse expansion to minimize the number of mask constants.
We can halve the number of mask constants by masking before shl
and after srl.

This can reduce the number of mov immediate or constant
materializations. Or reduce the number of constant pool loads
for X86 vectors.

I think we might be able to do something similar for bswap. I'll
look at it next.

Differential Revision: https://reviews.llvm.org/D108738
2021-08-26 09:33:24 -07:00
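One step of a bit-reverse expansion written out in C to show the constant-sharing idea (illustrative only): masking before the left shift and after the right shift lets both halves reuse the same mask.
```
#include <stdint.h>

/* One swap step of a bit reverse: exchange adjacent 2-bit groups.  Masking
   before the shl and after the srl means a single 0x33333333 constant serves
   both halves, instead of also needing 0xCCCCCCCC. */
uint32_t swap_bit_pairs(uint32_t x) {
  return ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
}
```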
Andrew Wei c9066c5d37 [CGP] Fix the crash for combining address mode when having cyclic dependency
When combining addressing modes and replacing the matched phi nodes,
sometimes the phi node to be replaced has already been modified. For example,
with the matcher sets [A, B] and [C, A] there is a cyclic dependency:
A is replaced by B, and C will be replaced by A. Because we try to match a new phi node
to another new phi node, we should ignore new phi nodes when mapping a new phi node to an old one.

Reviewed By: skatkov

Differential Revision: https://reviews.llvm.org/D108635
2021-08-26 22:52:42 +08:00
Jay Foad 985eb25546 [MachineScheduler] Fix tracing
Consistently print a newline before "RegionInstrs:".
2021-08-26 09:27:01 +01:00
Heejin Ahn 2f88a30ca6 [WebAssembly] Extract longjmp handling in EmSjLj to a function (NFC)
Emscripten SjLj and (soon-to-be-added) Wasm SjLj transformation share
many steps:
1. Initialize `setjmpTable` and `setjmpTableSize` in the entry BB
2. Handle `setjmp` callsites
3. Handle `longjmp` callsites
4. Cleanup and update SSA

1, 3, and 4 are identical for Emscripten SjLj and Wasm SjLj. Only the
step 2 is different. This CL extracts the current Emscripten SjLj's
longjmp callsites handling into a function. The reason to make this a
separate CL is, without this, the diff tool cannot compare things well
in the presence of moved code and added code in the followup Wasm SjLj
CL, and it ends up mixing them together, making the diff unreadable.

Also fixes some typos and variable names. So far we've been calling the
buffer argument to `setjmp` and `longjmp` `jmpbuf`, but the name used in
the man page for those functions is `env`, so updated them to be
consistent.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D108728
2021-08-25 15:45:38 -07:00
Heejin Ahn c2c9a3fd9c [WebAssembly] Rename wasm.catch.exn intrinsic back to wasm.catch
The plan was to use `wasm.catch.exn` intrinsic to catch exceptions and
add `wasm.catch.longjmp` intrinsic, that returns two values (setjmp
buffer and return value), later to catch longjmps. But because we
decided not to use multivalue support at the moment, we are going to use
one intrinsic that returns a single value for both exceptions and
longjmps. And even if it's not for that, I now think the naming of
`wasm.catch.exn` is a little weird, because the intrinsic can still take
a tag immediate, which means it can be used for anything, not only
exceptions, as long as that returns a single value.

This partially reverts D107405.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D108683
2021-08-25 14:19:22 -07:00
Sanjay Patel e728d1a3e8 [DAGCombiner] create binop nodes with all of expected values
This is another bug exposed by https://llvm.org/PR51612
(and the one that triggered the initial assertion) in the report.

That example was suppressed with:
985b48f183

...but these would still crash because we created nodes
like UADDO without the expected 2 output values.
2021-08-25 16:14:22 -04:00
Sanjay Patel 985b48f183 [DAGCombiner] check uses more strictly on select-of-binop fold
There are 2 bugs here:
1. We were not checking uses of operand 2 (the false value of the select).
2. We were not checking for multiple uses of nodes that produce >1 result.

Correcting those is enough to avoid the crash in the reduced test based on:
https://llvm.org/PR51612

The additional use check on operand 0 (the condition value of the select)
should not strictly be necessary because we are only replacing one use
with another (whether it makes performance sense to do the transform with
that pattern is not clear). But as noted in the TODO, changing that
uncovers another bug.

Note: there's at least one more bug here - we aren't propagating EVTs
correctly, but I plan to fix that in another patch.
2021-08-25 14:14:41 -04:00
Nick Desaulniers 846e562dcc [Clang] add support for error+warning fn attrs
Add support for the GNU C style __attribute__((error(""))) and
__attribute__((warning(""))). These attributes are meant to be put on
declarations of functions that should not be called.

They are frequently used to provide compile time diagnostics similar to
_Static_assert, but which may rely on non-ICE conditions (i.e. relying on
compiler optimizations). This is also similar to the diagnose_if function
attribute, but can diagnose after optimizations have been run.

While users may instead simply call undefined functions in such cases to
get a linkage failure from the linker, these provide a much more
ergonomic and actionable diagnostic to users and do so at compile time
rather than at link time. Users may instead be able to use inline asm .err
directives.

These are used throughout the Linux kernel in its implementation of
BUILD_BUG and BUILD_BUG_ON macros. These macros generally cannot be
converted to use _Static_assert because many of the parameters are not
ICEs. The Linux kernel still needs to be modified to make use of these
when building with Clang; I have a patch that does so, which I will send once
this feature lands.

To do so, we create a new IR level Function attribute, "dontcall" (both
error and warning boil down to one IR Fn Attr).  Then, similar to calls
to inline asm, we attach a !srcloc Metadata node to call sites of such
attributed callees.

The backend diagnoses these during instruction selection, while we still
know that a call is a call (vs say a JMP that's a tail call) in an arch
agnostic manner.

The frontend then reconstructs the SourceLocation from that Metadata,
and determines whether to emit an error or warning based on the callee's
attribute.

Link: https://bugs.llvm.org/show_bug.cgi?id=16428
Link: https://github.com/ClangBuiltLinux/linux/issues/1173

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D106030
2021-08-25 10:34:18 -07:00
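A hypothetical usage sketch in the BUILD_BUG_ON spirit mentioned above (the names are made up): the call is diagnosed only if it survives optimization.
```
/* Hypothetical helper: calling bad_size() is an error only if the optimizer
   cannot prove the call away, mirroring how BUILD_BUG_ON-style macros use
   the attribute. */
__attribute__((error("size must be a compile-time constant")))
void bad_size(void);

static inline void check_size(unsigned long n) {
  if (!__builtin_constant_p(n))
    bad_size();   /* diagnosed at compile time if this call remains */
}
```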
Jeremy Morse 0116ed0069 [DebugInfo][InstrRef] Don't use instr-ref for unoptimised functions
InstrRefBasedLDV is marginally slower than VarlocBasedLDV when analysing
optimised code -- however, it's much slower when analysing code compiled
-O0.

To avoid this: don't use instruction referencing for -O0 functions. In the
"pure" case of unoptimised code, this won't really harm the debugging
experience because most variables won't have been promoted off the stack,
so can't go missing. It becomes more complicated when optimised code is
inlined into functions marked optnone; however these are rare, and as -O0
doesn't run many optimisations there should be little damage to the debug
experience as a result.

I've taken the opportunity to refactor testing for instruction-referencing
into a MachineFunction method, which seems the most appropriate place to
put it.

Differential Revision: https://reviews.llvm.org/D108585
2021-08-25 15:10:36 +01:00
Peilin Guo 4c4dbeeeea [DAGCombine] Check the legality of the index of EXTRACT_SUBVECTOR
For ISD::EXTRACT_SUBVECTOR, its second operand must be a constant
multiple of the known-minimum vector length of the result type.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D107795
2021-08-25 19:33:39 +08:00
Jeremy Morse cc1e87bf55 [DebugInfo][InstrRef] Avoid stack-slot-coloring changing codegen due to DI
Stack slot colouring adds "weight" to slots if a non-dbg-value instruction
refers to them. This, unfortunately, means that DBG_PHI instructions can have
an effect on codegen. The fix is very simple, replace isDebugValue with
isDebugInstr.

The regression test contains a scenario that reproduces this problem; I've
represented both normal-debug mode and instr-ref debug mode instructions
in comment lines prefixed with AAAAAA and BBBBBB, and un-comment them with
sed to test that the two different modes produce the same behaviour.

Differential Revision: https://reviews.llvm.org/D108627
2021-08-25 12:04:59 +01:00
Konstantin Schwarz 4b4bc1ea16 [GlobalISel] Do not generate illegal G_SEXTLOADs after legalization
The sext_inreg_of_load combine did not have the isLegalOrBeforeLegalizer check,
leading to the generation of potentially illegal G_SEXTLOADs when run after legalization.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108626
2021-08-25 10:13:39 +02:00
Vang Thao 549f6a819a [MachineCopyPropagation] Check CrossCopyRegClass for cross-class copys
On some AMDGPU subtargets, copying to and from AGPR registers using another
AGPR register is not possible. A intermediate VGPR register is needed for AGPR
to AGPR copy. This is an issue when machine copy propagation forwards a
COPY $agpr, replacing a COPY $vgpr which results in $agpr = COPY $agpr. It is
removing a cross class copy that may have been optimized by previous passes and
potentially creating an unoptimized cross class copy later on.

To avoid this issue, check CrossCopyRegClass if a different register class will
be needed for the copy. If so then avoid forwarding the copy when the
destination does not match the desired register class and if the original copy
already matches the desired register class.

Issue seen while attempting to optimize another AGPR to AGPR issue:

Live-ins: $agpr0
$vgpr0 = COPY $agpr0
$agpr1 = V_ACCVGPR_WRITE_B32 $vgpr0
$agpr2 = COPY $vgpr0
$agpr3 = COPY $vgpr0
$agpr4 = COPY $vgpr0

After machine-cp:

$vgpr0 = COPY $agpr0
$agpr1 = V_ACCVGPR_WRITE_B32 $vgpr0
$agpr2 = COPY $agpr0
$agpr3 = COPY $agpr0
$agpr4 = COPY $agpr0

Machine-cp propagated COPY $agpr0 to replace $vgpr0 creating 3 AGPR to AGPR
copies. Later this creates a cross-register copy from AGPR->VGPR->AGPR for each
copy when the prior VGPR->AGPR copy was already optimal.

Reviewed By: lkail, rampitec

Differential Revision: https://reviews.llvm.org/D108011
2021-08-24 21:22:36 -07:00
Stanislav Mekhanoshin 92c1fd19ab Allow rematerialization of virtual reg uses
Currently the isReallyTriviallyReMaterializableGeneric() implementation
prevents rematerialization on any virtual register use on the grounds
that it is not a trivial rematerialization and that we do not want to
extend live ranges.

It appears that LRE logic does not attempt to extend a live range of
a source register for rematerialization so that is not an issue.
That is checked in the LiveRangeEdit::allUsesAvailableAt().

The only non-trivial aspect of it is accounting for tied-defs which
normally represent a read-modify-write operation and are not rematerializable.

The test for a tied-def situation already exists in the
/CodeGen/AMDGPU/remat-vop.mir,
test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.

The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets
where I more or less understand the asm it seems to reduce spilling
(as expected) or be neutral. However, it needs a review by all targets'
specialists.

Differential Revision: https://reviews.llvm.org/D106408
2021-08-24 11:09:02 -07:00
Simon Pilgrim 194b08000c [DAG] LoadedSlice::canMergeExpensiveCrossRegisterBankCopy - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit load combines to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.
2021-08-24 15:28:30 +01:00
Simon Pilgrim 6de0b55188 [DAG] TransformFPLoadStorePair - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit load combines (in this case for fp->int load/store copies) to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.

Differential Revision: https://reviews.llvm.org/D108318
2021-08-24 13:11:27 +01:00
Simon Pilgrim e431b280c9 [DAG] CombineConsecutiveLoads - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit load combines (in this case for ISD::BUILD_PAIR) to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.

This helps in particular for 32-bit X86 cases loading 64-bit size data, reducing codegen diffs vs x86_64.

Differential Revision: https://reviews.llvm.org/D108307
2021-08-24 12:31:22 +01:00
Stanislav Mekhanoshin 401a45c61b Fix late rematerialization operands check
D106408 enables rematerialization of instructions with virtual
register uses. That has uncovered the bug in the allUsesAvailableAt
implementation: https://bugs.llvm.org/show_bug.cgi?id=51516.

In the majority of cases canRematerializeAt() called to check if
an instruction can be rematerialized before the given UseIdx.
However, SplitEditor::enterIntvAtEnd() calls it to rematerialize
an instruction at the end of a block passing LIS.getMBBEndIdx()
into the check. In the testcase from the bug it has attempted to
rematerialize ADDXri after STRXui in bb.17. The use operand %55
of the ADD is killed by the STRX but that is undetected by the check
because it adjusts passed UseIdx to the reg slot, before the kill.
The value is dead at the index passed to the check however.

This change uses the later of the passed UseIdx and its reg slot. This
shall be correct because if we are checking the availability of operands
before an instruction, that instruction cannot be the one defining
these operands. If we are checking for late rematerialization, we
are really interested in whether the operands live past the instruction.

The bug is not exploitable without D106408, but the fix is needed to reland
the reverted D106408.

Differential Revision: https://reviews.llvm.org/D108475
2021-08-23 12:23:58 -07:00
Jessica Paquette 6760e2a7bc [GlobalISel] Translate @llvm.llround.* -> G_LLROUND
Translate it using `IRTranslator::translateSimpleIntrinsic`.

Differential Revision: https://reviews.llvm.org/D108563
2021-08-23 09:42:53 -07:00
Ben Shi f69fb7ac72 [DAGCombiner] Add target hook function to decide folding (mul (add x, c1), c2)
Reviewed by: lebedev.ri, spatel, craig.topper, luismarques, jrtc27

Differential Revision: https://reviews.llvm.org/D107711
2021-08-22 16:53:32 +08:00
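In source terms the fold being gated is just distribution of the multiply over the add (illustrative; whether it pays off, e.g. whether c1*c2 still fits an immediate, is what the new hook lets targets decide).
```
/* (x + c1) * c2  ==>  x * c2 + (c1 * c2); the new target hook decides
   whether this distribution is worthwhile for the target. */
int before_fold(int x) { return (x + 3) * 5; }
int after_fold(int x)  { return x * 5 + 15; }
```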