Commit Graph

57083 Commits

Author SHA1 Message Date
Simon Pilgrim 8206c50cde [X86] Add isAnyZero shuffle mask helper 2020-03-29 19:51:37 +01:00
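A minimal sketch of what such a mask helper might look like (illustrative, not quoted from the commit; SM_SentinelZero is the sentinel value the X86 shuffle combiner uses for known-zero mask elements):
    // Returns true if any shuffle mask element is the known-zero sentinel.
    static bool isAnyZero(ArrayRef<int> Mask) {
      return llvm::any_of(Mask, [](int M) { return M == SM_SentinelZero; });
    }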
Matt Arsenault d15723ef06 AMDGPU/GlobalISel: Remove redundant virtual 2020-03-29 14:03:07 -04:00
Matt Arsenault ab7a41069e AMDGPU: Fix using wrong instruction for FP conversion
This was never actually hit, but FTRUNC was clearly not the intent
here.
2020-03-29 14:03:07 -04:00
Matt Arsenault 97bbe7ad2a AMDGPU: Fix typo 2020-03-29 14:03:06 -04:00
Simon Pilgrim 7734e4b3a3 [X86][AVX] Combine 128-bit lane shuffles with a zeroable upper half to EXTRACT_SUBVECTOR (PR40720)
As explained on PR40720, EXTRACTF128 is always as good/better than VPERM2F128, and we can use the implicit zeroing of the upper half.

I've added some extra tests to vector-shuffle-combining-avx2.ll to make sure we don't lose coverage.
2020-03-29 16:41:59 +01:00
Simon Pilgrim da4c7db793 [X86] Rename matchShuffleAsByteRotate to matchShuffleAsElementRotate. NFC.
This was an inner helper function for the real matchShuffleAsByteRotate function, but it is more generic and is used directly for VALIGN lowering which doesn't work at the byte level.
2020-03-29 16:41:58 +01:00
Simon Pilgrim 10439f9e32 [X86][AVX] Add X86ISD::VALIGN target shuffle decode support
Allows us to combine VALIGN instructions with other shuffles - the combiner doesn't create VALIGN yet though.
2020-03-29 16:41:58 +01:00
Simon Pilgrim a7115d51be [X86] X86CallFrameOptimization - generalize slow push code path
Replace the explicit isAtom() || isSLM() test with the more general (and more specific) slowTwoMemOps() check to avoid the use of the PUSHrmm push-from-memory case.

This is actually very tricky to test in anything but quite complex code, but the atomic-idempotent.ll tests seem to be the most straightforward to use.

Differential Revision: https://reviews.llvm.org/D76239
2020-03-29 11:01:59 +01:00
Fangrui Song fc93787d7e [MC][PowerPC] Make .reloc support arbitrary relocation types
Generalizes ad7199f3e6 (R_PPC_NONE/R_PPC64_NONE).
2020-03-28 17:04:31 -07:00
Matt Arsenault 9564f46766 AMDGPU: Make use of default operands 2020-03-28 17:33:29 -04:00
Benjamin Kramer 2d24d74b85 [AMDGPU] Stabilize sort order
Found by the expensive checks in llvm::sort.
2020-03-28 20:20:14 +01:00
Yonghong Song ced0d1f42b [BPF] support 128bit int explicitly in layout spec
Currently, bpf does not specify 128bit alignment in its
layout spec. So for a structure like
  struct ipv6_key_t {
    unsigned pid;
    unsigned __int128 saddr;
    unsigned short lport;
  };
clang will generate IR type
  %struct.ipv6_key_t = type { i32, [12 x i8], i128, i16, [14 x i8] }
The additional padding is to ensure that later IR->MIR lowering can generate
a correct stack layout with the target layout spec.

But it is common practice for a tracing program to be
first compiled with a native target flag (e.g., x86_64 or aarch64) through
clang to generate IR and then run through llc to generate bpf
byte code. Tracing programs often refer to kernel internal
data structures which need to be compiled with a non-bpf target.

But such a compilation model may cause a problem on aarch64.
The bcc issue https://github.com/iovisor/bcc/issues/2827
reported such a problem.

For the above structure, since aarch64 has "i128:128" in its
layout string, the generated IR will have
  %struct.ipv6_key_t = type { i32, i128, i16 }

Since bpf does not have "i128:128" in its spec string,
SelectionDAG assumes alignment 8 for i128 and
computes a stack storage size of 32 bytes for the above,
which leads to incorrect code later.

x86_64 does not have this issue as it does not have
"i128:128" in its layout spec; it permits i128 to
be aligned at 8 bytes on the stack. Its IR type looks like
  %struct.ipv6_key_t = type { i32, [12 x i8], i128, i16, [14 x i8] }

The fix here is to add i128 support to the layout spec, the same as
aarch64. The only downside is that we may have less optimal stack
allocation in certain cases, since we now require 16-byte alignment
for i128 instead of 8. But this is probably fine as i128 is
not used widely and in most cases users should already
have proper alignment.
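A sketch of the kind of layout-string change described (the exact
strings are assumptions for illustration, not quoted from the patch):
  before: "e-m:e-p:64:64-i64:64-n32:64-S128"
  after:  "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"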

Differential Revision: https://reviews.llvm.org/D76587
2020-03-28 11:46:29 -07:00
Benjamin Kramer 4065e92195 Upgrade some instances of std::sort to llvm::sort. NFC. 2020-03-28 19:23:29 +01:00
Michael Liao d2dd0fac48 Fix `-Wsign-compare` warning. NFC. 2020-03-28 10:20:27 -04:00
Kamlesh Kumar aabc24acf0 [RISCV] Support llvm.thread.pointer
Fixes https://bugs.llvm.org/show_bug.cgi?id=45303 (clang crashed on __builtin_thread_pointer)
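A minimal illustration of the lowering this enables (the exact output is an assumption, shown for RV64):
  declare i8* @llvm.thread.pointer()
can now be selected to a simple move from the thread pointer register:
  mv a0, tp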

Reviewed By: lenary, MaskRay, luismarques

Differential Revision: https://reviews.llvm.org/D76828
2020-03-27 17:30:12 -07:00
Francesco Petrogalli c66d1f38f6 [llvm][Support] Add isZero method for TypeSize. [NFC]
Summary:
The method is used where TypeSize is implicitly cast to integer for
being checked against 0.
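A minimal sketch of what such a method might look like (the MinSize
member name is an assumption for illustration):

  // True iff the minimum size is zero; a scalable zero is still zero.
  bool isZero() const { return MinSize == 0; }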

Reviewers: sdesmalen, efriedma

Reviewed By: sdesmalen, efriedma

Subscribers: efriedma, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76748
2020-03-27 21:03:44 +00:00
Matt Arsenault a8cc9047de CodeGen: Add -denormal-fp-math-f32 flag
Make the set of FP related attributes and command flags closer.
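A usage sketch (assuming the new flag accepts the same values as -denormal-fp-math):
  llc -denormal-fp-math-f32=preserve-sign input.ll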
2020-03-27 14:00:39 -07:00
Jay Foad a6dfd827e5 [AMDGPU] Fix getEUsPerCU for gfx10 in CU mode
Summary:
"Per CU" is a bit simplistic for gfx10, but I couldn't think of a better
name.

Reviewers: arsenm, rampitec, nhaehnle, dstuttard, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76861
2020-03-27 20:36:49 +00:00
Fangrui Song 152d14da64 [MC][X86] Make .reloc support arbitrary relocation types
Generalizes D62014 (R_386_NONE/R_X86_64_NONE).

Unlike ARM (D76746) and AArch64 (D76754), we cannot delete FK_NONE from
getFixupKindSize because FK_NONE is still used by R_386_TLS_DESC_CALL/R_X86_64_TLSDESC_CALL.
2020-03-27 13:33:15 -07:00
diggerlin 9c20f09985 [AIX] Address comment https://reviews.llvm.org/D76162#inline-701237
SUMMARY:

Address clang format issue:

"clang format this block, I don't think the spaces are aligned correctly."

Subscribers: wuzish, nemanjai, hiraditya

Differential Revision: https://reviews.llvm.org/D76162
2020-03-27 16:21:53 -04:00
Matt Arsenault 348735b723 AMDGPU: Stop setting attributes based on TargetOptions
Having arbitrary passes looking at the TargetOptions is pretty
messy. This was also disregarding if a function already had an
explicit attribute setting on it. opt/llc now add the attributes to
functions that don't specify the attribute. clang and lld do not call
the function to do this, though perhaps they should.

This was also treating unsafe-fp-math as implying the others, and
setting the other attributes based on it. This is not done anywhere
else, and I'm not sure it is correct based on the current description of
the option bit.

Effectively reverts 1d8cf2be89
2020-03-27 13:13:43 -07:00
Matt Arsenault 0ab5b5b858 Fix denormal-fp-math flag and attribute interaction
Make these behave the same way as unsafe-fp-math and co. The command line
flag should add the attribute to functions that do not already have
it, and leave existing attributes alone. The attribute is the actual
implementation, but the flag is useful in some testing situations.

AMDGPU has a variety of tests with denormals enabled/disabled that
would require a painful level of test duplication without a flag. This
doesn't expose setting the separate input/output modes, or add a flag
for the f32 version yet.

Tests will be included in a future patch.
2020-03-27 12:48:58 -07:00
Fangrui Song 34d77516b8 [MC][AArch64] Make .reloc support arbitrary relocation types
Depends on D76746. Generalizes D61973.

Differential Revision: https://reviews.llvm.org/D76754
2020-03-27 12:30:52 -07:00
Fangrui Song c389526171 [MC][ARM] Make .reloc support arbitrary relocation types
Generalizes D61992. In GNU as, the .reloc directive supports arbitrary relocation types.

A MCFixupKind value `V` larger than or equal to FirstLiteralRelocationKind
is used to represent the relocation type whose number is V-FirstLiteralRelocationKind.

This is useful for linker tests. Without the feature the assembler
cannot produce certain relocation records (e.g. R_ARM_ALU_PC_G0/R_ARM_LDR_PC_G0).
This helps move forward D75349 and D76575.
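A sketch of how a backend's relocation-type hook might decode such
fixup kinds (illustrative, not quoted from the patch):

  unsigned Kind = Fixup.getTargetKind();
  if (Kind >= FirstLiteralRelocationKind)
    return Kind - FirstLiteralRelocationKind; // raw ELF relocation number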

Differential Revision: https://reviews.llvm.org/D76746
2020-03-27 12:29:49 -07:00
Craig Topper cdd1cd7120 [X86] Don't form masked instructions if the operation has an additional user.
This will cause the operation to be repeated in both a masked and another masked
or unmasked form. This can be a waste of execution resources.

Differential Revision: https://reviews.llvm.org/D60940
2020-03-27 10:44:22 -07:00
Simon Pilgrim 950ea61653 [X86] Remove orphan LowerSTRICT_FSETCC declaration. NFCI.
LowerSETCC handles the strict cases as well; we don't have a separate function.
2020-03-27 17:03:19 +00:00
Guillaume Chatelet 74eac9031a [Alignment][NFC] MachineMemOperand::getAlign/getBaseAlign
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dschuff, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, jrtc27, atanasyan, jfb, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76925
2020-03-27 15:49:13 +00:00
Sam Parker d7084fa34a [ARM][LowOverheadLoops] DoubleWidthResult instructions canGenerateZeros
Given that some instructions generate wider result elements than
their inputs, flag them as being able to generate non-zeros in the
false lanes.

Differential Revision: https://reviews.llvm.org/D76766
2020-03-27 15:26:13 +00:00
Sam Parker 0e6aa08381 [ARM][MVE] Add DoubleWidthResult flag
Add a flag for those instructions which read from the top/bottom
halves of their inputs and produce a vector of results with double
width elements.

Differential Revision: https://reviews.llvm.org/D76762
2020-03-27 13:44:04 +00:00
Guillaume Chatelet e2ef6127d9 [Alignment] Fix overaligning bug
Summary:
This was discovered while converting to Align type.

See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76914
2020-03-27 12:57:50 +00:00
Simon Pilgrim d6ddabd7ef Revert rG6ff1ea3244c543ad24fc99c7f4979db2f2078593 "Fix "use of uninitialized variable" static analyzer warning. NFCI."
@dblaikie noticed that this may interfere with msan analysis
2020-03-27 11:44:03 +00:00
Jonas Paulsson 35173dddd1 [SystemZ] Fix typos in comments. 2020-03-27 12:31:48 +01:00
David Green 8689f98e9b [ARM] Fix MVE VCMPr f16 pattern
This pattern seemed to be using the f32 instruction, not f16. Fix it to
use the correct one.

Differential Revision: https://reviews.llvm.org/D76841
2020-03-27 11:18:24 +00:00
Fangrui Song 6728a9ae19 [MCInstPrinter] Add parameter `Address` to printCustomAliasOperand. NFC
Follow-up of D72172 and llvmorg-11-init-6896-gb3cc5dcef0f.
2020-03-27 00:38:20 -07:00
Fangrui Song b3cc5dcef0 [MCInstPrinter] Add parameter `Address` to MCInstPrinter::printAliasInstr. NFC
Follow-up of D72172.
2020-03-27 00:03:32 -07:00
Shengchen Kan 1fb4f99a21 [X86][MC] Fix the bug for prefix padding support
Summary:
There is a tiny logic error in D75300 that prevents branches from being
correctly aligned with the option -x86-pad-max-prefix-size.

Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: reames

Subscribers: hiraditya, llvm-commits, annita.zhang

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76285
2020-03-27 14:16:09 +08:00
Dan Gohman d865437d9c [WebAssembly] Fix the order of destructors in the LowerGlobalDtors pass.
Fix the LowerGlobalDtors pass to run destructors in the same order as the
regular LLVM destructor lowering -- in reverse order. Adjacent
destructors with the same associated object are grouped, but destructors
are not reordered based on associated objects.

Differential Revision: https://reviews.llvm.org/D70685
2020-03-26 16:19:02 -07:00
Stanislav Mekhanoshin 4c4b71843b [AMDGPU] Propagate amdgpu-waves-per-eu to callees
Differential Revision: https://reviews.llvm.org/D76868
2020-03-26 14:43:44 -07:00
Craig Topper 9f7d4150b9 [X86] Move combineLoopMAddPattern and combineLoopSADPattern to an IR pass before SelecitonDAG.
These transforms rely on a vector reduction flag on the SDNode
set by SelectionDAGBuilder. This flag exists because SelectionDAG
can't see across basic blocks so SelectionDAGBuilder is looking
across and saving the info. X86 is the only target that uses this
flag currently. By removing the X86 code we can remove the flag
and the SelectionDAGBuilder code.

This patch adds a dedicated IR pass for X86 that looks across the
blocks and transforms the IR into a form that the X86 SelectionDAG
can finish.

An advantage of this new approach is that we can enhance it to
shrink the phi nodes and final reduction tree based on the zeroes
that we need to concatenate to bring the partially reduced
reduction back up to the original width.

Differential Revision: https://reviews.llvm.org/D76649
2020-03-26 14:10:20 -07:00
Simon Pilgrim ad36491ebb [X86] Prefer PACKUS(AND(),AND()) to SHUFFLE(PSHUFB(),PSHUFB()) on all targets
Extends rG9d1721ce3926 to support AVX2+ targets.
2020-03-26 20:46:24 +00:00
Jay Foad 0fe096c4e9 [AMDGPU] Rename overloaded getMaxWavesPerEU to getWavesPerEUForWorkGroup
Summary: I think Max in the name was misleading. NFC.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76860
2020-03-26 20:21:04 +00:00
Jay Foad bb9c4fd7ea [AMDGPU] Remove getMaxWavesPerCU in favour of getWavesPerWorkGroup.
Summary:
These methods were identical. I chose to remove getMaxWavesPerCU because
I think Max in the name was misleading. NFC.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76859
2020-03-26 20:21:04 +00:00
Derek Schuff e110897e28 [WebAssembly] Clear frame base vreg in explicit-locals when stack pointer is dead
Having an alloca in a function causes the stack pointer to be generated in the
prolog, but if it's unused other than for debug info, explicit-locals will drop
it and not allocate a local. In this case we need to reset the FrameBaseVreg.

Differential Revision: https://reviews.llvm.org/D76784
2020-03-26 13:07:32 -07:00
Simon Pilgrim 39a52a19ed [X86] lowerV16I8Shuffle - create v8i16 mask for PACKUS(AND(),AND()) patterns.
We can improve computeKnownBits results by avoiding excess bitcasts.

For this pattern we were doing:

  (v16i8 PACKUS(v8i16 BITCAST(v16i8 AND(V1, MASK)), v8i16 BITCAST(v16i8 AND(V2, MASK))))

By performing the MASK/AND with a v8i16 type and bitcasting V1/V2 directly we can help computeKnownBits see that the mask is clearing the upper bits and allows shuffle combining to peek through later on.

This will be necessary to extend rG9d1721ce3926 to AVX2+ targets in a future patch.
2020-03-26 19:59:57 +00:00
diggerlin fdfe411e7c [AIX] discard the label in the csect of function description and use qualname for linkage
SUMMARY:

for a source file "test.c"

void foo() {};

llc will generate assembly code as (assembly path)
     .globl  foo
     .globl  .foo
     .csect foo[DS]
foo:

        .long   .foo
        .long   TOC[TC0]
        .long   0

   and symbol table as (xcoff object file)
   [4]     m   0x00000004     .data     1  unamex                    foo
   [5]     a4  0x0000000c       0    0     SD       DS    0    0
   [6]     m   0x00000004     .data     1  extern                    foo
   [7]     a4  0x00000004       0    0     LD       DS    0    0

   After this patch, the assembly will be as follows

        .globl  foo[DS]                 # -- Begin function foo
        .globl  .foo
        .align  2
        .csect foo[DS]
        .long   .foo
        .long   TOC[TC0]
        .long   0

    and the symbol table will be as follows
   [6]     m   0x00000004     .data     1  extern                    foo
   [7]     a4  0x00000004       0    0     DS      DS    0    0
Change the code for the assembly path and the xcoff object file path in llc.

Reviewers: Jason Liu
Subscribers: wuzish, nemanjai, hiraditya

Differential Revision: https://reviews.llvm.org/D76162
2020-03-26 15:46:52 -04:00
Scott Linder bd12ecb88f [AMDGPU] Fix PC register mapping in wave32 mode
Summary:
The PC_32 DWARF register is for a 32-bit process address space which we
don't implement in AMDGCN; another way of putting this is that the size
of the PC register is not a function of the wavefront size. If we ever
implement a 32-bit process address space we will need to add two more
DwarfFlavours i.e. we will need to represent the product of (wave32,
wave64) x (64-bit address space, 32-bit address space).

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76732
2020-03-26 14:43:25 -04:00
David Blaikie 9002db05a2 Roll otherwise unused subexpressions into an assertion 2020-03-26 11:32:33 -07:00
Guillaume Chatelet b727aabcb8 [Alignment][NFC] Use llvmTargetFrameLowering::getStackAlign
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Reviewed By: courbet

Subscribers: wuzish, arsenm, jyknight, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, fedor.sergeev, jrtc27, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76613
2020-03-26 18:15:53 +00:00
Jay Foad 0602c20b1b [AMDGPU] Make use of divideCeil. NFC. 2020-03-26 16:11:35 +00:00
Jay Foad 596bed3fd3 [AMDGPU] Remove unused methods. NFC. 2020-03-26 16:11:35 +00:00
Justin Hibbits 459e8e9488 [PowerPC]: Don't allow r0 as a target for LD_GOT_TPREL_L/32
Summary:
The linker is free to relax this (relocation R_PPC_GOT_TPREL16) against
R_PPC_TLS, if it sees fit (initial exec to local exec).  If r0 is used,
this can generate execution-invalid code (converts to 'addi %rX, %r0,
FOO', which translates in PPC-lingo to li %rX, FOO). Forbid this
instead.

This fixes static binaries using locales on FreeBSD/powerpc
(tested on FreeBSD/powerpcspe).

Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D76662
2020-03-26 10:59:28 -05:00
Simon Pilgrim 9d1721ce39 [X86][SSE] Prefer PACKUS(AND(),AND()) to SHUFFLE(PSHUFB(),PSHUFB()) on pre-AVX2 targets
As discussed on PR31443, we should be trying to use PACKUS for binary truncation patterns to reduce the number of shuffles.

The plan is to support AVX2+ targets once we've worked around PR45315 - we fail to peek through a VBROADCAST_LOAD mask to recognise zero upper bits in a PACKUS pattern.

We should also be able to add support for v8i16 and possibly 256/512-bit vectors as well.
2020-03-26 15:47:43 +00:00
Fangrui Song 3eef47407b [PPCInstPrinter] Change printBranchOperand(calltarget) to print the target address in hexadecimal form
```
// llvm-objdump -d output (before)
0: bl .-4
4: bl .+0
8: bl .+4

// llvm-objdump -d output (after) ; GNU objdump -d
0: bl 0xfffffffc / bl 0xfffffffffffffffc
4: bl 0x4
8: bl 0xc
```

Many `Operand`s are not annotated as OPERAND_PCREL.
They are not affected (e.g. `b .+67108860`). I plan to fix them in future patches.

Modified test/tools/llvm-objdump/ELF/PowerPC/branch-offset.s to test
address space wraparound for powerpc32 and powerpc64.

Reviewed By: sfertile, jhenderson

Differential Revision: https://reviews.llvm.org/D76591
2020-03-26 08:32:29 -07:00
Fangrui Song 87de9a0786 [X86InstPrinter] Change printPCRelImm to print the target address in hexadecimal form
```
// llvm-objdump -d output (before)
400000: e8 0b 00 00 00   callq 11
400005: e8 0b 00 00 00   callq 11

// llvm-objdump -d output (after)
400000: e8 0b 00 00 00  callq 0x400010
400005: e8 0b 00 00 00  callq 0x400015

// GNU objdump -d. The lack of 0x is not ideal because the result cannot be re-assembled
400000: e8 0b 00 00 00  callq 400010
400005: e8 0b 00 00 00  callq 400015
```

In llvm-objdump, we pass the address of the next MCInst. Ideally we
should just thread through the address of the current instruction; unfortunately we
cannot call X86MCCodeEmitter::encodeInstruction (X86MCCodeEmitter
requires MCInstrInfo and MCContext) to get the length of the MCInst.

MCInstPrinter::printInst has other callers (e.g. llvm-mc -filetype=asm, llvm-mca) which set Address to 0.
They leave MCInstPrinter::PrintBranchImmAsAddress as false and this change is a no-op for them.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D76580
2020-03-26 08:28:59 -07:00
Fangrui Song 5fad05e80d [MCInstPrinter] Pass `Address` parameter to MCOI::OPERAND_PCREL typed operands. NFC
Follow-up of D72172 and D72180

This patch passes `uint64_t Address` to print methods of PC-relative
operands so that subsequent target specific patches can change
`*InstPrinter::print{Operand,PCRelImm,...}` to customize the output.

Add MCInstPrinter::PrintBranchImmAsAddress which is set to true by
llvm-objdump.

```
// Current llvm-objdump -d output
aarch64: 20000: bl #0
ppc:     20000: bl .+4
x86:     20000: callq 0

// Ideal output
aarch64: 20000: bl 0x20000
ppc:     20000: bl 0x20004
x86:     20000: callq 0x20005

// GNU objdump -d. The lack of 0x is not ideal because the result cannot be re-assembled
aarch64: 20000: bl 20000
ppc:     20000: bl 0x20004
x86:     20000: callq 20005
```

In `lib/Target/X86/X86GenAsmWriter1.inc` (generated by `llvm-tblgen -gen-asm-writer`):

```
   case 12:
     // CALL64pcrel32, CALLpcrel16, CALLpcrel32, EH_SjLj_Setup, JCXZ, JECXZ, J...
-    printPCRelImm(MI, 0, O);
+    printPCRelImm(MI, Address, 0, O);
     return;
```

Some targets have 2 `printOperand` overloads, one without `Address` and
one with `Address`. They should annotate derived `Operand` properly with
`let OperandType = "OPERAND_PCREL"`.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D76574
2020-03-26 08:21:15 -07:00
Simon Pilgrim e30d29ebc1 [X86][SSE] getFauxShuffleMask - peek through TRUNCATE/AEXT/ZEXT for INSERT_VECTOR_ELT(EXTRACT_VECTOR_ELT())
As long as we extract from a source vector with smaller elements and we zero-extend the element in the final shuffle mask, we can safely peek through truncations and any/zero-extensions to find the source extraction.
2020-03-26 11:57:45 +00:00
Jonas Paulsson 8bf9e317e4 [SystemZ] Bugfix in tieOpsIfNeeded()
This function did a broken check to see if an opcode requires op0
and op1 to be tied. By chance this is NFC.

Review: Ulrich Weigand
2020-03-26 12:22:14 +01:00
Kang Zhang 4673699a47 [PowerPC] Remove the repeated definition for some InstAlias for mtspr/mfspr
Summary:
The InstAlias below have been defined repeatedly; this patch removes the
repeated definitions.
mtdec/mfdec mtsdr1/mfsdr1 mtsrr0/mfsrr0 mtsrr1/mfsrr1 mtasr

Reviewed By: nemanjai, steven.zhang

Differential Revision: https://reviews.llvm.org/D75821
2020-03-26 09:58:30 +00:00
Cullen Rhodes 9086db707d [AArch64][SVE] Implement structured store intrinsics
Summary:
This patch adds initial support for the following intrinsics:

    * llvm.aarch64.sve.st2
    * llvm.aarch64.sve.st3
    * llvm.aarch64.sve.st4

For storing two, three and four vectors' worth of data. Basic codegen for
the reg+immediate forms is implemented. Reg+reg addressing modes will be
addressed in a later patch.

These intrinsics are intended for use in the Arm C Language Extension
(ACLE).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D75947
2020-03-26 09:34:51 +00:00
Ties Stuij 71ae267d1f [ARM] ARMv8.6-a command-line + BFloat16 Asm Support
Summary:
This patch introduces command-line support for the Armv8.6-a architecture and assembly support for BFloat16. Details can be found at
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

in addition to the GCC patch for the 8.6-a CLI:
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg02647.html

In detail this patch adds:

- -march options for armv8.6-a
- BFloat16 assembly

This is part of a patch series, starting with command-line and Bfloat16
assembly support. The subsequent patches will upstream intrinsics
support for BFloat16, followed by Matrix Multiplication and the
remaining Virtualization features of the armv8.6-a architecture.

Based on work by:
- labrinea
- MarkMurrayARM
- Luke Cheeseman
- Javed Asbar
- Mikhail Maltsev
- Luke Geeson

Reviewers: SjoerdMeijer, craig.topper, rjmccall, jfb, LukeGeeson

Reviewed By: SjoerdMeijer

Subscribers: stuij, kristof.beyls, hiraditya, dexonsmith, danielkiss, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D76062
2020-03-26 09:17:20 +00:00
David Green 37b9cc8f29 [ARM] Sink splats to vector float instructions
Some MVE floating point instructions have gpr register variants that take
the scalar gpr value and splat it to all lanes. In order to accept
them in loops, the shuffle_vector and insert need to be sunk down into
the loop, next to the instruction so that ISel can see the whole
pattern.

This does that sinking for FAdd, FSub, FMul and FCmp. The patterns for
mul are slightly more constrained as there are no fms variants taking
register arguments.

Differential Revision: https://reviews.llvm.org/D76023
2020-03-26 09:02:18 +00:00
QingShan Zhang 1ef7bf4121 [PowerPC] Improve the way legalize mul for v8i16 and add pattern to match mul + add
We can legalize the operation MUL for v8i16 with the instruction (vmladduhm A, B, 0)
if altivec is enabled. Currently it is set as custom and expanded later, which is not
the right way. We can then add the pattern to match the mul + add with (vmladduhm A, B, C).
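An illustrative TableGen-style pattern for the mul case (a sketch; the
exact zero idiom is an assumption):

  def : Pat<(v8i16 (mul v8i16:$A, v8i16:$B)),
            (VMLADDUHM $A, $B, (v8i16 (V_SET0)))>;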

Reviewed By: Nemanjai

Differential Revision: https://reviews.llvm.org/D76751
2020-03-26 04:46:49 +00:00
Stanislav Mekhanoshin e06d707aa2 [AMDGPU] Fixed function traversal in attribute propagation
The AMDGPUPropagateAttributes pass was skipping some of the functions
when cloning. Functions were added to the root set and then skipped
on the next iteration because they were already in the root set,
while they were meant to be processed with different features.

Differential Revision: https://reviews.llvm.org/D76815
2020-03-25 18:47:09 -07:00
Stanislav Mekhanoshin 6e00e3fcb0 [AMDGPU] Preserve original symbol during attribute propagation
AMDGPUPropagateAttributes can swap names while cloning a function.
Only do it if the original symbol was not externally visible.

Differential Revision: https://reviews.llvm.org/D76789
2020-03-25 15:26:30 -07:00
Simon Pilgrim c6e5531f9b [X86][AVX] Combine shuffles to TRUNCATE/VTRUNC patterns
Add support for combining shuffles to AVX512 truncate instructions - another step toward fixing D56387/D66004. It also fixes SKX code on PR31443.

We could probably extend this further to handle non-VLX truncation cases.
2020-03-25 17:41:51 +00:00
Mikhail Maltsev bb4da94e5b [ARM,CDE] Implement predicated Q-register CDE intrinsics
Summary:
This patch implements the following CDE intrinsics:

  T __arm_vcx1q_m(int coproc, T inactive, uint32_t imm, mve_pred_t p);
  T __arm_vcx2q_m(int coproc, T inactive, U n, uint32_t imm, mve_pred_t p);
  T __arm_vcx3q_m(int coproc, T inactive, U n, V m, uint32_t imm, mve_pred_t p);

  T __arm_vcx1qa_m(int coproc, T acc, uint32_t imm, mve_pred_t p);
  T __arm_vcx2qa_m(int coproc, T acc, U n, uint32_t imm, mve_pred_t p);
  T __arm_vcx3qa_m(int coproc, T acc, U n, V m, uint32_t imm, mve_pred_t p);

The intrinsics are not part of the released ACLE spec, but internally at
Arm we have reached consensus to add them to the next ACLE release.

Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen

Reviewed By: simon_tatham

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76610
2020-03-25 17:08:19 +00:00
Yvan Roux bd069ad39c [ARM] Move ConstantIsland and LowOverheadLoops Passes.
Move the ARM ConstantIsland and LowOverheadLoops passes later in the pipeline
such that they will be run after the upcoming Machine Outlining pass.

Differential Revision: https://reviews.llvm.org/D76065
2020-03-25 16:49:21 +01:00
cdevadas ce984129ea [AMDGPU] Add SIPreEmitPeephole pass.
This pass can handle all the optimization
opportunities found just before code emission.
Presently it includes the vcc branch
optimization that was previously handled in SIInsertSkips.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D76712
2020-03-25 15:35:35 +00:00
Jonas Paulsson f09b891d4a [SystemZ] Improve foldMemoryOperandImpl()
A spilled load of an immediate can use MVHI/MVGHI instead.
A compare of a spilled register against an immediate can use CHSI/CGHSI.
A logical compare can use CLFHSI/CLGHSI.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D76055
2020-03-25 16:21:08 +01:00
Sean Fertile 3282d875d6 [PowerPC][AIX] ByVal formal arguments in a single register.
Adds support for passing ByVal formal arguments as long as they fit in a
single register.

Differential Revision: https://reviews.llvm.org/D76401
2020-03-25 11:09:40 -04:00
Kerry McLaughlin 05606329e2 [AArch64][SVE] Add SVE intrinsics for masked loads & stores
Summary:
Implements the following intrinsics for contiguous loads & stores:
  - @llvm.aarch64.sve.ld1
  - @llvm.aarch64.sve.st1

Reviewers: sdesmalen, andwar, efriedma, cameron.mcinally, dancgr, rengolin

Reviewed By: cameron.mcinally

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76688
2020-03-25 11:48:40 +00:00
Sam Parker e87250202d [ARM][MVE] Add HorizontalReduction flag
Add a target flag for instructions that reduce into one, or more,
scalar reg(s), including variants of:
- VADDV
- VABAV
- VMINV/VMAXV
- VMLADAV

Differential Revision: https://reviews.llvm.org/D76683
2020-03-25 11:12:03 +00:00
Kazushi (Jam) Marukawa 28a42dd1b9 [VE] Change name of enum to CondCode
Summary: Change enum name for condition codes from CondCodes to CondCode.

Reviewers: arsenm, simoll, k-ishizaka

Reviewed By: arsenm

Subscribers: wdng, hiraditya, llvm-commits

Tags: #llvm, #ve

Differential Revision: https://reviews.llvm.org/D76747
2020-03-25 09:20:05 +01:00
Amara Emerson 472d282046 [AArch64][GlobalISel] Don't localize TLS G_GLOBAL_VALUEs on Darwin.
On Darwin these need to be selected into a function call for the TLS
address lookup. As a result, they can't be moved below a physreg write,
which happens in call sequences. In the long term, we should have some
mechanism in the localizer to prevent localizing into target-specific
atomic instruction sequences.

rdar://60056248

Differential Revision: https://reviews.llvm.org/D76652
2020-03-24 13:35:50 -07:00
Reid Kleckner 597718aae0 Re-land "Avoid emitting unreachable SP adjustments after `throw`"
This reverts commit 4e0fe038f4. Re-lands
65b21282c7.

After landing 5ff5ddd0ad to add int3 into
trailing unreachable blocks, we can now remove these extra stack
adjustments without confusing the Win64 unwinder. See
https://llvm.org/45064#c4 or X86AvoidTrailingCall.cpp for a full
explanation.

Fixes PR45064.
2020-03-24 12:04:43 -07:00
Matt Arsenault 26ebc51a34 AMDGPU/GlobalISel: Fix smrd loads of v4i64 2020-03-24 13:44:41 -04:00
David Green f8c79b94af [ARM] Fold VMOVrh VLDR to LDRH
This adds a simple fold to combine a VMOVrh of a load into an integer load.
Similar to what is already performed for BITCAST, but it needs to account
for the types being of different sizes, creating a zero-extending load.

Differential Revision: https://reviews.llvm.org/D76485
2020-03-24 15:51:03 +00:00
Simon Pilgrim 714402147d [X86][SSE1] Add support for logic+movmsk patterns (PR42870)
rL368506 handled the basic case, but we need to account for boolean logic patterns as well.
2020-03-24 14:28:40 +00:00
David Green 1232cfa385 [ARM] Don't split trunc stores that can be better handled as VMOVN
We deliberately split stores of the form
store(truncate(larger-than-legal-type)) into two stores, allowing each
store to perform part of the truncate for free.

There are times however where it makes more sense to use VMOVN to
de-interlace the results back into a single vector, and store that in
one go. This adds a check for that situation, not splitting the store if
it looks like a VMOVN can be more useful.
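An illustrative example of the kind of store in question (a sketch;
v8i32 is larger than legal for MVE):

  %t = trunc <8 x i32> %x to <8 x i16>
  store <8 x i16> %t, <8 x i16>* %p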

Differential Revision: https://reviews.llvm.org/D76511
2020-03-24 08:48:52 +00:00
Sam Parker 94cacebcca [ARM][LowOverheadLoops] Add checks for narrowing
Modify ValidateLiveOuts to track 'FalseLaneZeros' more precisely,
including checks on specific operations that can generate non-zeros
from zero values, e.g VMVN. We can then check that any instructions
that retain some information in their output register (all narrowing
instructions) that they only use and def registers that always have
zeros in their falsely predicated bytes, whether or not tail
predication happens.

Most of the logic remains the same, just the names of the data
structures and helpers have been renamed to reflect the change in
logic. The key change, apart from the opcode checkers, is that the
FalseZeros set now strictly contains only instructions which will
always generate zeros, and not instructions that could also have
their false bytes masked away later.

Differential Revision: https://reviews.llvm.org/D76235
2020-03-24 08:41:48 +00:00
Sam Parker 6f86e6bf40 [ARM][MVE] Add target flag for narrowing insts
Add a flag, 'RetainsPreviousHalfElement', for operations that operate
on top/bottom halves of their input and only write to half of their
destination, leaving the other half to retain its previous value.

Differential Revision: https://reviews.llvm.org/D76608
2020-03-24 08:36:44 +00:00
Chen Zheng 9d07d91fb6 [PowerPC] fix a typo in commit 3f85134d71
Implement target hook isProfitableToHoist - typo fix.
2020-03-24 01:56:15 -04:00
Nemanja Ivanovic bfa9ce1cb2 [PowerPC] Improve handling of some BUILD_VECTOR nodes
An analysis of real world code turned up a number of patterns with BUILD_VECTOR
nodes resulting from operations on extracted vector elements for which we
produce poor code. This addresses those cases. No attempt is made for
completeness as that would entail a large amount of work for something that
there is no evidence of in real code.

Differential revision: https://reviews.llvm.org/D72660
2020-03-23 17:34:29 -05:00
Justin Hibbits f0990e104b [PowerPC]: e500 target can't use lwsync, use msync instead
The e500 core has a silicon bug that triggers an illegal instruction
program trap on any sync other than msync.  Other cores will typically
ignore illegal sync types, and the documentation even implies that the
'illegal' bits are ignored.

Address this hardware deficiency by only using msync, like the PPC440.

Differential Revision:  https://reviews.llvm.org/D76614
2020-03-23 17:15:27 -05:00
Matt Arsenault 66073953a5 AMDGPU: Allow vectorization of round intrinsic
There seems to be a small benefit to the legalized sequence for v2f16
round with packed instructions, so allow vectorizing it by reducing
the cost.

An unintended side effect is vectorization of f32 round also
happens. The current FMA logic seems off to me, and isn't checking for
packed instructions.
2020-03-23 17:00:41 -04:00
Matt Arsenault 2ad5fc1d91 AMDGPU/GlobalISel: Implement computeNumSignBitsForTargetInstr 2020-03-23 15:02:30 -04:00
Reid Kleckner 5ff5ddd0ad [Win64] Insert int3 into trailing empty BBs
Otherwise, the Win64 unwinder considers direct branches to such empty
trailing BBs to be a branch out of the function. It treats such a branch
as a tail call, which can only be part of an epilogue. If the unwinder
misclassifies such a branch as part of the epilogue, it will fail to
unwind the stack further. This can lead to bad stack traces, or failure
to handle exceptions properly. This is described in
https://llvm.org/PR45064#c4, and by the comment at the top of the
X86AvoidTrailingCallPass.cpp file.

It should be safe to insert int3 for such blocks. An empty trailing BB
that reaches this pass is pretty much guaranteed to be unreachable.  If
a program executed such a block, it would fall off the end of the
function.

Most of the complexity in this patch comes from threading through the
"EHFuncletEntry" boolean on the MIRParser and registering the pass so we
can stop and start codegen around it. I used an MIR test because we
should teach LLVM to optimize away these branches as a follow-up.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D76531
2020-03-23 08:50:37 -07:00
Ram Nalamothu 24698e526f Implement wave32 DWARF register mapping
Implement the DWARF register mapping described in llvm/docs/AMDGPUUsage.rst.

This enables generating appropriate DWARF register numbers for wave64 and
wave32 modes.
2020-03-23 10:24:16 -04:00
Sanjay Patel 0eeee83d75 [VectorUtils] move x86's scaleShuffleMask to generic VectorUtils
We have some long-standing missing shuffle optimizations that could
use this transform via VectorCombine now:
https://bugs.llvm.org/show_bug.cgi?id=35454
(and we still don't get that case in the backend either)

This function is apparently templated because there's existing code
in IR that treats mask values as unsigned and backend code that
treats mask values as signed.

The mask values are not endian-dependent (as shown by the existing
bitcast transform from DAGCombiner).

Differential Revision: https://reviews.llvm.org/D76508
2020-03-23 09:58:55 -04:00
Jonas Paulsson 9adc7fc3cd [SystemZ] Perform instruction shortening for fused fp ops.
Replace single-lane (W... form) vector "multiply and add" and "multiply and
subtract" instructions with equivalent floating point instructions whenever
possible in SystemZShortenInst.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D76370
2020-03-23 14:12:13 +01:00
Guillaume Chatelet 3ba550a05a [Alignment][NFC] Use TFL::getStackAlign()
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: dylanmckay, sdardis, nemanjai, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76551
2020-03-23 13:48:29 +01:00
Guillaume Chatelet ea64ee0edb [Alignment][NFC] Deprecate ensureMaxAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76368
2020-03-23 11:31:33 +01:00
Craig Topper e2cb121374 [X86] Remove maximum vector length limit from combineBasicSADPattern.
createPSADBW uses SplitOpsAndApply so it should be able to handle
any size.

Restrict the extract result type to i32 or i64 since that's what
we have coverage for today and probably matches what the
isSimple() check gave us before.

Differential Revision: https://reviews.llvm.org/D76560
2020-03-22 15:02:05 -07:00
Craig Topper f4c67dfa92 [X86] More accurately model the cost of horizontal reductions.
This patch attempts to more accurately model the reduction of
power of 2 vectors of types we natively support. This takes into
account the narrowing of vectors that occur as we go from 512
bits to 256 bits, to 128 bits. It also takes into account the use
of wider elements in the shuffles for the first 2 steps of a
reduction from 128 bits. And uses a v8i16 shift for the final step
of vXi8 reduction.

The default implementation uses the legalized type for the arithmetic
for all levels. And uses the single source permute cost of the
legalized type for all levels. This penalizes things like
lack of v16i8 pshufb on pre-ssse3 targets and the splitting and
joining that needs to be done for integer types on AVX1. We never
need v16i8 shuffle for a reduction and we only need split AVX1 ops
when the type is wide and needs to be split. I think we're still
over costing splits and joins for AVX1, but we're closer now.

I've also removed all pairwise special casing because I don't
think we ever want to generate that on X86. I've also adjusted
the add handling to more accurately account for any type splitting
that occurs before we reach a legal type.

Differential Revision: https://reviews.llvm.org/D76478
2020-03-22 14:20:15 -07:00
Simon Atanasyan 2dc4eb08cd [mips] Implement .cpadd directive
This directive inserts code to add $gp to the argument's register when
support for position independent code is enabled.

For example, this code:
  .cpadd $4
expands to:
  addu $4, $4, $gp
2020-03-22 23:34:32 +03:00
Simon Atanasyan 9bbddfbeaa [mips] Implement sne pseudo instruction
The `sne Dst, Src1, Src2/Imm` pseudo instruction sets register `Dst` to
1 if register `Src1` is not equal to `Src2/Imm` and to 0 otherwise.
2020-03-22 23:34:31 +03:00
Simon Atanasyan dca9e40c0c [mips] Implement sle/sleu pseudo instructions
The `sle/sleu Dst, Src1, Src2/Imm` pseudo instructions set register
`Dst` to 1 if register `Src1` is less than or equal `Src2/Imm` and
to 0 otherwise.
2020-03-22 23:34:31 +03:00
Simon Atanasyan 862f120fdb [mips] Remove instructions related to "wired paired single" from the P5600 model. 2020-03-22 23:34:31 +03:00
Simon Atanasyan ecc92fd018 [mips] Add HasMips3D to the list of features unsupported by P5600 model. 2020-03-22 23:34:31 +03:00
Simon Atanasyan 0f15ace018 [mips] Rename target feature Mips3D => HasMips3D. NFC 2020-03-22 23:34:31 +03:00
Craig Topper b89ae50795 [X86] Remove maximum vector width restriction from combineLoopSADPattern.
SplitOpsAndApply will take care of any needed splitting correctly.
All that we need to check is that the vector element count is a
power of 2.

Differential Revision: https://reviews.llvm.org/D76558
2020-03-22 11:09:14 -07:00
Matt Arsenault 830cfda19f Utils: Mostly convert memcpy expansion to use Align
The TTI hooks aren't converted. I also think the intrinsics should
have mandatory alignment and never return MaybeAlign.
2020-03-22 11:21:44 -04:00
Fangrui Song 140d6245af Delete TargetLoweringObjectFile::Ctx
We can use the parent MCObjectFileInfo::Ctx which has the same value.
2020-03-21 22:36:29 -07:00
Fangrui Song b445643632 [X86] Delete unneeded X86ELFTargetObjectFile::Initialize. NFC 2020-03-21 22:03:42 -07:00
Simon Pilgrim 25eb9056d7 [X86] getTargetShuffleAndZeroables - add insert_subvector(undef, sub, c) handling.
We often widen xmm/ymm vectors to ymm/zmm by insertion into an undef base vector. By letting getTargetShuffleAndZeroables track the undef elts we can help avoid a lot of unnecessary cross-lane shuffles.

Fixes PR44694
2020-03-21 19:11:42 +00:00
Simon Pilgrim 4ceade0428 [X86] Combine concat(shufps,shufps) -> shufps(concat,concat)
Now that rG18c19441d105 has improved VPERM2X128 handling, we can perform this to improve x64->x32 truncation without poor cross-lane issues.

Someday combineX86ShufflesRecursively will handle this, but we're still really bad at dealing with different vector widths.
2020-03-21 12:44:10 +00:00
Simon Pilgrim f424d51c3e Revert rGe6a7e3b5e3e7 "[X86][SSE] matchShuffleWithSHUFPD - add support for unary shuffles."
This reverts commit e6a7e3b5e3.

Avoids register pressure regression reported at PR45263
2020-03-21 12:14:19 +00:00
Simon Pilgrim ff3aae6908 Fix Wdocumentation warning. NFCI. 2020-03-21 11:21:57 +00:00
Fangrui Song 85c30f3374 [X86] Reland D71360 Clean up UseInitArray initialization for X86ELFTargetObjectFile
-fuse-init-array is now the CC1 default but TargetLoweringObjectFileELF::UseInitArray still defaults to false.
The following two unknown OS target triples continue using .ctors/.dtors because InitializeELF is not called.

clang -target i386 -c a.c
clang -target x86_64 -c a.c

This cleanup fixes this as a bonus.

X86SpeculativeLoadHardeningPass::tracePredStateThroughCall can call
MCContext::createTempSymbol before TargetLoweringObjectFileELF::Initialize().
We need to call TargetLoweringObjectFileELF::Initialize() earlier.

test/CodeGen/X86/speculative-load-hardening-indirect.ll

Differential Revision: https://reviews.llvm.org/D71360
2020-03-20 21:57:34 -07:00
Eric Christopher fc7233d774 Temporarily Revert "[X86] Reland D71360 Clean up UseInitArray initialization for X86ELFTargetObjectFile"
as it's causing msan failures.

This reverts commit 7899fe9da8.
2020-03-20 17:36:12 -07:00
Fangrui Song df4cc35efd [VE] Fix -Wunused-private-field after D72598 and -Wdeprecated-declarations after D76348 2020-03-20 15:06:58 -07:00
Fangrui Song 7899fe9da8 [X86] Reland D71360 Clean up UseInitArray initialization for X86ELFTargetObjectFile
UseInitArray is now the CC1 default but TargetLoweringObjectFileELF::UseInitArray still defaults to false.
The following two unknown OS target triples continue using .ctors/.dtors because InitializeELF is not called.

clang -target i386 -c a.c
clang -target x86_64 -c a.c

This cleanup fixes this as a bonus.

Differential Revision: https://reviews.llvm.org/D71360
2020-03-20 11:18:36 -07:00
Craig Topper 32fbea1548 [X86] Prevent (bitcast (broadcast_load)) combine from producing vXf16 broadcast instructions.
The combine tries to put the broadcast in either the integer or
fp domain to match the bitcast domain. But we can only do this
if the broadcast size is 32 or larger.
2020-03-20 09:15:07 -07:00
Simon Tatham 1adfa4c991 [ARM,MVE] Add ACLE intrinsics for the vaddv/vaddlv family.
Summary:
I've implemented them as target-specific IR intrinsics rather than
using `@llvm.experimental.vector.reduce.add`, on the grounds that the
'experimental' intrinsic doesn't currently have much code generation
benefit, and my replacements encapsulate the sign- or zero-extension
so that you don't expose the illegal MVE vector type (`<4 x i64>`) in
IR.

The machine instructions come in two versions: with and without an
input accumulator. My new IR intrinsics, like the 'experimental' one,
don't take an accumulator parameter: we represent that by just adding
on the input value using an ordinary i32 or i64 add. So if you write
the `vaddvaq` C-language intrinsic with an input accumulator of zero,
it can be optimised to VADDV, and conversely, if you write something
like `x += vaddvq(y)` then that can be combined into VADDVA.

Most of this is achieved in isel lowering, by converting these IR
intrinsics into the existing `ARMISD::VADDV` family of custom SDNode
types. For the difficult case (64-bit accumulators), isel lowering
already implements the optimization of folding an addition into a
VADDLV to make a VADDLVA; so once we've made a VADDLV, our job is
already done, except that I had to introduce a parallel set of ARMISD
nodes for the //predicated// forms of VADDLV.

For the simpler VADDV, we handle the predicated form by just leaving
the IR intrinsic alone and matching it in an ordinary dag pattern.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76491
2020-03-20 15:42:33 +00:00
Simon Tatham 45a9945b9e [ARM,MVE] Add ACLE intrinsics for the vminv/vmaxv family.
Summary:
I've implemented these as target-specific IR intrinsics, because
they're not //quite// enough like @llvm.experimental.vector.reduce.min
(which doesn't take the extra scalar parameter). Also this keeps the
predicated and unpredicated versions looking similar, and the
floating-point minnm/maxnm versions fold into the same schema.

We had a couple of min/max reductions already implemented, from the
initial pathfinding exercise in D67158. Those were done by having
separate IR intrinsic names for the signed and unsigned integer
versions; as part of this commit, I've changed them to use a flag
parameter indicating signedness, which is how we ended up deciding
that the rest of the MVE intrinsics family ought to work. So now
hopefully the whole lot is consistent.

In the new llc test, the output code from the `v8f16` test functions
looks quite unpleasant, but most of it is PCS lowering (you can't pass
a `half` directly in or out of a function). In other circumstances,
where you do something else with your `half` in the same function, it
doesn't look nearly as nasty.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76490
2020-03-20 15:42:33 +00:00
Matt Arsenault a950e3beef AMDGPU: Move towards deprecating alignbit intrinsic
This is equivalent to llvm.fshr, so legalize the intrinsic to the
generic node.
2020-03-20 11:03:04 -04:00
alex-t 6e34e71869 [AMDGPU] Enable divergence driven ISel for ADD/SUB i64
Summary:
Currently we custom select add/sub with carry out to the scalar form, relying on replacing them with the vector form later if necessary.
This change enables custom selection code to take the divergence of adde/addc SDNodes into account and select the appropriate form in one step.

Reviewers: arsenm, vpykhtin, rampitec

Reviewed By: arsenm, vpykhtin

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa

Differential Revision: https://reviews.llvm.org/D76371
2020-03-20 17:06:11 +03:00
Mikhail Maltsev 969034b860 [ARM,CDE] Implement CDE unpredicated Q-register intrinsics
Summary:
This patch implements the following intrinsics:

  uint8x16_t __arm_vcx1q_u8 (int coproc, uint32_t imm);
  T __arm_vcx1qa(int coproc, T acc, uint32_t imm);
  T __arm_vcx2q(int coproc, T n, uint32_t imm);
  uint8x16_t __arm_vcx2q_u8(int coproc, T n, uint32_t imm);
  T __arm_vcx2qa(int coproc, T acc, U n, uint32_t imm);
  T __arm_vcx3q(int coproc, T n, U m, uint32_t imm);
  uint8x16_t __arm_vcx3q_u8(int coproc, T n, U m, uint32_t imm);
  T __arm_vcx3qa(int coproc, T acc, U n, V m, uint32_t imm);

Most of them are polymorphic. Furthermore, some intrinsics are
polymorphic by 2 or 3 parameter types; such polymorphism is not
supported by the existing MVE/CDE tablegen backends, and we don't
really want to have a combinatorial explosion caused by 1000 different
combinations of 3 vector types. Because of this some intrinsics are
implemented as macros involving a cast of the polymorphic arguments to
uint8x16_t.

The IR intrinsics are even more restricted in terms of types: all MVE
vectors are cast to v16i8.

Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76299
2020-03-20 14:01:56 +00:00
Mikhail Maltsev d22e661712 [ARM,CDE] Implement CDE S and D-register intrinsics
Summary:
This patch implements the following ACLE intrinsics:

  uint32_t __arm_vcx1_u32(int coproc, uint32_t imm);
  uint32_t __arm_vcx1a_u32(int coproc, uint32_t acc, uint32_t imm);
  uint32_t __arm_vcx2_u32(int coproc, uint32_t n, uint32_t imm);
  uint32_t __arm_vcx2a_u32(int coproc, uint32_t acc, uint32_t n, uint32_t imm);
  uint32_t __arm_vcx3_u32(int coproc, uint32_t n, uint32_t m, uint32_t imm);
  uint32_t __arm_vcx3a_u32(int coproc, uint32_t acc, uint32_t n, uint32_t m, uint32_t imm);

  uint64_t __arm_vcx1d_u64(int coproc, uint32_t imm);
  uint64_t __arm_vcx1da_u64(int coproc, uint64_t acc, uint32_t imm);
  uint64_t __arm_vcx2d_u64(int coproc, uint64_t m, uint32_t imm);
  uint64_t __arm_vcx2da_u64(int coproc, uint64_t acc, uint64_t m, uint32_t imm);
  uint64_t __arm_vcx3d_u64(int coproc, uint64_t n, uint64_t m, uint32_t imm);
  uint64_t __arm_vcx3da_u64(int coproc, uint64_t acc, uint64_t n, uint64_t m, uint32_t imm);

Since the semantics of CDE instructions is opaque to the compiler, the
ACLE intrinsics require dedicated LLVM IR intrinsics. The 64-bit and
32-bit variants share the same IR intrinsic.

Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76298
2020-03-20 14:01:53 +00:00
Mikhail Maltsev 7a85e3585e [ARM,CDE] Implement GPR CDE intrinsics
Summary:
This change implements ACLE CDE intrinsics that translate to
instructions working with general-purpose registers.

The specification is available at
https://static.docs.arm.com/101028/0010/ACLE_2019Q4_release-0010.pdf

Each ACLE intrinsic gets a corresponding LLVM IR intrinsic (because
they have distinct function prototypes). Dual-register operands are
represented as pairs of i32 values. Because of this the instruction
selection for these intrinsics cannot be represented as TableGen
patterns and requires custom C++ code.

Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76296
2020-03-20 14:01:51 +00:00
Adrian Kuegel baa6f6a782 Revert "[TableGen][GlobalISel] Account for HwMode in RegisterBank register sizes"
This reverts commit e9f22fd429.

When building with -DLLVM_USE_SANITIZER="Thread", check-llvm has 70
failing tests with this revision, and 29 without this revision.
2020-03-20 11:02:50 +01:00
David Green b3499f572d [ARM] Change VDUP type to i32 for MVE
The MVE VDUP instruction takes a GPR and splats it into every lane of a
vector register. Unlike NEON we do not have an equivalent VDUPLANE
instruction that does the same splat from a fp register. Previously a VDUP
to a v4f32/v8f16 would be represented as a (v4f32 VDUP f32), which
would mean the instruction pattern needs to add a COPY_TO_REGCLASS to
the GPR.

Instead this now converts that earlier during an ISel DAG combine,
converting (VDUP x) to (VDUP (bitcast x)). This can allow instruction
selection to tell that the input needs to be an i32, which in one of the
testcases allows it to use ldr (or specifically ldm) over (vldr;vmov).

Whilst this is simple enough for floats, as the type sizes are the same,
there is no BITCAST equivalent for getting a half into an i32. This uses
a VMOVrh ARMISD node, which doesn't know the same tricks yet.

Differential Revision: https://reviews.llvm.org/D76292
2020-03-20 09:48:45 +00:00
Roger Ferrer Ibanez 3c24aee7ee [RISCV] Select +0.0 immediate using fmv.{w,d}.x / fcvt.d.w
Floating point positive zero can be selected using fmv.w.x / fmv.d.x /
fcvt.d.w and the zero source register.
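For example (illustrative):
  fmv.w.x  fa0, zero    # float  +0.0
  fmv.d.x  fa0, zero    # double +0.0 (RV64)
  fcvt.d.w fa0, zero    # double +0.0 (RV32)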

Differential Revision: https://reviews.llvm.org/D75729
2020-03-20 09:42:24 +00:00
Austin Kerbow 2cbb8c946a [AMDGPU] Reuse register during frame index elimination
If there were no free VGPRs we would need two emergency spill slots for register
scavenging during PEI/frame index elimination. Reuse 'ResultReg' for scale
calculation so that only one spill is needed.

Differential Revision: https://reviews.llvm.org/D76387
2020-03-20 00:19:15 -07:00
cdevadas 728b878de6 [AMDGPU] Set the CostPerUse value for vgpr registers.
Apart from the argument registers, set the CostPerUse
value as per the ratio reg_index/allocation_granularity.
It is a pre-commit for introducing the scratch registers
in the ABI. This change should help achieve a balanced
register allocation.

Differential Revision: https://reviews.llvm.org/D76417
2020-03-20 11:49:35 +05:30
Wei Mi a035726e5a Revert "Generate Callee Saved Register (CSR) related cfi directives like .cfi_restore."
This reverts commit 3c96d01d2e. Got a report that it caused test failures in libc++.
2020-03-19 22:45:27 -07:00
Yuta Saito 08670d435b [WebAssembly] Support swiftself and swifterror for WebAssembly target
Summary:
Swift ABI is based on basic C ABI described here https://github.com/WebAssembly/tool-conventions/blob/master/BasicCABI.md
Swift Calling Convention on WebAssembly differs a little from swiftcc
on other architectures.

On non-WebAssembly architectures, swiftcc accepts extra parameters that
are attributed with swifterror or swiftself by the caller. Even if the
callee doesn't have these parameters, the invocation succeeds, ignoring
the extra parameters.

But WebAssembly strictly checks that callee and caller signatures are
same. https://github.com/WebAssembly/design/blob/master/Semantics.md#calls
So at WebAssembly level, all swiftcc functions end up extra arguments
and all function definitions and invocations explicitly have additional
parameters to fill swifterror and swiftself.

This patch support signature difference for swiftself and swifterror cc
is swiftcc.

e.g.
```
declare swiftcc void @foo(i32, i32)
@data = global i8* bitcast (void (i32, i32)* @foo to i8*)
define swiftcc void @bar() {
  %1 = load i8*, i8** @data
  %2 = bitcast i8* %1 to void (i32, i32, i32)*
  call swiftcc void %2(i32 1, i32 2, i32 swiftself 3)
  ret void
}
```

For swiftcc, emit additional swiftself and swifterror parameters
if they are not present during lowering. These additional parameters are
added for both callee and caller.
They are necessary to match callee and caller signatures for direct and
indirect function calls.

Differential Revision: https://reviews.llvm.org/D76049
2020-03-19 17:39:52 -07:00
Thomas Lively 34db3c3a18 [WebAssembly] SIMD integer abs instructions
Summary:
These were merged to the SIMD proposal in
https://github.com/WebAssembly/simd/pull/128.

Depends on D76397 to avoid merge conflicts.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76399
2020-03-19 17:25:58 -07:00
Thomas Lively a3f974f3c3 [WebAssembly] SIMD bitmask intrinsics and builtin functions
Summary:
These experimental new instructions are proposed in
https://github.com/WebAssembly/simd/pull/201.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76397
2020-03-19 17:15:37 -07:00
Matt Arsenault 678da7b109 AMDGPU/GlobalISel: Remove leftover #if 0
The subtarget feature used to be missing from subtargets, but that was
fixed.
2020-03-19 20:07:05 -04:00
Stefan Agner f87563661d [MC][ARM] add implicit immediate form for ldrsbt/ldrht/ldrsht
Add pseudo instructions for ldrsbt/ldrht/ldrsht with implicit immediate
and add fallback C++ code to transform the instruction to the
equivalent LDRSBTi/LDRHTi/LDRSHTi form.

This is similar to how it has been done in commit
fb3950ec63

This fixes:
https://bugs.llvm.org/show_bug.cgi?id=45070
2020-03-19 22:36:42 +01:00
Scott Linder 0e9368cc8c [AMDGPU] Move frame pointer from s34 to s33
Remove the gap left between the stack pointer (s32) and frame pointer
(s34) now that the scratch wave offset is no longer a part of the
calling convention ABI.

Update llvm/docs/AMDGPUUsage.rst to reflect the change.

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75657
2020-03-19 15:35:16 -04:00
Scott Linder 60b1967c39 [AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in
the entry function prologue. This allows us to removes the scratch wave
offset register from the calling convention ABI.

As part of this change, allow the use of an inline constant zero for the
SOffset of MUBUF instructions accessing the stack in entry functions
when a frame pointer is not requested/required. Entry functions with
calls still need to set up the calling convention ABI stack pointer
register, and reference it in order to address arguments of called
functions. The ABI stack pointer register remains unswizzled, but is now
wave-relative instead of queue-relative.

Non-entry functions also use an inline constant zero SOffset for
wave-relative scratch access, but continue to use the stack and frame
pointers as before. When the stack or frame pointer is converted to a
swizzled offset it is now scaled directly, as the scratch wave offset no
longer needs to be subtracted first.

Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling
convention.

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75138
2020-03-19 15:35:16 -04:00
Scott Linder db099f994b [AMDGPU][NFC] Refactor some uses of unsigned to Register
Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76035
2020-03-19 15:35:16 -04:00
Scott Linder 30bb113beb [AMDGPU][NFC] Refactor emitEntryFunctionPrologue
Remove dead code and factor repeated conditions out into a single check.
Rename and move code to make it more obvious what is running only for
entry functions. Simplify function arguments to make it clearer what the
relevant inputs are. Make flat scratch init accept an MBB iterator and
move it to where it was logically being emitted within the prologue.

These changes will make a future update to the calling convention
simpler.

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75092
2020-03-19 15:35:16 -04:00
Cameron McInally 018dde4ce5 [AArch64][SVE] Add support for DestructiveBinaryImm DestructiveInstType
Support prefixing destructive operations, with the MOVPRFX instruction, to build constructive operations.

Differential Revision: https://reviews.llvm.org/D75064
2020-03-19 13:11:46 -05:00
Craig Topper c13aa36bb7 [X86] Attempt to more accurately model the cost of a bool reduction of wide vector type.
Previously we multiplied the cost for the table entries by the number of splits needed. But that implies that each split goes through a reduction to scalar independently. I think what really happens is that we AND/OR the split pieces until we're down to a single value with a legal type and then do the special reduction sequence on that.

So to model that, this patch takes the number of splits minus one, multiplied by the cost of an AND/OR at the legal element count, and adds that on top of the table lookup.
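In other words (notation mine, not from the patch):
```
new_cost = table_cost(legal_type)
         + (num_splits - 1) * cost(AND/OR at legal element count)
```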

Differential Revision: https://reviews.llvm.org/D76400
2020-03-19 09:31:05 -07:00
Djordje Todorovic d9b9621009 Reland D73534: [DebugInfo] Enable the debug entry values feature by default
The issue that was causing the build failures was fixed with the D76164.
2020-03-19 13:57:30 +01:00
Andrzej Warzynski 0ea4fb5bb7 [AArch64][SVE] Rename intrinsics for gather prefetch [NFC]
Summary:
In order to keep the names consistent with other SVE gather loads, the
intrinsics for gather prefetch are renamed as follows:
  * @llvm.aarch64.sve.gather.prfb -> @llvm.aarch64.sve.prfb.gather

Reviewed by: fpetrogalli

Differential Revision: https://reviews.llvm.org/D76421
2020-03-19 12:53:36 +00:00
Chen Zheng 3f85134d71 [PowerPC] implement target hook isProfitableToHoist
On PowerPC fma is faster than fadd + fmul for some types
(PPCTargetLowering::isFMAFasterThanFMulAndFAdd). We should implement the target
hook isProfitableToHoist to prevent the SimplifyCFG pass from breaking the fma
pattern by hoisting fmul to the predecessor block.
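A minimal sketch of the pattern being protected (my own example,
assuming fast-math flags that permit fusion):
```
define double @fma_shape(i1 %c, double %a, double %b, double %x, double %y) {
entry:
  br i1 %c, label %t, label %f
t:                                        ; fmul stays with its fadd,
  %m1 = fmul fast double %a, %b           ; so the backend can emit fma
  %r1 = fadd fast double %m1, %x
  ret double %r1
f:
  %m2 = fmul fast double %a, %b
  %r2 = fadd fast double %m2, %y
  ret double %r2
}
; Hoisting the common fmul into %entry would leave bare fadds that can
; no longer be fused.
```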

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D76207
2020-03-19 00:17:25 -04:00
Chen Zheng aacf022cd5 [PowerPC] add IR level isFMAFasterThanFMulAndFAdd - NFC
And also refactor legacy MIR level isFMAFasterThanFMulAndFAdd.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D76265
2020-03-18 23:24:40 -04:00
Eli Friedman e24e95fe90 Remove CompositeType class.
The existence of the class is more confusing than helpful, I think; the
commonality is mostly just "GEP is legal", which can be queried using
APIs on GetElementPtrInst.

Differential Revision: https://reviews.llvm.org/D75660
2020-03-18 13:53:17 -07:00
lewis-revill e9f22fd429 [TableGen][GlobalISel] Account for HwMode in RegisterBank register sizes
This patch generates TableGen descriptions for the specified register
banks which contain a list of register sizes corresponding to the
available HwModes. The appropriate size is used during codegen according
to the current HwMode. As this HwMode was not available on generation,
it is set upon construction of the RegisterBankInfo class. Targets
simply need to provide the HwMode argument to the
<target>GenRegisterBankInfo constructor.

The RISC-V RegisterBankInfo constructor has been updated accordingly
(plus an unused argument removed).

Differential Revision: https://reviews.llvm.org/D76007
2020-03-18 19:52:23 +00:00
Nemanja Ivanovic e009fad342 [PowerPC] Remove UB from PPCInstrInfo when handling rotates fed by constants
As pointed out in https://bugs.llvm.org/show_bug.cgi?id=45232 this code can
end up shifting a 64-bit unsigned value left by 64 bits. Although this works
as expected on some platforms, it is definitely UB. This patch removes the UB
and adds the associated test case.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=45232
2020-03-18 13:40:39 -05:00
Simon Tatham e13d153c1b [ARM,MVE] Add intrinsics for the VQDMLAD family.
Summary:
This is another set of instructions too complicated to be sensibly
expressed in IR by anything short of a target-specific intrinsic.
Given input vectors a,b, the instruction generates intermediate values
2*(a[0]*b[0]+a[1]*b[1]), 2*(a[2]*b[2]+a[3]*b[3]), etc; takes the high
half of each double-width value, and overwrites half the lanes in the
output vector c, which you therefore have to provide the input value
of. Optionally you can swap the elements of b so that they are things
like a[0]*b[1]+a[1]*b[0]; optionally you can round to nearest when
taking the high half; and optionally you can take the difference
rather than sum of the two products. Finally, saturation is applied
when converting back to a single-width vector lane.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: miyuki

Subscribers: kristof.beyls, hiraditya, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76359
2020-03-18 17:11:22 +00:00
Matt Arsenault 4ea1baf6a0 AMDGPU: Initial, crude support for indirect calls
This isn't really usable, and requires using the
-amdgpu-fixed-function-abi flag to work.

Assumes a uniform call target, and will hit a verifier error if the
call target ends up in a VGPR. Also doesn't attempt to do anything
sensible for the reported register/stack usage.
2020-03-18 12:03:48 -04:00
Matt Arsenault ea4597eef1 Reapply "AMDGPU/GlobalISel: Fully handle 0 dmask case during legalize"
This reverts commit 9bca8fc4cf.

Rearrange handling to avoid changing the instruction in the case where
it's going to be erased and replaced with undef.
2020-03-18 12:01:22 -04:00
Piotr Sobczak d1a7bfca74 [AMDGPU] Fix AMDGPUUnifyDivergentExitNodes
Summary:
For the case where "done" bits on existing exports are removed
by unifyReturnBlockSet(), unify all return blocks - even the
uniformly reached ones. We do not want to end up with a non-unified,
uniformly reached block containing a normal export with the "done"
bit cleared.

That case is believed to be rare - possible with infinite loops
in pixel shaders.

This is a fix for D71192.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76364
2020-03-18 16:49:30 +01:00
Chris Bowler c21866476e [PowerPC][AIX] Implement by-val caller arguments in a single register.
This is the first of a series of patches that adds caller support for
by-value arguments. This patch adds support for arguments that are passed in a
single GPR.

There are 3 limitation cases:
- The by-value argument is larger than a single register.
- There are no remaining GPRs even though the by-value argument would
  otherwise fit in a single GPR.
- The by-value argument requires alignment greater than register width.

Future patches will be required to add support for these cases as well
as for the callee handling (in LowerFormalArguments_AIX) that
corresponds to this work.

Differential Revision: https://reviews.llvm.org/D75863
2020-03-18 10:57:28 -04:00
Oliver Stannard 73cea83a6f [IPRA][ARM] Spill extra registers at -Oz
When optimising for code size at the expense of performance, it is often
worth saving and restoring some of r0-r3, if IPRA will be able to take
advantage of them. This doesn't cost any extra code size if we already
have a PUSH/POP pair, and increases the number of available registers
across any calls to the function.

We already have an optimisation which tries to fold the subtract/add of the
SP into the PUSH/POP by using extra registers, which somewhat conflicts
with this. I've made the new optimisation less aggressive in cases where
the existing one is likely to trigger, which gives better results than
either of these optimisations by themselves.

Differential revision: https://reviews.llvm.org/D69936
2020-03-18 13:51:16 +00:00
Guillaume Chatelet d000655a8c [Alignment][NFC] Deprecate getMaxAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jholewinski, arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76348
2020-03-18 14:48:45 +01:00
Oliver Stannard 6739805e24 [ARM] Track epilogue instructions with FrameDestroy flag (NFC)
Rather than trying to work out which instructions are part of the
epilogue by examining them, we can just mark them with the FrameDestroy
flag, like we do in the AArch64 backend.
2020-03-18 13:32:59 +00:00
Francesco Petrogalli 9bdcd9bf44 [llvm][SVE] Addressing mode for FF/NF loads.
Summary:
This patch adds addressing mode computation for the following SVE
instructions:

* ldff1{s}<T1> { <Zt>.<T2> }, <Pg>/Z, [<Xn|SP>{, <Xm>{, lsl #imm}}]
* ldnf1{s}<T1> { <Zt>.<T2> }, <Pg>/Z, [<Xn|SP>{, #<imm>, mul vl}]

Reviewers: andwar, sdesmalen, rengolin, efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76209
2020-03-18 12:46:07 +00:00
Sander de Smalen 4788ca450f [AArch64][SVE] Change pointer type of nontemporal load/store intrinsics
Summary:
This fixes a discrepancy between the non-temporal loads/store
intrinsics and other SVE load intrinsics (such as nf/ff), so
that Clang can use the same code to generate these intrinsics.

Reviewers: andwar, kmclaughlin, rengolin, efriedma

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76237
2020-03-18 12:44:51 +00:00
Simon Tatham 928776de92 [ARM,MVE] Add intrinsics for the VQDMLAH family.
Summary:
These are complicated integer multiply+add instructions with extra
saturation, taking the high half of a double-width product, and
optional rounding. There's no sensible way to represent that in
standard IR, so I've converted the clang builtins directly to
target-specific intrinsics.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: miyuki

Subscribers: kristof.beyls, hiraditya, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76123
2020-03-18 10:55:04 +00:00
Simon Tatham 28c5d97bee [ARM,MVE] Add intrinsics and isel for MVE integer VMLA.
Summary:
These instructions compute multiply+add in integers, with one of the
operands being a splat of a scalar. (VMLA and VMLAS differ in whether
the splat operand is a multiplier or the addend.)

I've represented these in IR using existing standard IR operations for
the unpredicated forms. The predicated forms are done with target-
specific intrinsics, as usual.

When operating on n-bit vector lanes, only the bottom n bits of the
i32 scalar operand are used. So we have to tell that to isel lowering,
to allow it to remove a pointless sign- or zero-extension instruction
on that input register. That's done in `PerformIntrinsicCombine`, but
first I had to enable `PerformIntrinsicCombine` for MVE targets
(previously all the intrinsics it handled were for NEON), and make it
a method of `ARMTargetLowering` so that it can get at
`SimplifyDemandedBits`.
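For the unpredicated form, the IR is just the standard splat-multiply-add
shape (a sketch based on my reading of the summary):
```
define <4 x i32> @vmla_shape(<4 x i32> %acc, <4 x i32> %v, i32 %s) {
  %ins = insertelement <4 x i32> undef, i32 %s, i32 0
  %splat = shufflevector <4 x i32> %ins, <4 x i32> undef, <4 x i32> zeroinitializer
  %mul = mul <4 x i32> %v, %splat
  %add = add <4 x i32> %acc, %mul
  ret <4 x i32> %add
}
```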

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76122
2020-03-18 10:55:04 +00:00
Guillaume Chatelet c3df69faa0 [Alignment][NFC] Deprecate getTransientStackAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jholewinski, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76301
2020-03-18 09:02:48 +01:00
Pengfei Wang 974d649f8e CET for Exception Handle
Summary:
Bug fix for https://bugs.llvm.org/show_bug.cgi?id=45182
Exception handling may indirectly jump to a catch pad, so we should add an ENDBR instruction before the catch pad instructions.

Reviewers: craig.topper, hjl.tools, LuoYuanke, annita.zhang, pengfei

Reviewed By: LuoYuanke

Subscribers: hiraditya, llvm-commits

Patch By: Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D76190
2020-03-17 22:35:05 -07:00
Vitaly Buka 9bca8fc4cf Revert "AMDGPU/GlobalISel: Fully handle 0 dmask case during legalize"
The patch introduced use-after-poison.

This reverts commit d0fe13ecf9.
2020-03-17 22:04:14 -07:00
Nico Weber 4e0fe038f4 Revert "Avoid emitting unreachable SP adjustments after `throw`"
This reverts commit 65b21282c7.
Breaks sanitizer bots (https://reviews.llvm.org/D75712#1927668)
and causes https://crbug.com/1062021 (which may or may not
be a compiler bug, not clear yet).
2020-03-17 20:49:22 -04:00
Matt Arsenault c9b454a1b7 AMDGPU/GlobalISel: Fix verifier errors on image atomics 2020-03-17 20:06:25 -04:00
Scott Linder 68f163df0e [AMDGPU] Print DWARF register numbers in AMDGPUInstPrinter
Summary:
Explanation is in a comment in the diff, but essentially printing a
physical register name here is ambiguous. Until we can implement
printing a DWARF register name here, just use the encoding directly.

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76253
2020-03-17 19:42:10 -04:00
Simon Pilgrim 68224c1952 [TargetLowering] Only demand a rotation's modulo amount bits
ISD::ROTL/ROTR rotation values are guaranteed to act as a modulo amount, so for power-of-2 bitwidths we only need the lowest bits.
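For example (a sketch using the generic funnel-shift intrinsic):
```
declare i32 @llvm.fshl.i32(i32, i32, i32)

; A 32-bit rotate only demands the low 5 bits of %amt, so the explicit
; mask below is now recognised as redundant.
define i32 @rotl(i32 %x, i32 %amt) {
  %masked = and i32 %amt, 31
  %r = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 %masked)
  ret i32 %r
}
```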

Differential Revision: https://reviews.llvm.org/D76201
2020-03-17 21:23:46 +00:00
Scott Constable 080dd10f7d Move RDF from Hexagon to Codegen
RDF is designed to be target agnostic. Therefore it would be useful to have it available for other targets, such as X86.

Based on a previous patch by Krzysztof Parzyszek

Differential Revision: https://reviews.llvm.org/D75932
2020-03-17 12:43:14 -07:00
Sebastian Neubauer 6e29846b29 [AMDGPU] Fix whole wavefront mode
We cannot move wwm over exec copies because the exec register needs an exact exec mask.

Differential Revision: https://reviews.llvm.org/D76232
2020-03-17 17:23:23 +01:00
Simon Pilgrim c9656a3b31 [DAGCombiner] matchRotateSub - handle shift amount truncation
Under certain circumstances we'll end up in the position where the negated shift amount will get truncated to the type specified by getScalarShiftAmountTy(), so we need to test for a truncated version of the shift amount as well.

This allows us to remove half of the remaining patterns tested for by X86ISelLowering's combineOrShiftToFunnelShift.
2020-03-17 16:01:23 +00:00
Matt Arsenault 039c917b43 AMDGPU/GlobalISel: Fix asserting on gather4 intrinsics 2020-03-17 11:07:30 -04:00
alex-t 48a9cf9043 [AMDGPU] Enable SEXT divergence driven selection.
Summary: This change enable the divergence driven selection for the SEXT DAG opcode.

Reviewers: vpykhtin, rampitec

Reviewed By: vpykhtin

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Differential Revision: https://reviews.llvm.org/D76230
2020-03-17 17:30:11 +03:00
Simon Atanasyan 73b1da1605 [MIPS] Implement MIPS3D vector instructions
Patch by Michael Roe.

Differential Revision: https://reviews.llvm.org/D76247
2020-03-17 17:17:51 +03:00
Matt Arsenault d0fe13ecf9 AMDGPU/GlobalISel: Fully handle 0 dmask case during legalize
For normal loads, fully eliminate the load. For the TFE case, adjust
the dmask value in the instruction so the selector doesn't need to
handle it. For the TFE special case, I guess it would be possible to
replace the loaded data register with undef, but as-is this will start
treating it as a well defined value.
2020-03-17 10:15:30 -04:00
Matt Arsenault d9a012ed8a AMDGPU/GlobalISel: Adjust image load register type based on dmask
Trim elements that won't be written. The equivalent still needs to be
done for writes. Also start widening 3 elements to 4
elements. Selection will get the count from the dmask.
2020-03-17 10:09:18 -04:00
Matt Arsenault 83ffbf2618 AMDGPU/GlobalISel: Legalize non-a16 non-NSA images 2020-03-17 10:02:09 -04:00
Matt Arsenault 2aba9b6cf8 AMDGPU/GlobalISel: Legalize a16 images
Pack the address registers in the legalizer. Avoid introducing a huge
family of new intermediate operations by filling dead operands with
noreg.
2020-03-17 10:02:09 -04:00
Kazushi (Jam) Marukawa 6bbbead7be [VE] Move VEInstPrinter.cpp and VEInstPrinter.h into MCTargetDesc
Summary:
Move them into MCTargetDesc to follow other architectures (a263aa2).

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D76270
2020-03-17 11:33:54 +01:00
QingShan Zhang b83490bdb7 [PowerPC] Fix a typo of the condition of checking the fusion candidate 2020-03-17 10:04:18 +00:00
QingShan Zhang 0b126eec6d [NFC][PowerPC] Simplify the logic in lower select_cc
The logic in select_cc is messy and hard to follow. This is an NFC patch to simplify the logic.

Differential Revision: https://reviews.llvm.org/D75834
2020-03-17 03:47:39 +00:00
Vitaly Buka f20dcc31e3 Fix unused function warning 2020-03-16 19:45:36 -07:00
Shengchen Kan 39bcc76a92 [X86] Disable nop padding before instruction following hardcode
Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: LuoYuanke

Subscribers: annita.zhang, llvm-commits, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76176
2020-03-17 09:45:12 +08:00
Craig Topper 85726bbcba [X86] Disable fast-isel call lowering for functions with vXi1 arguments on avx512.
This fails an assert because the type is marked in the calling
convention td file as needing promotion, but the code doesn't know
how to do it.

It is also much more complicated because we try to pass these in
xmm/ymm/zmm registers. As of a few weeks ago, we do this promotion
from getRegisterTypeForCallingConv before the td-file-generated
code gets involved.
2020-03-16 18:20:42 -07:00
Eric Christopher 8b3b04eb41 Make isValidImmForSVEVecImmAddrMode inline static rather than just static.
Fixes -Werror builds.
2020-03-16 17:39:01 -07:00
Evgenii Stepanov 2a3723ef11 [memtag] Plug in stack safety analysis.
Summary:
Run StackSafetyAnalysis at the end of the IR pipeline and annotate
proven safe allocas with !stack-safe metadata. Do not instrument such
allocas in the AArch64StackTagging pass.

Reviewers: pcc, vitalybuka, ostannard

Reviewed By: vitalybuka

Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, gilang, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73513
2020-03-16 16:35:25 -07:00
Sriraman Tallam df082ac45a Basic Block Sections support in LLVM.
This is the second patch in a series of patches to enable basic block
sections support.

This patch adds support for:

* Creating direct jumps at the end of basic blocks that have fall
through instructions.
* New pass, bbsections-prepare, that analyzes placement of basic blocks
in sections.
* Actual placing of a basic block in a unique section with special
handling of exception handling blocks.
* Supports placing a subset of basic blocks in a unique section.
* Support for MIR serialization and deserialization with basic block
sections.

Parent patch : D68063
Differential Revision: https://reviews.llvm.org/D73674
2020-03-16 16:06:54 -07:00
Craig Topper 378b1e6080 [X86] Assign avx512bf16 instructions to the SSEPackedSingle ExeDomain. 2020-03-16 14:07:01 -07:00
Benjamin Kramer 05ff3323e0 [AArch64] Remove unused variable 2020-03-16 21:59:55 +01:00
Francesco Petrogalli 0f2b68d9c7 Implement IR intrinsics for gather prefetch.
Summary:
Intrinsics and relative codegen has been implemented for the following
SVE instructions:

1. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod>] -> 32-bit          scaled offset
2. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod>] -> 32-bit unpacked scaled offset
3. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.D]        -> 64-bit          scaled offset
4. PRF<T> <prfop>, <Pg>, [<Zn>.S{, #<imm>}]       -> 32-bit element
5. PRF<T> <prfop>, <Pg>, [<Zn>.D{, #<imm>}]       -> 64-bit element

The instructions are associated the following intrinsics, respectively:

1. void @llvm.aarch64.sve.gather.prf<T>.scaled.<mod>.nx4vi32(
          i8* %base,
          <vscale x 4 x i32> %offset,
          <vscale x 4 x i1> %Pg,
          i32 %prfop)

2. void @llvm.aarch64.sve.gather.prf<T>.scaled.<mod>.nx2vi32(
          i8* %base,
          <vscale x 2 x i32> %offset,
          <vscale x 2 x i1> %Pg,
          i32 %prfop)

3. void @llvm.aarch64.sve.gather.prf<T>.scaled.nx2vi64(
          i8* %base,
          <vscale x 2 x i64> %offset,
          <vscale x 2 x i1> %Pg,
          i32 %prfop)

4. void @llvm.aarch64.sve.gather.prf<T>.nx4vi32(
          <vscale x 4 x i32> %bases,
          i64 %imm,
          <vscale x 4 x i1> %Pg,
          i32 %prfop)

5. void @llvm.aarch64.sve.gather.prf<T>.nx2vi64(
          <vscale x 2 x i64> %bases,
          i64 %imm,
          <vscale x 2 x i1> %Pg,
          i32 %prfop)

The intrinsics are the IR counterpart of the following SVE ACLE functions:

* void svprf<T>(svbool_t pg, const void *base, svprfop op)
* void svprf<T>_vnum(svbool_t pg, const void *base, int64_t vnum, svprfop op)
* void svprf<T>_gather[_u32base](svbool_t pg, svuint32_t bases, svprfop op)
* void svprf<T>_gather[_u64base](svbool_t pg, svuint64_t bases, svprfop op)
* void svprf<T>_gather_[s32]offset(svbool_t pg, const void *base, svint32_t offsets, svprfop op)
* void svprf<T>_gather_[u32]offset(svbool_t pg, const void *base, svint32_t offsets, svprfop op)
* void svprf<T>_gather_[s64]offset(svbool_t pg, const void *base, svint64_t offsets, svprfop op)
* void svprf<T>_gather_[u64]offset(svbool_t pg, const void *base, svint64_t offsets, svprfop op)
* void svprf<T>_gather[_u32base]_offset(svbool_t pg, svuint32_t bases, int64_t offset, svprfop op)
* void svprf<T>_gather[_u64base]_offset(svbool_t pg, svuint64_t bases,int64_t offset, svprfop op)

Reviewers: andwar, sdesmalen, efriedma, rengolin

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75580
2020-03-16 18:52:35 +00:00
Simon Pilgrim ebb181cf40 [X86] matchScalarReduction - add support for partial reductions
Add optional support for opt-in partial reduction cases by providing an optional partial mask to indicate which elements have been extracted for the scalar reduction.
2020-03-16 18:01:02 +00:00
Matt Arsenault 80b627d69d AMDGPU/GlobalISel: Fix handling of G_ANYEXT with s1 source
We were letting G_ANYEXT with a vcc register bank through, which was
incorrect and would select to an invalid copy. Fix this up like G_ZEXT
and G_SEXT. Also drop old code to fixup the non-boolean case in
RegBankSelect. We now have to perform that expansion during selection,
so there's no benefit to doing it during RegBankSelect.
2020-03-16 12:59:54 -04:00
Matt Arsenault c460dc6eeb AMDGPU/GlobalISel: Fix some illegal scalar argument types
Fixes integers that don't evenly divide to i32 pieces. We should
probably extract some of the code in the legalizer to start handling
argument breakdowns. I'm dissatisfied with the argument lowering's
handling of vectors for example, and we should not be producing the
weird G_EXTRACTs we do now.
2020-03-16 12:51:23 -04:00
Matt Arsenault 84386b2d8a AMDGPU: Drop special case f64 fround lowering
The result is better if ftrunc is emitted and separately legalized
when unavailable.
2020-03-16 12:09:30 -04:00
Matt Arsenault 57d896e838 AMDGPU/GlobalISel: Make some large merges legal
We allow up to 1024-bit registers, so we should support merges all the
way to the maximum.
2020-03-16 10:49:10 -04:00
Simon Pilgrim e43a085781 [X86] X86::isConstantSplat - enable partial undef bit handling by default.
We currently only ever use this for lowering constant uniform values (shift/rotate by immediate) so we can safely enable it by default (it treats the undef bits as zero when extracting constants).

This is necessary for an upcoming patch that will use SimplifyDemandedBits more aggressively on funnel shift amounts and causes regressions in vXi64 constant without it.
2020-03-16 12:56:24 +00:00
Simon Pilgrim ac4609cb1d [X86] LowerRotate - use X86::isConstantSplat to detect constant splat rotation amounts.
Avoid code duplication and matches what we do for the similar LowerFunnelShift and LowerScalarImmediateShift methods.
2020-03-16 12:56:23 +00:00
Jonas Paulsson 132f25bcca [SystemZ] Avoid scalarization of [SU]INT_TO_FP ISD-nodes.
The type legalizer will scalarize vector conversions from integer to floating
point if the source element size is less than that of the result.

This is avoided now by inserting a zero/sign-extension of the source vector
before type legalization.

Review: Ulrich Weigand

Differential revision: https://reviews.llvm.org/D75978
2020-03-16 13:07:42 +01:00
Shengchen Kan b1a7a245ec [NFC][MC] Rename alignBranches* to emitInstruction*
alignBranches is X86-specific; change the name to a
more general one, since other targets may do some state
change before and after emitting the instruction.
2020-03-16 17:13:14 +08:00
Simon Atanasyan e0ab0e6a28 [MIPS] Implement PUL.PS and PUU.PS instructions
Patch by Michael Roe.

Differential Revision: https://reviews.llvm.org/D75812
2020-03-16 09:39:47 +03:00
Philip Reames a79863f2f7 Support prefix padding for alignment purposes (Relaxable instructions only)
Now that D75203 has landed and baked for a few days, extend the basic approach to prefix padding as well. The patch itself is fairly straightforward.

For the moment, this patch adds the functional support and some basic testing thereof, but defaults to not enabling prefix padding. I want to be able to phrase a separate patch which adds the target-specific reasoning and test it cleanly. I haven't decided whether I want to common it with the nop logic or not.

Differential Revision: https://reviews.llvm.org/D75300
2020-03-15 19:53:41 -07:00
Craig Topper b2da1ddaef [X86] Add a non-zero cost for truncating v32i16->v32i8 on avx512bw. 2020-03-15 17:18:46 -07:00
Benjamin Kramer caef4a81c9 [AVR] Make helper functions static. NFC. 2020-03-15 16:50:15 +01:00
Sander de Smalen 8105935d3a [TypeSize] Allow returning scalable size in implicit conversion to uint64_t
This patch removes compiler runtime assertions that ensure the implicit
conversions are only guaranteed to work for fixed-width vectors.

With the assert it would be impossible to get _anything_ to build until
the entire codebase has been upgraded, even when the indiscriminate uses
of the size as uint64_t would work fine for both scalable and fixed-width
types.

This issue will need to be addressed differently, with build-time errors
rather than assertion failures, but that effort falls beyond the scope
of this patch.

Returning the scalable size and avoiding the assert in getFixedSize()
is a temporary stop-gap in order to use LLVM for compiling and using
the SVE ACLE intrinsics.

Reviewers: efriedma, huntergr, rovka, ctetreau, rengolin

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75297
2020-03-15 13:48:49 +00:00
Krzysztof Parzyszek 3656558cec [Hexagon] Only allow single HVX vector loads/stores in lowering
This will prevent store widening from forming vector pair stores,
which eventually end up broken up into single stores.
2020-03-14 14:26:01 -05:00
Simon Pilgrim ee862adf60 Fix signed/unsigned comparison warning. 2020-03-14 18:42:27 +00:00
Simon Pilgrim 0cb2f089c1 [X86] getFauxShuffleMask - pull out repeated byte sizes variables. NFC. 2020-03-14 17:36:17 +00:00
Simon Pilgrim f47f4c137b [X86] getFauxShuffleMask - merge insertelement paths
Merge the INSERT_VECTOR_ELT/SCALAR_TO_VECTOR and PINSRW/PINSRB shuffle mask paths - they both do the same thing (find source vector + handle implicit zero extension). The PINSRW/PINSRB path also handled the insertion-of-zero case, which needed to be added to the general case as well.
2020-03-14 13:11:03 +00:00
Shengchen Kan e6f1dd40bd [X86] Disable nop padding before instruction following a prefix
Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: LuoYuanke

Subscribers: hiraditya, llvm-commits, annita.zhang

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76052
2020-03-14 13:15:30 +08:00
Diogo Sampaio 83cdb654e4 [AArch64][Fix] LdSt optimization generate premature stack-popping
Summary:
When moving add and sub to memory operand instructions,
aarch64-ldst-opt would prematurely pop the stack pointer,
before memory instructions that do access the stack using
indirect loads.
e.g.
```
int foo(int offset){
    int local[4] = {0};
    return local[offset];
}
```
would generate:
```
sub     sp, sp, #16            ; Push the stack
mov     x8, sp                 ; Save stack in register
stp     xzr, xzr, [sp], #16    ; Zero initialize stack, and post-increment, making it invalid
------ If an exception goes here, the stack value might be corrupted
ldr     w0, [x8, w0, sxtw #2]  ; Access correct position, but it is not guarded by SP
```

Reviewers: fhahn, foad, thegameg, eli.friedman, efriedma

Reviewed By: efriedma

Subscribers: efriedma, kristof.beyls, hiraditya, danielkiss, llvm-commits, simon_tatham

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75755
2020-03-14 02:03:10 +00:00
Craig Topper 755e00876c [X86] Remove isel patterns for X86VBroadcast+trunc+extload. Replace with DAG combines.
This is a little more complicated than I'd like it to be. We have
to manually match a trunc+srl+load pattern that generic DAG
combine won't do for us due to isTypeDesirableForOp.
2020-03-13 18:12:16 -07:00
Stanislav Mekhanoshin c262b69dcc [AMDGPU] Fix endcf collapse
Only collapse inner endcf if the outer one belongs to SI_IF.
If it instead belongs to SI_ELSE, then the mask being restored is in fact
a partial inverse of what we need.

Differential Revision: https://reviews.llvm.org/D76154
2020-03-13 13:50:21 -07:00
Matt Arsenault 015b640be4 AMDGPU: Add flag to use fixed function ABI
Pass all arguments to every function, rather than only passing the
minimum set of inputs needed for the call graph.
2020-03-13 13:27:05 -07:00
Matt Arsenault bb8622094d AMDGPU: Don't handle kernarg.segment.ptr in functions
Just lower this to null. Pass implicitarg.ptr in its place in the
argument list.
2020-03-13 12:51:12 -07:00
Nico Weber f82b32a51e Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit 5aa5c943f7.
Causes clang to assert, see
https://bugs.chromium.org/p/chromium/issues/detail?id=1061533#c4
for a repro.
2020-03-13 15:37:44 -04:00
Stanislav Mekhanoshin 32e90cbcd1 [AMDGPU] Disable endcf collapse
There are some functional regressions and I suspect our
scopes are not as perfectly enclosed as I expected.
Disable it for now.

Differential Revision: https://reviews.llvm.org/D76148
2020-03-13 12:33:22 -07:00
Simon Pilgrim 05c0d34918 [X86][SSE] Prefer trunc(movd(x)) to pextrb(x,0)
If we're extracting the 0'th index of a v16i8 vector we're better off using MOVD than PEXTRB, unless we're storing the value or we require the implicit zero extension of PEXTRB.

The biggest perf diff is on SLM targets where MOVD (uops=1, lat=3 tp=1) is notably faster than PEXTRB (uops=2, lat=5, tp=4).

This matches what we already do for PEXTRW.

Differential Revision: https://reviews.llvm.org/D76138
2020-03-13 18:43:04 +00:00
Philip Reames 1b86ad27a7 Use 15 byte long nops on modern Intel processors
Back in D42616, we switched our default nop length from 15 to 10 bytes because some platforms have painful decode stalls when encountering multiple instruction prefixes. (10 byte long nops come from the fact that prefixes are used to pad after 8 bytes, and some platforms have issues w/more than two prefixes.)

Based on Agner's guides, it appears to be the case that modern Intel (SandyBridge and later) can decode an arbitrary number of prefixes without issue. Intel's guide only provides up to 9 bytes; I read that as providing a safe default for all their chips. Older chips and Atom series have serious decode stalls. I can't find a conclusive reference beyond those two.

Differential Revision: https://reviews.llvm.org/D75945
2020-03-13 10:51:09 -07:00
Simon Cook a26bd4ec16 [TableGen] Support combining AssemblerPredicates with ORs
For context, the proposed RISC-V bit manipulation extension has a subset
of instructions which require one of two SubtargetFeatures to be
enabled, 'zbb' or 'zbp', and there is no defined feature which both of
these can imply to use as a constraint either (see comments in D65649).

AssemblerPredicates allow multiple SubtargetFeatures to be declared in
the "AssemblerCondString" field, separated by commas, and this means
that the two features must both be enabled. There is no equivalent to
say that _either_ feature X or feature Y must be enabled, short of
creating a dummy SubtargetFeature for this purpose and having features X
and Y imply the new feature.

To solve the case where X or Y is needed without adding a new feature,
and to better match a typical TableGen style, this replaces the existing
"AssemblerCondString" with a dag "AssemblerCondDag" which represents the
same information. Two operators are defined for use with
AssemblerCondDag, "all_of", which matches the current behaviour, and
"any_of", which adds the new proposed ORing features functionality.

This was originally proposed in the RFC at
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139138.html

Changes to all current backends are mechanical to support the replaced
functionality, and are NFCI.

At this stage, it is illegal to combine features with ands and ors in a
single AssemblerCondDag. I suspect this case is sufficiently rare that
adding more complex changes to support it is unnecessary.

Differential Revision: https://reviews.llvm.org/D74338
2020-03-13 17:13:51 +00:00
Simon Pilgrim a2db388dce [CostModel][X86] Improve ISD::CTTZ costs accounting for BSF/TZCNT implementations 2020-03-13 16:51:13 +00:00
Simon Pilgrim 846c614f54 [X86] combineExtractWithShuffle - pull out repeated getSizeInBits() call. NFC. 2020-03-13 15:36:04 +00:00
Simon Pilgrim fe047fbccc [X86] LowerEXTRACT_VECTOR_ELT - pull out repeated getOperand() calls. NFC.
Also, cleanup LowerEXTRACT_VECTOR_ELT_SSE4 comments which had references to non-constant extraction indices.
2020-03-13 15:36:02 +00:00
Andrzej Warzynski a0c15ed460 [AArch64][SVE] Add the @llvm.aarch64.sve.dup.x intrinsic
Summary:
This intrinsic implements the unpredicated duplication of scalar values
and is mapped to (through ISD::SPLAT_VECTOR):
  * DUP <Zd>.<T>, #<imm>
  * DUP <Zd>.<T>, <R><n|SP>
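A sketch of its use (the type suffix follows the usual SVE intrinsic
naming and is my assumption):
```
declare <vscale x 4 x i32> @llvm.aarch64.sve.dup.x.nxv4i32(i32)

define <vscale x 4 x i32> @dup_scalar(i32 %x) {
  ; expected to map to: DUP Z0.S, W0
  %r = call <vscale x 4 x i32> @llvm.aarch64.sve.dup.x.nxv4i32(i32 %x)
  ret <vscale x 4 x i32> %r
}
```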

Reviewed by: sdesmalen

Differential Revision: https://reviews.llvm.org/D75900
2020-03-13 12:40:22 +00:00
David Green 2c6c169dbd [ARM] Optimise ASRL/LSRL to smaller shifts using demand bits.
The ASRL/LSRL long shifts are generated from 64bit shifts. Once we have
them, it might turn out that enough of the 64bit result was not required
that we can use a smaller shift to perform the same result. As the
smaller shift can in general be folded in more way, such as into add
instructions in one of the test cases here, we can use the demand bit
analysis to prefer the smaller shifts where we can.

Differential Revision: https://reviews.llvm.org/D75371
2020-03-13 10:09:03 +00:00
David Green f67d93dc23 [ARM] Constant long shift combines
This changes the way that asrl and lsrl intrinsics are lowered, going
via a the ISEL ASRL and LSLL nodes instead of straight to machine nodes.
On top of that, it adds some constant folds for long shifts, in case it
turns out that the shift amount was either constant or 0.

Differential Revision: https://reviews.llvm.org/D75553
2020-03-13 08:54:59 +00:00
QingShan Zhang d0fb34dc09 [PowerPC] Replace the PPCISD:: SExtVElems with ISD::SIGN_EXTEND_INREG to leverage the combine rules
The PPCISD::SExtVElems was added by commit https://reviews.llvm.org/D34009. However,
we have another ISD node ISD::SIGN_EXTEND_INREG that perfectly matches the semantics
of SExtVElems. And the DAGCombiner has some combine rules for SIGN_EXTEND_INREG
that produce better code.

Differential Revision: https://reviews.llvm.org/D70771
2020-03-13 07:28:28 +00:00
Craig Topper 09c8f38924 [X86] Add isel patterns for X86VBroadcast with i16 truncates from i16->i64 zextload/extload.
We can form vpbroadcastw with a folded load.

We had patterns for i16->i32 zextload/extload, but nothing prevents
i64 from occurring.

I'd like to move this all to DAG combine to fix more cases, but
this is a trivial fix to minimize test diffs when moving to a combine.
2020-03-13 00:10:48 -07:00
Amy Kwan 1ba3d2639d [PowerPC][NFC] Rename instruction formats in PPCInstrPrefix.td
This patch renames some of the instruction formats within PPCInstrPrefix.td to
adopt a more uniform naming convention. It also adds the naming convention
extension, `_MEM` to indicate instruction formats for memory ops.

Differential Revision: https://reviews.llvm.org/D75819
2020-03-13 00:50:08 -05:00
Matt Arsenault ccc6e780c8 AMDGPU: Directly annotate functions if they have calls
Currently we infer whether the flat-scratch-init kernel input should
be enabled based on calls. Move this handling, so we can decide if the
full set of ABI inputs is needed in kernels. Ideally we would have an
analysis of some sort, rather than the function attributes.
2020-03-12 19:10:59 -04:00
Stanislav Mekhanoshin a73528649c [AMDGPU] Simplify exec copies
The patch removes late endcf handling and only leaves the
related portion with redundant exec mask copy elimination.

Differential Revision: https://reviews.llvm.org/D76095
2020-03-12 14:54:19 -07:00
Simon Pilgrim e91feeed21 [AMDGPU] Add ISD::FSHR -> ALIGNBIT support
This patch allows ISD::FSHR(i32) patterns to lower to ALIGNBIT instructions.

This improves test coverage of ISD::FSHR matching - x86 has both FSHL/FSHR instructions and we prefer FSHL by default.

Differential Revision: https://reviews.llvm.org/D76070
2020-03-12 20:16:57 +00:00
Thomas Lively 4e589e6c26 [WebAssembly] Fix SIMD shift unrolling to avoid assertion failure
Summary:
Using the default DAG.UnrollVectorOp on v16i8 and v8i16 vectors
results in i8 or i16 nodes being inserted into the SelectionDAG. Since
those are illegal types, this causes a legalization assertion failure
for some code patterns, as uncovered by PR45178. This change unrolls
shifts manually to avoid this issue by adding and using a new optional
EVT argument to DAG.ExtractVectorElements to control the type of the
extract_element nodes.
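The failing shape was a variable vector shift on an illegal element
type, e.g. (reduced sketch, not the exact PR45178 reproducer):
```
define <16 x i8> @shl_v16i8(<16 x i8> %v, <16 x i8> %amts) {
  ; unrolled lane by lane; the extracts are now produced as i32
  ; instead of the illegal i8
  %r = shl <16 x i8> %v, %amts
  ret <16 x i8> %r
}
```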

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76043
2020-03-12 12:20:14 -07:00
Stanislav Mekhanoshin 360aff0493 [AMDGPU] Simplify nested SI_END_CF
This is to replace the optimization from the SIOptimizeExecMaskingPreRA.
We have less opportunities in the control flow lowering because many
VGPR copies are still in place and will be removed later, but we know
for sure an instruction is SI_END_CF and not just an arbitrary S_OR_B64
with EXEC.

The subsequent change needs to convert s_and_saveexec into s_and and
address new TODO lines in tests, then code block guarded by the
-amdgpu-remove-redundant-endcf option in the pre-RA exec mask optimizer
will be removed.

Differential Revision: https://reviews.llvm.org/D76033
2020-03-12 11:25:07 -07:00
Zarko Todorovski d688312660 [PowerPC][AIX] Implement formal arguments passed in stack memory.
This patch is the callee side counterpart for https://reviews.llvm.org/D73209.
It removes the fatal error when we pass more formal arguments than available
registers.

Differential Revision: https://reviews.llvm.org/D74225
2020-03-12 11:48:00 -04:00
Xiangling Liao 3e53bf5781 [PowerPC32] Fix the `setcc` inconsistent result type problem
Summary:
On 32-bit PPC targets [AIX and BE], when we convert an `i64` to `f32`, a `setcc` operand expansion is needed. The expansion will set the result type of the expanded `setcc` operation based on whether the subtarget uses CRBits or not. If the subtarget does use CRBits, like AIX and BE, then it will set the result type to `i1`, leading to an inconsistency with the original `setcc` result type [i32].
And the reason it crashed underneath is that we don't set the result type of setcc consistently in those two places.

This patch fixes this problem by also setting the original setcc node's result type via the `getSetCCResultType` interface.
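A minimal reproducer shape (my own sketch):
```
; On 32-bit PPC the i64 operand is expanded, and the expansion's internal
; setcc must agree with getSetCCResultType (i1 when CRBits are used).
define float @conv(i64 %x) {
  %f = uitofp i64 %x to float
  ret float %f
}
```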

Reviewers: sfertile, cebowleratibm, hubert.reinterpretcast, Xiangling_L

Reviewed By: sfertile

Subscribers: wuzish, nemanjai, hiraditya, kbarton, jsji, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75702
2020-03-12 10:50:37 -04:00
Simon Moll c8e1081da6 [VE][nfc] Use RRIm for RRINDm, remove the latter
Summary:
De-duplicate isel instruction classes by using RRIm for RRINDm. The latter
becomes obsolete.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D76063
2020-03-12 15:38:07 +01:00
Sean Fertile 8b39341fb0 [PowerPC][AIX] Fix printing of program counter for AIX assembly.
Program counter on AIX is the dollar-sign.

Differential Revision:https://reviews.llvm.org/D75627
2020-03-12 10:37:18 -04:00
Andrzej Warzynski 46b9f14d71 [AArch64][SVE] Add intrinsics for non-temporal scatters/gathers
Summary:
This patch adds the following intrinsics for non-temporal gather loads
and scatter stores:
  * aarch64_sve_ldnt1_gather_index
  * aarch64_sve_stnt1_scatter_index
These intrinsics implement the "scalar + vector of indices" addressing
mode.

As opposed to regular and first-faulting gathers/scatters, there's no
instruction that would take indices and then scale them. Instead, the
indices for non-temporal gathers/scatters are scaled before the
intrinsics are lowered to `ldnt1` instructions.

The new ISD nodes, GLDNT1_INDEX and SSTNT1_INDEX, are only used as
placeholders so that we can easily identify the cases implemented in
this patch in performGatherLoadCombine and performScatterStoreCombined.
Once encountered, they are replaced with:
  * GLDNT1_INDEX -> SPLAT_VECTOR + SHL + GLDNT1
  * SSTNT1_INDEX -> SPLAT_VECTOR + SHL + SSTNT1

The patterns for lowering ISD::SHL for scalable vectors (required by
this patch) were missing, so these are added too.
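The assumed IR shape of the gather variant (operand order and type
suffix are my assumption, by analogy with the existing SVE gather
intrinsics):
```
declare <vscale x 2 x i64> @llvm.aarch64.sve.ldnt1.gather.index.nxv2i64(
    <vscale x 2 x i1>, i64*, <vscale x 2 x i64>)
```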

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D75601
2020-03-12 13:55:56 +00:00
Simon Pilgrim 1e686d2689 [X86] Add FeatureFast7ByteNOP flag
Lets us remove another SLM proc family flag usage.

This is NFC, but we should probably check whether atom/glm/knl? should be using this flag as well...
2020-03-12 13:06:43 +00:00
Dylan McKay 2cf4b4de0c [AVR] Fix reads of uninitialized variables from constructor of AVRSubtarget
The initialization order was not correct. These bugs were discovered by
valgrind. The reads appear to work fine in practice, but this patch should
unblock switching the AVR backend on by default as now a standard AVR
llc invocation runs without memory errors.

The AVRISelLowering constructor would run before the subtarget boolean
fields were initialized to false. Now, the initialization order is
correct.
2020-03-13 00:57:19 +13:00
Simon Pilgrim 4689eae820 [X86] combineOrShiftToFunnelShift - remove shift by immediate handling.
Now that D75114 has landed, DAGCombiner handles this case so the code is redundant.
2020-03-12 11:46:51 +00:00
Simon Tatham 3f8e714e2f [ARM,MVE] Add intrinsics and isel for MVE fused multiply-add.
Summary:
This adds the ACLE intrinsic family for the VFMA and VFMS
instructions, which perform fused multiply-add on vectors of floats.

I've represented the unpredicated versions in IR using the cross-
platform `@llvm.fma` IR intrinsic. We already had isel rules to
convert one of those into a vector VFMA in the simplest possible way;
but we didn't have rules to detect a negated argument and turn it into
VFMS, or rules to detect a splat argument and turn it into one of the
two vector/scalar forms of the instruction. Now we have all of those.

The predicated form uses a target-specific intrinsic as usual, but
I've stuck to just one, for a predicated FMA. The subtraction and
splat versions are code-generated by passing an fneg or a splat as one
of its operands, the same way as the unpredicated version.

In arm_mve_defs.h, I've had to introduce a tiny extra piece of
infrastructure: a record `id` for use in codegen dags which implements
the identity function. (Just because you can't declare a Tablegen
value of type dag which is //only// a `$varname`: you have to wrap it
in something. Now I can write `(id $varname)` to get the same effect.)

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75998
2020-03-12 11:13:50 +00:00
Dylan McKay 13be27482e [AVR] Fix read of uninitialized variable AVRSubtarget:::ELFArch
Found by the LLVM MemorySanitizer tests when switching AVR to a default
backend.

ELFArch must be initialized before the call to
initializeSubtargetDependencies().

The uninitialized read would occur deep within TableGen'd code.
2020-03-13 00:09:13 +13:00
Qiu Chaofan 096d545376 [PowerPC] Add strict-fp intrinsic to FP arithmetic
This patch adds basic strict-fp intrinsics support to PowerPC backend,
including basic arithmetic operations (add/sub/mul/div).
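For example, the add case in constrained form (a sketch; the metadata
arguments follow the usual constrained-FP convention):
```
declare double @llvm.experimental.constrained.fadd.f64(double, double,
                                                       metadata, metadata)

define double @strict_add(double %a, double %b) strictfp {
  %r = call double @llvm.experimental.constrained.fadd.f64(
           double %a, double %b,
           metadata !"round.dynamic", metadata !"fpexcept.strict") strictfp
  ret double %r
}
```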

Reviewed By: steven.zhang, andrew.w.kaylor

Differential Revision: https://reviews.llvm.org/D63916
2020-03-12 17:02:54 +08:00
Sebastian Neubauer 4327a9b46b [AMDGPU] Use progbits type for .AMDGPU.disasm section
The note section type implies a specific format that this section does
not have thus tools like readelf fail here. Progbits has no format and
another pipeline compiler already sets the type to progbits.

Differential Revision: https://reviews.llvm.org/D75913
2020-03-12 09:08:11 +01:00
Shengchen Kan 3a503ce663 [X86] Reduce the number of emitted fragments due to branch align
Summary:
Currently, a BoundaryAlign fragment may be inserted after the branch
that needs to be aligned, in order to truncate the current fragment; this
fragment is unused most of the time. To avoid that, we can insert a new
empty Data fragment instead. Non-relaxable instructions are usually
emitted into a Data fragment, so the inserted empty Data fragment is very
likely to be reused.

Reviewers: annita.zhang, reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: reames, LuoYuanke

Subscribers: llvm-commits, dexonsmith, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75438
2020-03-12 15:37:35 +08:00
Djordje Todorovic 3b984641a7 [DebugInfo] Fix build failure on the mingw
Add the workaround for the X86::MOV16ri when describing call site
parameters.
2020-03-12 08:18:01 +01:00
QingShan Zhang 518292dbdf [PowerPC] Add the MacroFusion support for Power8
This patch is intend to implement the missing P8 MacroFusion for LLVM
according to Power8 User's Manual Section 10.1.12 Instruction Fusion

Differential Revision: https://reviews.llvm.org/D70651
2020-03-12 05:15:41 +00:00
Teresa Johnson 8f5e3c74b6 [PowerPC] Fix compile time issue in recursive CTR analysis code
Summary:
Avoid re-examining operands on recursive walk looking for CTR.
This was causing huge compile time after some earlier optimization
created a large expression.

The start of the expression (created by IndVarSimplify) looked like:

%469 = lshr i64 trunc (i128 xor (i128 udiv (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011)) to i64), i64 45) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011)) to i64), i64 45) to i128), ...

with the _ZN4absl13hash_internal13CityHashState5kSeedE referenced many times.

Reviewers: hfinkel

Subscribers: nemanjai, hiraditya, kbarton, jsji, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75790
2020-03-11 16:11:14 -07:00
Matt Arsenault 1e0c540360 AMDGPU: Don't hard error on LDS globals in functions
Instead, emit a trap and a warning. We force inlining of this
situation, so any function where this happens should be dead as
indirect or external calls are not yet supported. This should avoid
erroring on dead code.
2020-03-11 15:34:11 -04:00
Francesco Petrogalli 4dde9e9b02 [llvm][CodeGen] IR intrinsics for SVE2 contiguous conflict detection instructions.
Summary:
The IR intrinsics are mapped to the following SVE2 instructions:

* WHILERW <Pd>.<T>, <Xn>, <Xm>
* WHILEWR <Pd>.<T>, <Xn>, <Xm>

The intrinsics introduced in this patch are the IR counterpart of the
SVE ACLE functions `svwhilerw` and `svwhilewr` (all data type
variants).

Patch by Maciej Gąbka <maciej.gabka@arm.com>.

Reviewers: kmclaughlin, rengolin

Reviewed By: kmclaughlin

Subscribers: tschuett, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75862
2020-03-11 18:28:02 +00:00
Stanislav Mekhanoshin 9801e5469b [AMDGPU] Disable nested endcf collapse
The assumption is that conditional regions are perfectly nested
and a mask restored at the exit from the inner block will be
completely covered by a mask restored in the outer.

It turns out with our current structurizer this is not always
the case.

Disable the optimization for now, but I want to keep it around
for a while, either to retry it after further structurizer changes or
to move it into control flow lowering, where we have more info
and can reuse the test.

Differential Revision: https://reviews.llvm.org/D75958
2020-03-11 11:24:20 -07:00
Jay Foad a46dba24fa [AMDGPU] Extend macro fusion for ADDC and SUBB to SUBBREV
Summary:
There's a lot of test case churn but the overall effect is to increase
the number of back-to-back v_sub,v_subbrev pairs, which can execute with
no delay even on gfx10.

Reviewers: arsenm, rampitec, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75999
2020-03-11 17:59:21 +00:00
Andrzej Warzynski a9f1583228 [AArch64][SVE] Add the @llvm.aarch64.sve.sel intrinsic
Reviewers: sdesmalen, efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75928
2020-03-11 17:05:21 +00:00
Matt Arsenault a2202f6a3f AMDGPU/GlobalISel: Manually RegBankSelect copies
This was failing on any pre-assigned copy to the VCC bank.

This is something of a workaround for the default implementation in
getInstrMappingImpl, and how it treats copy-like operations in
general.

Copy-like operations are considered to only have one result register
bank, rather than separate banks for each source like a normal
instruction. To avoid potentially mishandling reg_sequence with
impossible operand combinations, the generic implementation errors on
impossible costs. If the bank was already assigned, it is treated
as if it were an unsatisfiable REG_SEQUENCE mapping. We really don't
get any value from any of what getInstrMappingImpl tries to do for
copies, so just directly emit the simple mapping we really want.
2020-03-11 11:12:12 -04:00
Sam Parker d941df363d [NFC][ARM] Reorder some logic
Move some logic around in LowOverheadLoop::ValidateLiveOut
2020-03-11 11:40:09 +00:00
Simon Pilgrim b3b4727a3e [X86] Replace (most) X86ISD::SHLD/SHRD usage with ISD::FSHL/FSHR generic opcodes (PR39467)
For i32 and i64 cases, X86ISD::SHLD/SHRD are close enough to ISD::FSHL/FSHR that we can use them directly; we just need to account for the operand commutation for SHRD.

The i16 SHLD/SHRD case is annoying as the shift amount is modulo-32 (vs funnel shift modulo-16), so I've added X86ISD::FSHL/FSHR equivalents, which match the generic implementation in all other respects.

Something I'm slightly concerned with is that ISD::FSHL/FSHR legality is controlled by the Subtarget.isSHLDSlow() feature flag - we don't normally use non-ISA features for this but it allows the DAG combines to continue to operate after legalization in a lot more cases.

The X86 *bits.ll changes are all affected by the same issue - we now have a "FSHR(-1,-1,amt) -> ROTR(-1,amt) -> (-1)" simplification that reduces the dependencies enough for the branch fall-through code to go wrong.
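
For reference, the generic funnel shift intrinsics involved (these are
standard LLVM intrinsics; the new part here is the x86 lowering):

  declare i32 @llvm.fshl.i32(i32, i32, i32)
  declare i32 @llvm.fshr.i32(i32, i32, i32)

  define i32 @fshl32(i32 %x, i32 %y, i32 %amt) {
    %r = call i32 @llvm.fshl.i32(i32 %x, i32 %y, i32 %amt) ; -> shld
    ret i32 %r
  }

  define i32 @fshr32(i32 %x, i32 %y, i32 %amt) {
    %r = call i32 @llvm.fshr.i32(i32 %x, i32 %y, i32 %amt) ; -> shrd (operands commuted)
    ret i32 %r
  }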

Differential Revision: https://reviews.llvm.org/D75748
2020-03-11 11:17:49 +00:00
Anna Welker a6d3bec83f [TTI][ARM][MVE] Refine gather/scatter cost model
Refines the gather/scatter cost model, but also changes the TTI
function getIntrinsicInstrCost to accept an additional parameter
which is needed for the gather/scatter cost evaluation.
This did require trivial changes in some non-ARM backends to
adopt the new parameter.
Extending gathers and truncating scatters are now priced cheaper.

Differential Revision: https://reviews.llvm.org/D75525
2020-03-11 10:23:41 +00:00
Victor Campos 8a12553223 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

The code generation explicitly deviates from using the register-offset
variant of LDRD/STRD. In this variant, the register allocated to the
register-offset cannot be reused in any of the remaining operands. Such
a restriction seems non-trivial to implement in LLVM, so it is
left as a to-do.
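
A small sketch of the affected pattern (assuming an ARMv5TE or Thumb-2
target):

  define i64 @load64(i64* %p) {
    %v = load volatile i64, i64* %p, align 8   ; now one LDRD instead of two LDRs
    ret i64 %v
  }

  define void @store64(i64* %p, i64 %v) {
    store volatile i64 %v, i64* %p, align 8    ; now one STRD instead of two STRs
    ret void
  }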

Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers

Reviewed By: efriedma, nickdesaulniers

Subscribers: danielkiss, alanphipps, hans, nathanchance, nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072
2020-03-11 10:19:27 +00:00
Matt Arsenault edd0dfca0d AMDGPU/GlobalISel: Refine G_TRUNC legality rules
Scalarize most truncates. Avoid touching cases that could end up in
unresolvable infinite loops.
2020-03-10 15:32:22 -07:00
Matt Arsenault ce8a1f7294 GlobalISel: Implement fewerElementsVector for G_TRUNC
Extend fewerElementsVectorBasic to handle operands with different
element types.
2020-03-10 15:17:20 -07:00
Matt Arsenault 200b20639a AMDGPU: Use V_MAC_F32 for fmad.ftz
This avoids regressions in a future patch. I'm confused by the gfx9
use of legacy_mad. Was this a pointless instruction rename, or does it
use fmul_legacy handling? Why is regular mac available in that case?
2020-03-10 14:41:06 -07:00
Jay Foad c8f0d27ef3 [AMDGPU] Fix the gfx10 scheduling model for f32 conversions
Summary:
As far as I can tell on gfx10 conversions to/from f32 (that are not
converting f32 to/from f64) are full rate instructions, but they were
marked as quarter rate instructions.

I have fixed this for gfx10 only. I assume the scheduling model was
correct for older architectures, though I don't have any documentation
handy to confirm that.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75392
2020-03-10 19:31:24 +00:00
Matt Arsenault c4de8935a5 ARM: Fixup some tests using denormal-fp-math attribute
Don't use the deprecated, single mode form in tests. Also make sure to
parse the attribute, in case of the deprecated form.
2020-03-10 14:02:06 -04:00
Benjamin Kramer 247a177cf7 Give helpers internal linkage. NFC. 2020-03-10 18:27:42 +01:00
Kazushi (Jam) Marukawa 3dabad1af3 [VE] Target-specific bit size for sjljehprepare
Summary:
This patch extends the TargetMachine to let targets specify the integer size
used by the sjljehprepare pass. This is 64bit for the VE target and otherwise
defaults to 32bit for all targets, which was hard-wired before.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D71337
2020-03-10 17:51:16 +01:00
Simon Pilgrim c8ede5e485 [X86][SSE] getFauxShuffleMask - add support for INSERT_VECTOR_ELT(EXTRACT_VECTOR_ELT) shuffle pattern
We already do this for PINSRB/PINSRW and SCALAR_TO_VECTOR.
2020-03-10 15:42:37 +00:00
Simon Pilgrim e6a7e3b5e3 [X86][SSE] matchShuffleWithSHUFPD - add support for unary shuffles.
This causes one minor test change but is mainly necessary for an upcoming patch.
2020-03-10 15:42:36 +00:00
Matt Arsenault 67cfbec746 AMDGPU/GlobalISel: Insert readfirstlane on SGPR returns
In case the source value ends up in a VGPR, insert a readfirstlane to
avoid producing an illegal copy later. If it turns out to be
unnecessary, it can be folded out.
2020-03-10 11:18:48 -04:00
Sam Parker a314050065 [ARM][MVE] VFMA and VFMS validForTailPredication
Add four instructions to the whitelist.

Differential Revision: https://reviews.llvm.org/D75902
2020-03-10 14:58:29 +00:00
Jonas Paulsson 62ff9960d3 [SystemZ] Improve foldMemoryOperandImpl().
Swap the compare operands if the LHS is spilled, while updating the CCMasks of
the CC users. This is relatively straightforward since the live-in lists for
the CC register can be assumed to be correct during register allocation
(thanks to 659efa2).

Also fold a spilled operand of an LOCR/SELR into an LOC(G).

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D67437
2020-03-10 15:54:47 +01:00
Simon Pilgrim 5cbddf7cbc [X86][SSE] Add more accurate costs for fmaxnum/fminnum codegen
Based off llvm-mca reports on codegen in llvm\test\CodeGen\X86\fmaxnum.ll + llvm\test\CodeGen\X86\fminnum.ll
2020-03-10 11:59:40 +00:00
alex-t 39e1a90784 [AMDGPU] SI_INDIRECT_DST_V* pseudos expansion should place EXEC restore to separate basic block
Summary:
When SI_INDIRECT_DST_V* pseudos have indexes in a VGPR, they get expanded into a self-looping basic block that modifies EXEC in a loop.

To keep EXEC consistent, it is saved before and restored after the expanded pseudo.

%95:vreg_512 = SI_INDIRECT_DST_V16 %93:vreg_512(tied-def 0), %94:sreg_32, 0, killed %1500:vgpr_32

results in

    s_mov_b64 s[6:7], exec
BB0_16:
    v_readfirstlane_b32 s8, v28
    v_cmp_eq_u32_e32 vcc, s8, v28
    s_and_saveexec_b64 vcc, vcc
    s_set_gpr_idx_on s8, gpr_idx(DST)
    v_mov_b32_e32 v6, v25
    s_set_gpr_idx_off
    s_xor_b64 exec, exec, vcc
    s_cbranch_execnz BB0_16
; %bb.17:
    s_mov_b64 exec, s[6:7]

The bug appeared in case this expansion occurs in the ELSE block of the CF.

Originally

  %110:vreg_512 = SI_INDIRECT_DST_V16 %103:vreg_512(tied-def 0), %85:vgpr_32, 0, %107:vgpr_32,
   %112:sreg_64 = SI_ELSE %108:sreg_64, %bb.19, 0, implicit-def dead $exec, implicit-def dead $scc, implicit $exec

expanded to

         ******************   <== here exec has "THEN" context

    s_mov_b64 s[6:7], exec
BB0_16:
    v_readfirstlane_b32 s8, v28
    v_cmp_eq_u32_e32 vcc, s8, v28
    s_and_saveexec_b64 vcc, vcc
    s_set_gpr_idx_on s8, gpr_idx(DST)
    v_mov_b32_e32 v6, v25
    s_set_gpr_idx_off
    s_xor_b64 exec, exec, vcc
    s_cbranch_execnz BB0_16
; %bb.17:
    s_or_saveexec_b64 s[4:5], s[4:5]   <-- exec mask is restored for "ELSE" but immediately overwritten.
    s_mov_b64 exec, s[6:7]

The rest of the "ELSE" block is executed not by the workitems which constitute the "else mask" but by those which constitute "then mask"

SILowerControlFlow::emitElse always considers the basic block begin() as an insertion point for s_or_saveexec.

Proposed fix: the SI_INDIRECT_DST_V* expansion should split the remainder block to create a landing pad for the EXEC restoration.

Reviewers: rampitec, vpykhtin, nhaehnle

Reviewed By: vpykhtin

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75472
2020-03-10 14:04:22 +03:00
Kerry McLaughlin 0bba37a320 [AArch64][SVE] Add SVE intrinsics for address calculations
Summary: Adds the @llvm.aarch64.sve.adr[b|h|w|d] intrinsics
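
A hedged sketch of one of the new intrinsics (the overloaded signature
shown is an assumption here):

  declare <vscale x 2 x i64> @llvm.aarch64.sve.adrw.nxv2i64(<vscale x 2 x i64>, <vscale x 2 x i64>)

  define <vscale x 2 x i64> @adr_word(<vscale x 2 x i64> %base, <vscale x 2 x i64> %idx) {
    ; maps to ADR Zd.D, [Zn.D, Zm.D, LSL #2] - base plus word-scaled index
    %a = call <vscale x 2 x i64> @llvm.aarch64.sve.adrw.nxv2i64(<vscale x 2 x i64> %base, <vscale x 2 x i64> %idx)
    ret <vscale x 2 x i64> %a
  }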

Reviewers: sdesmalen, andwar, efriedma, dancgr, cameron.mcinally, rengolin

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75858
2020-03-10 10:53:37 +00:00
James Greenhalgh f0de8d0940 [Arm] Do not lower vmax/vmin to Neon instructions
On some Arm cores there is a performance penalty when forwarding from an
S register to a D register.  Calculating VMAX in a D register creates
false forwarding hazards, so don't do that unless we're on a core which
specifically asks for it.

Patch by James Greenhalgh

Differential Revision: https://reviews.llvm.org/D75248
2020-03-10 10:48:48 +00:00
Simon Pilgrim 18c19441d1 [X86][AVX] combineX86ShuffleChain - combine binary shuffles to X86ISD::VPERM2X128
For pre-AVX512 targets, combine binary shuffles to X86ISD::VPERM2X128 if possible. This mainly helps optimize the blend(extract_subvector(x,1),y) pattern.

At some point soon we're going to have to make a decision about when to combine AVX512 shuffles more aggressively - we bail out if there is any change in element size (to protect predicate mask merging), which means we miss out on a lot of optimizations.
2020-03-10 10:44:28 +00:00
Sam Parker ff9ac33e1e [ARM][MVE] Validate tail predication values
Iterate through the loop and check that the observable values
produced are the same whether tail predication happens or not.

We want to find out if the tail-predicated version of this loop will
produce the same values as the loop in its original form. For this to
be true, the newly inserted implicit predication must not change
the (observable) results.

We're doing this because many instructions in the loop will not be
predicated and so the conversion from VPT predication to tail
predication can result in different values being produced, because of
falsely predicated lanes not being updated in the converted form.

A masked load, whether through VPT or tail predication, will write
zeros to any of the falsely predicated bytes. So, from the loads, we
know that the false lanes are zeroed and here we're trying to track
that those false lanes remain zero, or where they change, the
differences are masked away by their user(s).

All MVE loads and stores have to be predicated, so we know that any
load operands, or stored results are equivalent already. Other
explicitly predicated instructions will perform the same operation in
the original loop and the tail-predicated form too. Because of this,
we can insert loads, stores and other predicated instructions into
our KnownFalseZeros set and build from there.

Differential Revision: https://reviews.llvm.org/D75452
2020-03-10 09:59:01 +00:00
Djordje Todorovic 5aa5c943f7 Reland "[DebugInfo] Enable the debug entry values feature by default"
Differential Revision: https://reviews.llvm.org/D73534
2020-03-10 09:15:06 +01:00
Craig Topper ef4f939d38 [X86] Remove isel patterns for (X86VBroadcast (i16 (trunc (i32 (load))))). Replace with a DAG combine to form VBROADCAST_LOAD.
isTypeDesirableForOp prevents loads from being shrunk to i16 by DAG
combine. Because of this we can't just match the broadcast and a
scalar load. So look for broadcast+truncate+load and form a
vbroadcast_load during DAG combine. This replaces what was
previously done as an isel pattern and I think fixes it so we
won't change the size of a volatile load. But my main motivation
is just to clean up our isel patterns.
2020-03-10 00:07:07 -07:00
Matt Arsenault 627bb31a28 AMDGPU/GlobalISel: Avoid illegal vector exts for add/sub/mul
When expanding scalar packed operations, we should not introduce
illegal vector casts LegalizerHelper introduces. We're not in a
legalizer context, and there's no RegBankSelect apply or legalize
worklist.
2020-03-09 23:42:17 -04:00
Matt Arsenault ed72bcae34 AMDGPU/GlobalISel: Fix mishandling SGPR v2s16 add/sub/mul
We weren't considering the packed case correctly, and this was passing
through to the selector. The selector only checked the size, so this
would incorrectly compile to a single 32-bit scalar add.

As usual, the LegalizerHelper is somewhat awkward to use from
applyMappingImpl. I think this is the first place we've needed
multi-step legalization here though.
2020-03-09 22:51:54 -04:00
Wouter van Oortmerssen a7a3751775 [WebAssembly] Fixed FrameBaseLocal not being set.
Summary:
Fixes: https://bugs.llvm.org/show_bug.cgi?id=44920

WebAssemblyRegColoring may merge the vreg that currently represents
the FrameBase with one representing an argument.
WebAssemblyExplicitLocals picks up the corresponding local when
a vreg is first added to the Reg2Local mapping, except for
argument instructions, which are handled separately.

Note that this does not change the fact that vregs representing the FrameBase
may get merged; it is not clear to me whether this may have other
effects we would want to avoid.

Reviewers: dschuff

Reviewed By: dschuff

Subscribers: azakai, sbc100, hiraditya, aheejin, sunfish, llvm-commits, jgravelle-google

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75718
2020-03-09 17:29:36 -07:00
George Burgess IV 174c3eb69f [x86][slh] Move isDataInvariant* functions
Patch by Zola Bridges!

From the review:

"""
I moved these functions to X86InstrInfo.cpp, so they are available from
another pass. In addition, this is a step toward resolving the FIXME to
move this metadata to the instruction tables.

This is the final step to make these two data invariance checks
available for non-SLH passes.

The other two steps were here:

- https://reviews.llvm.org/D70283
- https://reviews.llvm.org/D75650

Tested via llvm-lit llvm/test/CodeGen/X86/speculative-load-hardening*
"""

Differential Revision: https://reviews.llvm.org/D75654
2020-03-09 17:07:44 -07:00
George Burgess IV bb0ec1daff [x86][slh][NFC] Rm redundant liveness check
Patch by Zola Bridges!

From the review:

"""
In this changeset (https://reviews.llvm.org/D70283), I added a liveness
check everywhere the isDataInvariant* functions were used, so that I
could safely delete the checks within the function. I mistakenly left
that deletion out of the patch. The result is that the same condition is
checked twice for some instructions which is functionally fine, but not
good. This change deletes the redundant check that I intended to delete
in the last change.

This is the second of three patches that will make the data invariance
checks available for non-SLH passes and enable the FIXMEs related to
moving this metadata to the instruction tables to be resolved.

Tested via llvm-lit llvm/test/CodeGen/X86/speculative-load-hardening*
"""

Differential Revision: https://reviews.llvm.org/D75650
2020-03-09 17:07:44 -07:00
Jay Foad c7b2e7f527 [AMDGPU] Fix scheduling info for terminator SALU instructions
Summary:
Instruction variants like S_MOV_B32_term should have the same SchedRW
class as the base instruction, S_MOV_B32. This probably doesn't make any
difference in practice because as terminators, they'll always be
scheduled at the end of a basic block, but it's simply more correct than
giving them all the default SchedRW class of Write32Bit, which implies a
VALU operation.

Reviewers: rampitec, arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75860
2020-03-09 21:39:52 +00:00
Matt Arsenault eb41627799 AMDGPU/GlobalISel: Improve handling of illegal return types
Most importantly, this fixes ret i8. Also make sure to handle
signext/zeroext for odd types > i32. Some of the corresponding
argument passing fixes also need to be handled.
2020-03-09 13:11:30 -07:00
Matt Arsenault 156a1b59df AMDGPU: Make signext/zeroext behave more sensibly over > i32
Interpret these as extending to the next multiple of 32-bits. This had
no effect with i48 for example, which is really split into {i32, i16},
which should extend the high part.
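
A short sketch of the new interpretation, reusing the i48 example:

  ; signext now means sign-extension up to the next multiple of 32 bits
  ; (i64 here), so the high i16 piece of the {i32, i16} split is extended too.
  define signext i48 @ret48(i48 %x) {
    ret i48 %x
  }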
2020-03-09 12:56:10 -07:00
Matt Arsenault 209094eeb6 AMDGPU/GlobalISel: Start matching s_lshlN_add_u32 instructions
Use a hack to only enable this for GlobalISel.

Technically this also works with SelectionDAG, but the divergence
selection isn't reliable enough and a few cases fail, but I have no
desire to spend time writing the manual expansion code for it. The DAG
actually does a better job since it catches using v_add_lshl_u32 in
the mixed SGPR/VGPR cases.
2020-03-09 12:36:51 -07:00
Simon Pilgrim 4b130b883d [X86][SSE] SimplifyDemandedVectorEltsForTargetNode - reduce vector width of X86ISD::BLENDI
If we don't need the upper subvector elements of the BLENDI node then use a smaller vector size.

This causes a couple of minor regressions in insertelement-ones.ll which are more examples of PR26018; given how cheap allones generation is I don't consider that a showstopper, just an annoyance (and there's plenty of other poor codegen cases in that file).
2020-03-09 18:29:28 +00:00
Craig Topper 3dcc0db15e [X86] Teach combineToExtendBoolVectorInReg to create opportunities for using broadcast load instructions.
If we're inserting a scalar that is smaller than the element
size of the final VT, the value of the extra bits doesn't matter.

Previously we any_extended in the scalar domain before inserting.

This patch changes this to use a broadcast of the original
scalar type and then a bitcast to the final type. This might
enable the use of a broadcast load.

This recovers regressions from 07d68c24aa
and 9fcd212e2f without relying on
alignment of the load.

Differential Revision: https://reviews.llvm.org/D75835
2020-03-09 11:26:12 -07:00
Shiva Chen c3d981aeba [RISCV] Add new SchedRead SchedWrite
The patch fixes some typos and introduces ReadFMemBase, ReadFSGNJ32,
ReadFSGNJ64, WriteFSGNJ32, WriteFSGNJ64, ReadFMinMax32, ReadFMinMax64,
WriteFMinMax32, WriteFMinMax64, so the target CPU with different pipeline model
could use them to describe latency.

Differential Revision: https://reviews.llvm.org/D75515
2020-03-10 00:12:27 +08:00
Jay Foad daf686b7b9 [AMDGPU] Remove unused SchedWrite class 2020-03-09 16:09:43 +00:00
Djordje Todorovic c15c68abdc [CallSiteInfo] Enable the call site info only for -g + optimizations
Emit call site info only in the case of '-g' + 'O>0' level.

Differential Revision: https://reviews.llvm.org/D75175
2020-03-09 12:12:44 +01:00
KAWASHIMA Takahiro c8cd1a994d [AArch64] Add support for Fujitsu A64FX
A64FX is an Armv8.2-A CPU used in FUJITSU Supercomputer
PRIMEHPC FX1000, PRIMEHPC FX700, and supercomputer Fugaku.

https://www.fujitsu.com/global/products/computing/servers/supercomputer/specifications/

Differential Revision: https://reviews.llvm.org/D75594
2020-03-09 19:15:09 +09:00
Craig Topper 07d68c24aa [X86] Remove isel patterns that matched vXi16 X86VBroadcast with i8->i16 aextload input.
This was selecting VBROADCASTW which turned the 8-bit load into
a 16-bit load if it happened to be 2 byte aligned.

I have a plan to fix the regression with a follow up patch
which I'll post shortly.
2020-03-08 19:16:24 -07:00
Kang Zhang b0f3d49a05 [NFC][PowerPC] Order the MTSTR/MFSPR InstAlias definitions by SPR
Summary:
This NFC patch only moves the MTSTR/MFSPR InstAlias
definitions, so they are easier to read.
2020-03-08 11:58:53 +00:00
Craig Topper 70e4fb8a53 [X86] Add DAG combine to turn (vzext_movl (vbroadcast_load)) -> vzext_load.
If we're zeroing the other elements then we don't need the broadcast.
2020-03-08 00:35:40 -08:00
Kang Zhang 0bec7e47d0 Revert "[NFC][PowerPC] Remove the repeated definition for some InstAlias of mtspr/mfspr"
This reverts commit 46126a30f2.
Some test cases failed.
2020-03-08 06:32:12 +00:00
Kang Zhang 46126a30f2 [NFC][PowerPC] Remove the repeated definition for some InstAlias of mtspr/mfspr
Summary:
The InstAlias below have been redeclared; this patch removes them.
mtdec/mfdec   mtsdr1/mfsdr1     mtsrr0/mfsrr0    mtsrr1/mfsrr1
2020-03-08 06:02:55 +00:00
Craig Topper d81d451442 [X86] Add DAG combine to replace vXi64 vzext_movl+scalar_to_vector with vYi32 vzext_movl+scalar_to_vector if the upper 32 bits of the scalar are zero.
We can just use a 32-bit copy and zero in the SSE domain when we
zero the upper bits.

Remove an isel pattern that becomes dead with this.
2020-03-07 16:14:26 -08:00
Craig Topper d41ea65ee8 [X86] Add DAG combines to enable removing of movddup/vbroadcast + simple_load isel patterns. 2020-03-07 15:22:02 -08:00
Craig Topper bc65b68661 [X86] Add a DAG combine to turn vbroadcast(vzload X) -> vbroadcast_load
Remove now unneeded isel patterns.
2020-03-07 15:22:02 -08:00
Craig Topper ec1d1f6ae7 [X86] Use MVT instead of EVT in a couple shuffle lowering functions. 2020-03-07 09:50:53 -08:00
Reid Kleckner 65b21282c7 Avoid emitting unreachable SP adjustments after `throw`
In 172eee9c, we tried to avoid these by modelling the callee as
internally resetting the stack pointer.

However, for the majority of functions with reserved stack frames, this
would lead LLVM to emit extra SP adjustments to undo the callee's
internal adjustment. This lead us to fix the problem further on down the
pipeline in eliminateCallFramePseudoInstr. In 5b79e603d3, I added
use a heuristic to try to detect when the adjustment would be
unreachable.

This heuristic is imperfect, and when exception handling is involved, it
fails to fire. The new test is an example of this. Simply throwing an
exception with an active cleanup emits dead SP adjustments after the
throw. Not only are they dead, but if they were executed, they would be
incorrect, so they are confusing.
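
A hedged sketch of the kind of test case described (names hypothetical):

  extern void log_cleanup();
  struct Cleanup {
    ~Cleanup() { log_cleanup(); }   // active cleanup around the throw
  };

  void f() {
    Cleanup c;
    throw 42;   // SP adjustments emitted after this point are unreachable
  }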

This change essentially reverts 172eee9c and makes the 5b79e603d3
heuristic responsible for preventing unreachable stack adjustments. This
means we may emit unreachable stack adjustments for functions using EH
with unreserved call frames, but that is not very many these days. Back
in 2016 when this change was added, we were focused on 32-bit, which we
observed to have fewer reserved frames.

Fixes PR45064

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D75712
2020-03-06 13:33:45 -08:00
Craig Topper fc3cdd2ee7 [X86] Cleanup patterns and ins for VCVTNEPS2BF16.
There was a no-op bitconvert in the load pattern. While there,
also make all the sources refer to src_v.RC; even though it's the
same as _.RC, it's more consistent.
2020-03-06 10:15:37 -08:00
Simon Pilgrim 7a2ab876fd [Hexagon] Fix fshl/fshr -> combine() bug identified in D75114 2020-03-06 17:23:10 +00:00
Jay Foad 11d1573bb6 [APFloat] Make use of new overloaded comparison operators. NFC.
Reviewers: ekatz, spatel, jfb, tlively, craig.topper, RKSimon, nikic, scanon

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, dexonsmith, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75744
2020-03-06 16:42:53 +00:00
Lucas Prates af1c2e561e [ARM] Fix dropped dollar sign from symbols in branch targets
Summary:
ARMAsmParser was incorrectly dropping a leading dollar sign character
from symbol names in targets of branch instructions. This was caused by
an incorrect assumption that the contents following the dollar sign
token should be handled as a constant immediate, similarly to the #
token.

This patch avoids the operand parsing from consuming the dollar sign
token when it is followed by an identifier, making sure it is properly
parsed as part of the expression.
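
A minimal sketch of the fixed behaviour (symbol name hypothetical):

    b   $target    @ previously assembled as a branch to 'target'; now to '$target'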

Reviewers: efriedma

Reviewed By: efriedma

Subscribers: danielkiss, chill, carwil, vhscampos, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73176
2020-03-06 16:25:08 +00:00
Krzysztof Parzyszek 37a604c296 [Hexagon] Recognize undefined registers in expandPostRAPseudo 2020-03-06 08:27:42 -06:00
Xiangling Liao 362456bc53 [AIX] Handle LinkOnceODRLinkage and AppendingLinkage for static init gloabl arrays
Handle LinkOnceODRLinkage;
Handle AppendingLinkage type for llvm.global_ctors/dtors static init global arrays;

Differential Revision: https://reviews.llvm.org/D75305
2020-03-06 09:26:55 -05:00
Sam Parker 4cf0dddcc6 [ARM][MVE] Enable VMOVN for tail predication
These instructions also don't exchange lanes, so make them legal.

Differential Revision: https://reviews.llvm.org/D75669
2020-03-06 08:59:22 +00:00
Jim Lin c40a9010d9 [AVR][NFC] Remove trailing space 2020-03-06 10:40:27 +08:00
Jessica Paquette ef4282e0ee [AArch64][GlobalISel] Avoid copies to target register bank for subregister copies
Previously, for any copy from a register bigger than the destination, we:

a) Copied to a same-sized register in the destination register bank.
b) Did a subregister copy of that to the destination.

This fails for copies from 128-bit FPRs to GPRs because the GPR register bank
can't accommodate 128-bit values.

Instead of special-casing such copies to perform the truncation beforehand in
the source register bank, generalize this:
a) Perform a subregister copy straight from source register whenever possible.
This results in shorter MIR and fixes the above problem.

b) Perform a full copy to target bank and then do a subregister copy only if
source bank can't support target's size. E.g. GPR to 8-bit FPR copy.

Patch by Raul Tambre (tambre)!

Differential Revision: https://reviews.llvm.org/D75421
2020-03-05 11:13:02 -08:00
Fangrui Song 3e851f4a68 [PowerPC] Delete PPCMachObjectWriter and powerpc{,64}-apple-darwin
Reviewed By: #powerpc, sfertile

Differential Revision: https://reviews.llvm.org/D75494
2020-03-05 11:05:26 -08:00
Philip Reames c93f1046fc [X86/MC] Factor out common code [NFC] 2020-03-05 09:43:41 -08:00
David Stuttard a74b33f612 AMDGPU: Fix SMRD test in trivially disjoint mem access code
Summary:
This seems like an obvious error - cut and paste issue?
The change does make a change to one of the lit tests - it stops s_buffer_load
re-ordering past an MUBUF instruction (which is not surprising).

Change-Id: I80be99de5b62af4f42e91af2591b76a52ac9efa6

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75686
2020-03-05 17:14:01 +00:00
Chris Bowler c7b6fa8f4b [AIX] Extend int arguments to register width when passed in stack memory.
This is a follow up to the previous patch: [AIX] Implement caller
arguments passed in stack memory.

This corrects a defect in AIX 64-bit where an i32 is written to the
stack with stw (4 bytes) rather than the expected std (8 bytes). Integer
arguments pass on the stack as images of their register representation.

I also took the opportunity to tidy up some of the calling convention
AIX tests I added in my last commit. This patch adds the previously missing
expected assembly output for the stack-arg int case, which would have caught this
problem.
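
A hedged sketch of the affected case (signature hypothetical): with the
eight GPR argument slots exhausted, the trailing int lands in stack memory
and is now written with std (8 bytes) as the image of its register
representation rather than stw (4 bytes).

  void callee(long a, long b, long c, long d,
              long e, long f, long g, long h,
              int on_stack);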

Differential Revision: https://reviews.llvm.org/D75126
2020-03-05 11:49:16 -05:00
Daniel Kiss 11ab687c66 [AArch64] Harmonize print format of hint instructions.
Summary:
Hint instructions are printed as "hint\t#hintnum", except for ARM
v8.3a instructions, where only "hint #hintnum" is printed.
This patch changes all formats to the first one.

Reviewers: pbarrio, LukeCheeseman, vsk

Reviewed By: vsk

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75625
2020-03-05 15:35:24 +01:00
Simon Pilgrim 1dbef64ef3 Fix "Value stored to 'RegForm' is never read" static analyzer warnings. NFC. 2020-03-05 14:22:24 +00:00
Sam Parker 77e30758dd [ARM][MVE] Enable *SHRN* for tail predication
These instructions don't swap lanes so make them valid.

Differential Revision: https://reviews.llvm.org/D75667
2020-03-05 11:00:45 +00:00
David Blaikie 7a6878a72e X86AsmBackend.cpp: #ifndef NDEBUG some only-used-in-asserts variables to fix the -Werror non-asserts build 2020-03-04 22:36:24 -08:00
Craig Topper 4c7c87f245 [X86] Simplify the code at the end of lowerShuffleAsBroadcast.
The original code could create a bitcast from f64 to i64 and back
on 32-bit targets. This was only working because getBitcast was
able to fold the casts away to avoid leaving the illegal i64 type.

Now we handle the scalar case directly by broadcasting using the
scalar type as the element type. Then bitcasting to the final VT.
This works since we ensure the scalar type is the same size as
the final VT element type. No more casts to i64.

For the vector case, we cast to VT or subvector of VT. And then
do the broadcast.

I think this all matches what we generated before, just in a more
readable way.
2020-03-04 20:45:02 -08:00
Philip Reames c94a4133bb Consistently capitalize a variable [NFC]
One instance in a copy paste was pointed out in a review, fix all instances at once.
2020-03-04 20:00:08 -08:00
Jim Lin ea6eb813c7 [AVR][NFC] Use Register instead of unsigned
Summary: Use Register type for variables instead of unsigned type.

Reviewers: dylanmckay

Reviewed By: dylanmckay

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75595
2020-03-05 11:38:24 +08:00
hsmahesha 3fda1fde8f AMDGPU/GlobalISel: Support llvm.trap and llvm.debugtrap intrinsics
Summary: Lower trap and debugtrap intrinsics to AMDGPU machine instruction(s).

Reviewers: arsenm, nhaehnle, kerbowa, cdevadas, t-tye, kzhuravl

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, rovka, dstuttard, tpr, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74688
2020-03-05 08:16:57 +05:30
Shengchen Kan b3722dea3b [X86] Add a private member function determinePaddingPrefix for X86AsmBackend
Summary: X86 can reduce the number of NOP bytes by padding instructions with prefixes to get better performance in some cases. So a private member function `determinePaddingPrefix` is added to determine which prefix is the most suitable.

Reviewers: annita.zhang, reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: reames

Subscribers: llvm-commits, dexonsmith, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75357
2020-03-05 09:26:33 +08:00
Philip Reames f708c823f0 [X86] Relax existing instructions to reduce the number of nops needed for alignment purposes
If we have an explicit align directive, we currently default to emitting nops to fill the space. As discussed in the context of the prefix padding work for branch alignment (D72225), we're allowed to play other tricks such as extending the size of previous instructions instead.

This patch will convert near jumps to far jumps if doing so decreases the number of bytes of nops needed for a following align. It does so as a post-pass after relaxation is complete. It intentionally works without moving any labels or doing anything which might require another round of relaxation.

The point of this patch is mainly to mock out the approach. The optimization implemented is real, and possibly useful, but the main point is to demonstrate an approach for implementing such "pad previous instruction" approaches. The key notion in this patch is to treat padding previous instructions as an optional optimization, not as a core part of relaxation. The benefit to this is that we avoid the potential concern about increasing the distance between two labels and thus causing further potentially non-local code grown due to relaxation. The downside is that we may miss some opportunities to avoid nops.

For the moment, this patch only implements a small set of existing relaxations. Assuming the approach is satisfactory, I plan to extend this to a broader set of instructions where there are obvious "relaxations" which are roughly performance equivalent.

Note that this patch *doesn't* change which instructions are relaxable. We may wish to explore that separately to increase optimization opportunity, but I figured that deserved its own separate discussion.

There are possible downsides to this optimization (and all "pad previous instruction" variants). The major two are potentially increasing instruction fetch and perturbing uop caching. (i.e. the usual alignment risks) Specifically:
 * If we pad an instruction such that it crosses a fetch window (16 bytes on modern X86-64), we may cause the decoder to have to trigger a fetch it wouldn't have otherwise. This can affect both decode speed and icache pressure.
 * Intel's uop caches have particular restrictions on which instruction combinations can fit in a particular way. By moving around instructions, we can both cause misses and change misses into hits. Many of the most painful cases are around branch density, so I don't expect this to be too bad on the whole.

On the whole, I expect to see small swings (i.e. the typical alignment change problem), but nothing major or systematic in either direction.
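
A sketch of the basic idea, assuming a following 16-byte alignment
directive that would otherwise need three bytes of nops:

    jmp  .Ltarget        # 2-byte short jump (EB rel8)
    .p2align 4           # needs 3 nop bytes

can instead be emitted as

    jmp  .Ltarget        # 5-byte near jump (E9 rel32)
    .p2align 4           # needs 0 nop bytes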

Differential Revision: https://reviews.llvm.org/D75203
2020-03-04 16:52:35 -08:00
Craig Topper eadea7868f [X86] Convert vXi1 vectors to xmm/ymm/zmm types via getRegisterTypeForCallingConv rather than using CCPromoteToType in the td file
Previously we tried to promote these to xmm/ymm/zmm by promoting
in the X86CallingConv.td file. But this breaks when we run out
of xmm/ymm/zmm registers and need to fall back to memory. We end
up trying to create a non-sensical scalar to vector. This lead
to an assertion. The new tests in avx512-calling-conv.ll all
trigger this assertion.

Since we really want to treat these types like we do on avx2,
it seems better to promote them before the calling convention
code gets involved. Except when the calling convention is one
that passes the vXi1 type in a k register.

The changes in avx512-regcall-Mask.ll are because we indicated
that xmm/ymm/zmm types should be passed indirectly for the
Win64 ABI before we go to the common lines that promoted the
vXi1 types. This caused the promoted types to be picked up by
the default calling convention code. Now we promote them earlier
so they get passed indirectly as though they were xmm/ymm/zmm.

Differential Revision: https://reviews.llvm.org/D75154
2020-03-04 15:02:32 -08:00
Craig Topper 6ca96765c7 [X86] Disable commuting for the first source operand of zero masked scalar fma intrinsic instructions.
I believe this is the correct fix for D75506 rather than disabling all commuting. We can still commute the remaining two sources.

Differential Revision: https://reviews.llvm.org/D75526
2020-03-04 14:35:53 -08:00
Matt Arsenault 15bf916b54 AMDGPU: Remove VOP3OpSelMods0 complex pattern
Use default operand of 0 instead.
2020-03-04 17:18:22 -05:00
Matt Arsenault 9e1d2afc13 AMDGPU/GlobalISel: Don't use vector G_EXTRACT in arg lowering
Create a wider source vector, and unmerge with dead defs like the
legalizer. The legalization handling for G_EXTRACT is incomplete, and
it's preferrable to keep everything in 32-bit pieces.

We should probably start moving these functions into utils, since we
have a growing number of places that do almost the same thing.
2020-03-04 16:49:01 -05:00
Matt Arsenault fb0c35fa34 GlobalISel: Set alignment on function argument stack load/store 2020-03-04 16:38:46 -05:00
Zola Bridges aa3f791fa9 [x86][SLH] Rm liveness check from data invariance check
SLH had two functions named isDataInvariant and isDataInvariantLoad that
checked whether the passed instruction was data invariant. For some instructions,
if the EFLAGS were dead then they were considered data invariant, otherwise
they were not considered data invariant.

In this patch, I extracted that EFLAGS liveness check and made it
explicit at every call to isDataInvariant and isDataInvariantLoad.
This makes the isDataInvariant function behave more generally
and preserves the liveness check behavior that SLH would like to have.

Tested via llvm-lit llvm/test/CodeGen/X86/speculative-load-hardening*

This is the first step in making these two data invariance checks
available for non-SLH passes. The second step is to move the passes from
SLH to X86InstrInfo.cpp. I'll follow up with a patch that does that.

Differential Revision: https://reviews.llvm.org/D70283
2020-03-04 21:49:49 +01:00
Wei Mi 3c96d01d2e Generate Callee Saved Register (CSR) related cfi directives like .cfi_restore.
https://reviews.llvm.org/D42848 only handled CFA related cfi directives but
didn't handle CSR related cfi. The patch adds the CSR part. Basically it reuses
the framework created in D42848. For each basicblock, the patch tracks which
CSR set have been saved at its CFG predecessors's exits, and compare the CSR
set with the set at its previous basicblock's exit (The previous block is the
block laid before the current block). If the saved CSR set at its previous
basicblock's exit is larger, .cfi_restore will be inserted.

The patch also generates proper .cfi_restore in epilogue to make sure the
saved CSR set is consistent for the incoming edges of each block.

Differential Revision: https://reviews.llvm.org/D74303
2020-03-04 11:18:37 -08:00
Nikita Popov 0e890cd4d4 [ConstantFolding] Always return something from ConstantFoldConstant
Spin-off from D75407. As described there, ConstantFoldConstant()
currently returns null for non-ConstantExpr/ConstantVector inputs,
but otherwise always returns non-null, independently of whether
any folding has happened or not.

This is confusing and makes consumer code more complicated.
I would expect either that ConstantFoldConstant() returns non-null only if
it actually folded something, or that it always returns non-null.
I'm going with the latter possibility here, which appears to be more
useful considering existing usage.
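
A hedged sketch of the consumer-side simplification this enables:

  #include "llvm/Analysis/ConstantFolding.h"
  using namespace llvm;

  Constant *fold(Constant *C, const DataLayout &DL) {
    // Before: a null return meant "no folding happened" and had to be handled.
    // After: the result is always a valid Constant (possibly C itself).
    return ConstantFoldConstant(C, DL);
  }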

Differential Revision: https://reviews.llvm.org/D75543
2020-03-04 18:24:47 +01:00
Craig Topper 06de426426 [X86] Directly form VBROADCAST_LOAD in lowerShuffleAsBroadcast on AVX targets.
If we would emit a VBROADCAST node, we can instead directly emit
a VBROADCAST_LOAD. This allows us to get rid of the special case
to use an f64 load on 32-bit targets for vXi64.

I believe there is more cleanup we can do later in this function,
but I'll do that in follow ups.
2020-03-04 09:11:57 -08:00
Kerry McLaughlin f5502c7035 [AArch64][SVE] Add SVE2 intrinsic for xar
Summary: Implements the @llvm.aarch64.sve.xar intrinsic

Reviewers: andwar, c-rhodes, dancgr, efriedma, rengolin

Reviewed By: andwar

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75160
2020-03-04 11:44:32 +00:00
Simon Pilgrim e2f0093800 [AMDGPU] performCvtF32UByteNCombine - revisit node after src operand simplification.
If SimplifyDemandedBits succeeds in simplifying the byte src, add the CVT_F32_UBYTE node back to the worklist as we might be able to simplify further.

Yet another step towards removing SelectionDAG::GetDemandedBits.
2020-03-04 11:25:50 +00:00
Simon Tatham 068b2f313c [ARM,MVE] Add the `vshlcq` intrinsics.
Summary:
The VSHLC instruction performs a left shift of a whole vector register
by an immediate shift count up to 32, shifting in new bits at the low
end from a GPR and delivering the shifted-out bits from the high end
back into the same GPR.

Since the instruction produces two outputs (the shifted vector
register and the output GPR of shifted-out bits), it has to be
instruction-selected in C++ rather than Tablegen.
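
A hedged sketch of the ACLE usage this backs (signature per the MVE ACLE;
treat the details as assumptions):

  #include <arm_mve.h>

  uint32x4_t shift_in_carry(uint32x4_t v, uint32_t *carry) {
    // Shift the whole vector left by 8; shifted-out and shifted-in bits
    // are exchanged through *carry.
    return vshlcq_u32(v, carry, 8);
  }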

Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard

Reviewed By: miyuki

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75445
2020-03-04 08:49:27 +00:00
Simon Tatham 810127f6ab [ARM,MVE] Add the `vsbciq` intrinsics.
Summary:
These are exactly parallel to the existing `vadciq` intrinsics, which
we implemented last year as part of the original MVE intrinsics
framework setup.

Just like VADC/VADCI, the MVE VSBC/VSBCI instructions deliver two
outputs, both of which the intrinsic exposes: a modified vector
register and a carry flag. So they have to be instruction-selected in
C++ rather than Tablegen. However, in this case, that's trivial: the
same C++ isel routine we already have for VADC works unchanged, and
all we have to do is to pass it a different instruction id.
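
A hedged sketch of ACLE usage (details assumed, by analogy with vadciq):

  #include <arm_mve.h>

  uint32x4_t sub_with_carry(uint32x4_t a, uint32x4_t b, unsigned *carry_out) {
    return vsbciq_u32(a, b, carry_out);   // carry flag delivered through *carry_out
  }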

Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard

Reviewed By: miyuki

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75444
2020-03-04 08:49:27 +00:00
Craig Topper 9284abd004 [X86] Directly form VBROADCAST_LOAD for BUILD_VECTOR of splat loads in lowerBuildVectorAsBroadcast. 2020-03-03 22:27:34 -08:00
Matt Arsenault 88aced1e45 AMDGPU: Fix computation for getOccupancyWithLocalMemSize
The computation here didn't really make sense to me, and reported
wildly different results depending on the flat work group size
attribute.

I think this should really report a range derived from the possible
work group size bounds, and only allow an occupancy that is a multiple
of the group size.
2020-03-03 17:15:57 -05:00
Craig Topper 02f03a6fd4 [X86] Match vpmullq latency to uops.info. Correct port usage for 512-bit memory form
uops.info says these should be 15 cycle instructions. Uops.info also shows the 512-bit form uses port 0 and 5 for both register and memory. We had memory using 0 and 1.

Differential Revision: https://reviews.llvm.org/D75549
2020-03-03 12:16:03 -08:00
Craig Topper 3c4e635593 [X86] Always emit an integer vbroadcast_load from lowerBuildVectorAsBroadcast regardless of AVX vs AVX2
If we go with D75412, we no longer depend on the scalar type directly. So we don't need to avoid using i64. We already have AVX1 fallback patterns with i32 and i64 scalar types so we don't need to avoid using integer types on AVX1.

Differential Revision: https://reviews.llvm.org/D75413
2020-03-03 10:39:11 -08:00
Craig Topper 56cd3bc209 [X86] Directly emit VBROADCAST_LOAD from constant pool in lowerBuildVectorAsBroadcast
Also add a DAG combine to combine different sized broadcasts from
constant pool to avoid a regression.

Differential Revision: https://reviews.llvm.org/D75412
2020-03-03 10:39:10 -08:00
Craig Topper 68aeaab888 [X86] Don't count the chain uses when forming broadcast loads in lowerBuildVectorAsBroadcast.
The build_vector needs to be the only user of the data, but the
chain will likely have another use. So we can't make sure the
build_vector is the only user of the node.
2020-03-03 08:41:31 -08:00
Jonas Paulsson ae4d39c9e4 [SystemZ] Copy Access registers and CC with the correct register class.
On SystemZ there is a set of "access registers" that can be copied in and
out of 32-bit GPRs with special instructions. These instructions can only
perform the copy using low 32-bit parts of the 64-bit GPRs. However, the
default register class for 32-bit integers is GRX32, which also contains the
high 32-bit part registers.

In order to never end up with a case of such a COPY into a high reg, this
patch adds a new simple pre-RA pass that selects such COPYs into target
instructions.

This pass also handles COPYs from CC (Condition Code register), and COPYs to
CC can now also be emitted from a high reg in copyPhysReg().

Fixes: https://bugs.llvm.org/show_bug.cgi?id=44254

Review: Ulrich Weigand.

Differential Revision: https://reviews.llvm.org/D75014
2020-03-03 16:41:09 +01:00
Francesco Petrogalli 779e2c7a1a [llvm][CodeGen][SVE] Constrain prefetch intrinsic argument to immediate values.
Summary:
The argument that sets the prefetch type of a prefetch intrinsic must
be an immediate value.

Reviewers: andwar, sdesmalen, efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75482
2020-03-03 15:25:08 +00:00
Sam Parker 5618e9be37 [RDA][ARM] collectKilledOperands across multiple blocks
Use MIOperand in collectLocalKilledOperands to make the search
global, as we already have to search for global uses too. This
allows us to delete more dead code when tail predicating.

Differential Revision: https://reviews.llvm.org/D75167
2020-03-03 15:23:05 +00:00
Sam Parker dfe8f5da4c [ARM][RDA] Allow multiple killed users
In RDA, check against the already decided dead instructions when
looking at users. This allows an instruction to be removed if it
has multiple users, but they're all dead.

This means that IT instructions can be considered killed once all
the itstate using instructions are dead.

Differential Revision: https://reviews.llvm.org/D75245
2020-03-03 15:12:29 +00:00
Jonas Paulsson 237625757a [SystemZ] Bugfix for backchain with packed-stack
The incoming back chain slot was implicitly allocated whenever a GPR was
saved in SystemZFrameLowering::getRegSpillOffset(), but in cases where no
GPRs were saved/restored this did not take effect.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D75367
2020-03-03 15:03:01 +01:00
Jonas Paulsson cdcce3cabf [SystemZ] Also accept ISD::USUBO in shouldFormOverflowOp().
Forming subtract with overflow is beneficial on SystemZ, just like additions.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D75290
2020-03-03 14:38:57 +01:00
Jim Lin 4e3b037665 [AVR] Fix incorrect register state for LDRdPtr
Summary:
The LDRdPtr expanded from LDWRdPtr shouldn't define its second operand (SrcReg).
The second operand is its source register.
Adding -verify-machineinstrs to the command line of the test cases triggers this error.

Reviewers: dylanmckay

Reviewed By: dylanmckay

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75437
2020-03-03 17:34:54 +08:00
Shengchen Kan af57b139a0 Temporarily Revert [X86] Not track size of the boundaryalign fragment during the layout
Summary: This reverts commit 2ac19feb15.
This commit causes some test cases to fail when branches are aligned.
2020-03-03 11:15:56 +08:00
Jim Lin c0a2da9460 [AVR] Add missing ROLLOOP and RORLOOP into getTargetNodeName 2020-03-03 09:43:52 +08:00
Huihui Zhang 44fa47c9e7 [ARM][ConstantIslands] Fix stack mis-alignment caused by undoLRSpillRestore.
Summary:
It is not safe for ARMConstantIslands to undoLRSpillRestore. PrologEpilogInserter is
the one to ensure stack alignment, taking into consideration whether LR is spilled or not.

For a noreturn function with StackAlignment 8 (the function contains a call/alloc),
undoLRSpillRestore causes the stack to be mis-aligned. Fixing stack alignment in
ARMConstantIslands doesn't give us much benefit, as undoing the LR spill/restore only
occurs in large functions with near branches only, which also don't have a callee-saved LR spill.

Reviewers: t.p.northover, rengolin, efriedma, apazos, samparker, ostannard

Reviewed By: ostannard

Subscribers: dmgreen, ostannard, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75288
2020-03-02 16:28:57 -08:00
Philip Reames 7049cf6496 [BranchAlign] Fix bug w/nop padding for SS manipulation
X86 has several instructions which are documented as enabling interrupts exactly one instruction *after* the one which changes the SS segment register. Inserting a nop between these two instructions allows an interrupt to arrive before the execution of the following instruction which changes semantic behaviour.

The list of instructions is documented in "Table 24-3. Format of Interruptibility State" in Volume 3c of the Intel manual. They basically all come down to different ways to write to the SS register.

Differential Revision: https://reviews.llvm.org/D75359
2020-03-02 14:40:25 -08:00
Joerg Sonnenberger eb812efa12 Explicitly include <cassert> when using assert
Depending on the OS used, a module-enabled build can fail due to the
special handling <cassert> gets as textual header.
2020-03-02 22:45:28 +01:00
Jessica Paquette 02c154a9cb [AArch64][MachineOutliner] Don't outline CFI instructions
CFI instructions can only safely be outlined when the outlined call is a tail
call, or when the outlined frame is fixed up.

For the sake of correctness, disable outlining from CFI instructions.

Add machine-outliner-cfi.mir to test this.
2020-03-02 10:56:35 -08:00
Simon Pilgrim e20e6f26fa Fix shadow variable warning. NFC. 2020-03-02 18:53:19 +00:00
Simon Pilgrim 2b624e04c7 Fix 'unsigned variable can never be negative' cppcheck warning. NFCI. 2020-03-02 18:53:18 +00:00
Krzysztof Parzyszek 0fafb4becc [Hexagon] Use BUILD_PAIR to expand i128 instead of doing arithmetic 2020-03-02 09:52:07 -06:00
Simon Pilgrim f5ad93d2f7 [X86] Cleanup ShuffleDecode implementations. NFCI.
- Remove unnecessary includes from the headers
 - Fix cppcheck definition/declaration arg mismatch warnings
 - Tidyup old comments (MVT usage was removed a long time ago)
 - Use SmallVector::append for repeated mask entries
2020-03-02 15:06:35 +00:00
Luke Geeson 7d594cf003 [ARM] Add Cortex-M55 Support for clang and llvm
This patch upstreams support for the ARM Armv8.1-M CPU Cortex-M55.

In detail adding support for:

 - mcpu option in clang
 - Arm Target Features in clang
 - llvm Arm TargetParser definitions

details of the CPU can be found here:
https://developer.arm.com/ip-products/processors/cortex-m/cortex-m55

Reviewers: chill

Reviewed By: chill

Subscribers: dmgreen, kristof.beyls, hiraditya, cfe-commits,
llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D74966
2020-03-02 11:42:26 +00:00
Andrzej Warzynski 9249f60602 [AArch64][SVE] Add intrinsics for non-temporal gather-loads/scatter-stores
Summary:
This patch adds the following LLVM IR intrinsics for SVE:
1. non-temporal gather loads
  * @llvm.aarch64.sve.ldnt1.gather
  * @llvm.aarch64.sve.ldnt1.gather.uxtw
  * @llvm.aarch64.sve.ldnt1.gather.scalar.offset
2. non-temporal scatter stores
  * @llvm.aarch64.sve.stnt1.scatter
  * @llvm.aarch64.sve.stnt1.scatter.uxtw
  * @llvm.aarch64.sve.stnt1.scatter.scalar.offset
These intrinsic are mapped to the corresponding SVE instructions
(example for half-words, zero-extending):
  * ldnt1h { z0.s }, p0/z, [z0.s, x0]
  * stnt1h { z0.s }, p0/z, [z0.s, x0]

Note that for non-temporal gathers/scatters, the SVE spec defines only
one instruction type: "vector + scalar". For this reason, we swap the
arguments when processing intrinsics that implement the "scalar +
vector" addressing mode:
  * @llvm.aarch64.sve.ldnt1.gather
  * @llvm.aarch64.sve.ldnt1.gather.uxtw
  * @llvm.aarch64.sve.stnt1.scatter
  * @llvm.aarch64.sve.stnt1.scatter.uxtw
In other words, all intrinsics for gather-loads and scatter-stores
implemented in this patch are mapped to the same load and store
instruction, respectively.

The sve2_mem_gldnt_vs multiclass (and its counterpart for scatter
stores) from SVEInstrFormats.td was split into:
  * sve2_mem_gldnt_vec_vs_32_ptrs (32bit wide base addresses)
  * sve2_mem_gldnt_vec_vs_62_ptrs (64bit wide base addresses)
This is consistent with what we did for
@llvm.aarch64.sve.ld1.scalar.offset and highlights the actual split in
the spec and the implementation.

Reviewed by: sdesmalen

Differential Revision: https://reviews.llvm.org/D74858
2020-03-02 10:38:28 +00:00
Simon Tatham 1a8cbfa514 [ARM,MVE] Add ACLE intrinsics for VCVT[ANPM] family.
Summary:
These instructions convert a vector of floats to a vector of integers
of the same size, with assorted non-default rounding modes.
Implemented in IR as target-specific intrinsics, because as far as I
can see there are no matches for that functionality in the standard IR
intrinsics list.

Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75255
2020-03-02 10:33:30 +00:00
Simon Tatham b08d2ddd69 [ARM,MVE] Add ACLE intrinsics for VCVT.F32.F16 family.
Summary:
These instructions make a vector of `<4 x float>` by widening every
other lane of a vector of `<8 x half>`.

I wondered about representing these using standard IR, along the lines
of a shufflevector to extract elements of the input into a `<4 x half>`
followed by an `fpext` to turn that into `<4 x float>`. But it looks as
if that would take a lot of work in isel lowering to make it match any
pattern I could sensibly write in Tablegen, and also I haven't been
able to think of any other case where that pattern might be generated
in IR, so there wouldn't be any extra code generation win from doing
it that way.
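
For reference, that considered-and-rejected standard-IR formulation would
look roughly like this:

  define <4 x float> @widen_alt(<8 x half> %v) {
    %evens = shufflevector <8 x half> %v, <8 x half> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
    %wide = fpext <4 x half> %evens to <4 x float>
    ret <4 x float> %wide
  }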

Therefore, I've just used another target-specific intrinsic. We can
always change it to the other way later if anyone thinks of a good
reason.

(In order to put the intrinsic definition near similar things in
`IntrinsicsARM.td`, I've also lifted the definition of the
`MVEMXPredicated` multiclass higher up the file, without changing it.)

Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard

Reviewed By: miyuki

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75254
2020-03-02 10:33:30 +00:00
Simon Tatham 69441e53c9 [ARM,MVE] Correct MC operands in VCVT.F32.F16. (NFC)
Summary:
The two MVE instructions that convert between v4f32 and v8f16 were
implemented as instances of the same class, with the same MC operand
list.

But that's not really appropriate, because the narrowing conversion
only partially overwrites its output register (it only has 4 f16
values to write into a vector of 8), so even when unpredicated, it
needs a $Qd_src input, a constraint tying that to the $Qd output, and
a vpred_n.

The widening conversion is better represented like any other
instruction that completely replaces its output when unpredicated: it
should have no $Qd_src operand, and instead, a vpred_r containing a
$inactive parameter. That's a better match to other similar
instructions, such as its integer analogue, the VMOVL instruction that
makes a v4i32 by sign- or zero-extending every other lane of a v8i16.

This commit brings the widening VCVT.F32.F16 into line with the other
instructions that behave like it. That means you can write isel
patterns that use it unpredicated, without having to add a pointless
undefined $QdSrc operand.

No existing code generation uses that instruction yet, so there should
be no functional change from this fix.

Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75253
2020-03-02 10:33:30 +00:00
Simon Tatham a41ecf0eb0 [ARM,MVE] Add ACLE intrinsics for VQMOV[U]N family.
Summary:
These instructions work like VMOVN (narrowing a vector of wide values
to half size, and overwriting every other lane of an output register
with the result), except that the narrowing conversion is saturating.
They come in three signedness flavours: signed to signed, unsigned to
unsigned, and signed to unsigned. All are represented in IR by a
target-specific intrinsic that takes two separate 'unsigned' flags.
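
As a rough scalar model of one flavour (signed 32-bit to signed
16-bit, bottom lanes; this C code is illustrative, not from the
patch):

  #include <stdint.h>

  // Saturate a wide value into the narrow signed range.
  static int16_t sat_s32_to_s16(int32_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
  }

  // Narrow four wide lanes, overwriting every other lane of the output
  // and leaving the remaining lanes of qd untouched.
  void vqmovn_model(int16_t qd[8], const int32_t qm[4]) {
    for (int i = 0; i < 4; ++i)
      qd[2 * i] = sat_s32_to_s16(qm[i]);
  }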

Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75252
2020-03-02 10:33:30 +00:00
Anna Welker 394974111b [ARM][MVE] Restrict allowed types of gather/scatter offsets
The MVE gather instructions with offsets smaller than 32 bits
zero-extend the values in the offset register, as opposed to
sign-extending them. We need to make sure that the code we select from
is suitably extended, which this patch attempts to fix by tightening
up the offset checks.
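
In scalar terms, the mismatch being guarded against is (illustrative
only):

  #include <stdint.h>

  // Address formation the hardware performs for a 16-bit offset:
  uint32_t gather_addr(uint32_t base, uint16_t off) {
    return base + (uint32_t)off;               // zero extension
  }
  // Selecting that instruction from IR which sign-extends the offset
  // would instead compute base + (uint32_t)(int32_t)(int16_t)off,
  // which differs whenever the offset's top bit is set.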

Differential Revision: https://reviews.llvm.org/D75361
2020-03-02 10:04:12 +00:00
Kang Zhang 4962a0b26a [NFC][PowerPC] Move some alias definitions from PPCInstrInfo.td to PPCInstr64Bit.td
Summary:
The alias definitions of some 64-bit instructions are in
PPCInstrInfo.td; they should be moved to PPCInstr64Bit.td.
2020-03-02 09:54:15 +00:00
Jim Lin bfdb834bc3 [Sparc] Fix incorrect operand for matching CMPri pattern
Summary:
It should be a normal constant instead of a target constant.
The CMPri pattern can be matched if the constant fits into the
immediate field; otherwise, the CMPrr pattern is matched.
This fixes bug https://bugs.llvm.org/show_bug.cgi?id=44091.

Reviewers: dcederman, jyknight

Reviewed By: jyknight

Subscribers: jonpa, hiraditya, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75227
2020-03-02 11:36:32 +08:00
Shengchen Kan 2ac19feb15 [X86] Don't track the size of the boundaryalign fragment during layout
Summary:
Currently the boundaryalign fragment caches its size during layout,
and the fragment is then relaxed and its size updated in each
iteration. This behaviour is unnecessary and ugly.

Reviewers: annita.zhang, reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: MaskRay

Subscribers: hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75404
2020-03-02 09:32:30 +08:00
Craig Topper 2f4f8fcf64 [X86] Don't add DELETED_NODES to DAG combine worklist after calling SimplifyDemandedBits/SimplifyDemandedVectorElts.
These AddToWorklist calls were added in 84cd968f75.
It's possible the SimplifyDemandedBits/SimplifyDemandedVectorElts
triggered CSE that deleted N. Detect that and avoid adding N
to the worklist.
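
A minimal sketch of the guard, assuming the usual shape of an X86
combine (not the verbatim patch):

  if (TLI.SimplifyDemandedBits(SDValue(N, 0), DemandedBits, DCI)) {
    // The simplification may have CSE'd N away; a deleted node must
    // not go back on the worklist.
    if (N->getOpcode() != ISD::DELETED_NODE)
      DCI.AddToWorklist(N);
    return SDValue(N, 0);
  }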

Fixes PR45067.
2020-03-01 00:06:32 -08:00
Fangrui Song 9569a1472e [PowerPC] Move .got2/.toc logic from PPCLinuxAsmPrinter::doFinalization() to emitEndOfAsmFile()
Delete redundant .p2align 2 and improve tests.
2020-02-29 17:12:36 -08:00
Craig Topper 5d6dfd877f [X86] Tighten up the SDTypeProfile for X86ISD::CVTNE2PS2BF16. NFCI 2020-02-29 13:22:13 -08:00
Simon Pilgrim 7e9747b50b [X86][F16C] Remove cvtph2ps intrinsics and use generic half2float conversion (PR37554)
This removes everything but int_x86_avx512_mask_vcvtph2ps_512 which provides the SAE variant, but even this can use the fpext generic if the rounding control is the default.

Differential Revision: https://reviews.llvm.org/D75162
2020-02-29 18:57:35 +00:00
Fangrui Song 692e0c9648 [MC] Add MCStreamer::emitInt{8,16,32,64}
Similar to AsmPrinter::emitInt{8,16,32,64}.
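A quick usage sketch (the values here are made up):

  // Each helper emits a fixed-width integer into the streamer,
  // equivalent to emitIntValue(Value, Size) with the matching size.
  Streamer.emitInt8(0x90);
  Streamer.emitInt16(0x1234);
  Streamer.emitInt32(NumEntries); // same as emitIntValue(NumEntries, 4)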
2020-02-29 09:40:21 -08:00
Benjamin Kramer 186dd63182 ArrayRef'ize restoreCalleeSavedRegisters. NFCI.
restoreCalleeSavedRegisters can mutate the contents of the
CalleeSavedInfos, so use a MutableArrayRef.
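The shape of the change, sketched against TargetFrameLowering (not
verbatim):

  // MutableArrayRef documents that the hook may rewrite the
  // CalleeSavedInfo entries without exposing the underlying vector.
  virtual bool
  restoreCalleeSavedRegisters(MachineBasicBlock &MBB,
                              MachineBasicBlock::iterator MI,
                              MutableArrayRef<CalleeSavedInfo> CSI,
                              const TargetRegisterInfo *TRI) const;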
2020-02-29 09:50:23 +01:00
Shengchen Kan 95fa5c4f24 [X86] Move the function getOrCreateBoundaryAlignFragment
MCObjectStreamer is a more suitable place to create fragments than
X86AsmBackend; for example, the function getOrCreateDataFragment is
defined in MCObjectStreamer.

Differential Revision: https://reviews.llvm.org/D75351
2020-02-29 15:11:16 +08:00
Shengchen Kan 129a762555 [X86] Disable the NOP padding for branches when bundle is enabled
When bundling is enabled, the data fragment itself has space to emit
NOPs to bundle-align instructions. That behaviour makes it impossible
for us to determine whether macro fusion really happened when emitting
instructions. In addition, the boundary-align fragment is also used to
emit NOPs to align instructions, and currently using the two together
sometimes produces bizarre code.

Differential Revision: https://reviews.llvm.org/D75346
2020-02-29 15:07:06 +08:00
Craig Topper 9fcd212e2f [X86] Remove isel patterns from broadcast of loadi32.
We already combine non-extending loads with broadcasts in DAG
combine. All these patterns pick up is the aligned extload special
case. But the only lit test we have that exercises it uses a v8i1
load for which the datalayout reports align 8. That seems generous.
So without a realistic test case I don't think there is much value
in these patterns.
2020-02-28 16:39:27 -08:00
Jay Foad 7d973307d5 [AMDGPU] Fix scheduling model for V_MULLIT_F32
This was incorrectly marked as a half rate 64-bit instruction by D45073.
2020-02-28 23:22:58 +00:00
Craig Topper f2d45e5097 [X86] Canonicalize (bitcast (vbroadcast_load)) so that the cast and vbroadcast_load are both integer or fp.
Helps a little with some isel pattern matching, especially on
32-bit targets where we sometimes use f64 loads.
2020-02-28 15:07:49 -08:00
Craig Topper b68eeff05c [X86] Cleanup a comment around bitcasting X86ISD::VBROADCAST_LOAD and add an assert to make sure memory VT size doesn't change. 2020-02-28 15:07:49 -08:00
Vedant Kumar 0368b42295 [entry values] ARM: Add a describeLoadedValue override (PR45025)
As a narrow stopgap for the assertion failure described in PR45025, add
a describeLoadedValue override to ARMBaseInstrInfo and use it to detect
copies in which the forwarding reg is a super/sub reg of the copy
destination. For the moment this is unsupported.
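
Roughly, the stopgap bails out like this (a hedged sketch, not the
verbatim override):

  // If the forwarding register only partially overlaps the copy
  // destination (as a super- or sub-register), describing the value
  // is not yet supported.
  if (auto DstSrc = isCopyInstr(MI)) {
    Register Dst = DstSrc->Destination->getReg();
    if (Reg != Dst && TRI->regsOverlap(Reg, Dst))
      return None;
  }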

Several follow ups are possible:

1) Handle VORRq. At the moment, we do not, because isCopyInstrImpl
   returns early when !MI.isMoveReg().

2) In the case where forwarding reg is a super-reg of the copy
   destination, we should be able to describe the forwarding reg as a
   subreg within the copy destination. I'm not 100% sure about this, but
   it looks like that's what's done in AArch64InstrInfo.

3) In the case where the forwarding reg is a sub-reg of the copy
   destination, maybe we could describe the forwarding reg using the
   copy destination and a DW_OP_LLVM_fragment (I guess this should be
   possible after D75036).

https://bugs.llvm.org/show_bug.cgi?id=45025
rdar://59772698

Differential Revision: https://reviews.llvm.org/D75273
2020-02-28 14:30:40 -08:00
Jay Foad 43830790d7 [AMDGPU] Remove dubious logic in bidirectional list scheduler
Summary:
pickNodeBidirectional tried to compare the best top candidate and the
best bottom candidate by examining TopCand.Reason and BotCand.Reason.
This is unsound because, after calling pickNodeFromQueue, Cand.Reason
does not reflect the most important reason why Cand was chosen. Rather
it reflects the most recent reason why it beat some other potential
candidate, which could have been for some low priority tie breaker
reason.

I have seen this cause problems where TopCand is a good candidate, but
because TopCand.Reason is ORDER (which is very low priority) it is
repeatedly ignored in favour of a mediocre BotCand. This is not how
bidirectional scheduling is supposed to work.

To fix this I changed the code to always compare TopCand and BotCand
directly, like the generic implementation of pickNodeBidirectional does.
This removes some uncommented AMDGPU-specific logic; if this logic turns
out to be important then perhaps it could be moved into an override of
tryCandidate instead.
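
The direct comparison in the generic implementation looks roughly like
this (abbreviated; not the AMDGPU change verbatim):

  // Run the full priority comparison between the two zone winners
  // once, rather than trusting their stale Reason fields.
  SchedCandidate Cand = BotCand;
  TopCand.Reason = NoCand;
  tryCandidate(Cand, TopCand, /*Zone=*/nullptr);
  if (TopCand.Reason != NoCand)
    Cand.setBest(TopCand);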

Graphics shader benchmarking on gfx10 shows a lot more positive than
negative effects from this change.

Reviewers: arsenm, tstellar, rampitec, kzhuravl, vpykhtin, dstuttard, tpr, atrick, MatzeB

Subscribers: jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68338
2020-02-28 21:35:34 +00:00
Krzysztof Parzyszek e7b9a20584 [Hexagon] Map dcfetch intrinsic to Y2_dcfetchbo, not Y2_dcfetch 2020-02-28 14:19:20 -06:00
Craig Topper c0d0e6b198 [X86] Recognize CVTPH2PS from STRICT_FP_EXTEND
This should avoid scalarizing the cvtph2ps intrinsics with D75162

Differential Revision: https://reviews.llvm.org/D75304
2020-02-28 10:19:57 -08:00
Teresa Johnson f9ca75f19b [Inliner] Inlining should honor nobuiltin attributes
Summary:
Final patch in series to fix inlining between functions with different
nobuiltin attributes/options, which was specifically an issue in LTO.
See discussion on D61634 for background.

The prior patch in this series (D67923) enabled per-Function TLI
construction that identified the nobuiltin attributes.

Here I have allowed inlining to proceed if the callee's nobuiltins are a
subset of the caller's nobuiltins, but not in the reverse case, which
should be conservatively correct. This is controlled by a new option,
-inline-caller-superset-nobuiltin, which is enabled by default.
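
A minimal standalone sketch of the subset rule (this assumes the
nobuiltin names are available as sorted lists; it is not the actual
TLI interface):

  #include <algorithm>
  #include <string>
  #include <vector>

  // Inlining may proceed only if every builtin disabled in the callee
  // is also disabled in the caller.
  bool callerSupersetsCalleeNoBuiltins(
      const std::vector<std::string> &CallerNoBuiltins,   // sorted
      const std::vector<std::string> &CalleeNoBuiltins) { // sorted
    return std::includes(CallerNoBuiltins.begin(),
                         CallerNoBuiltins.end(),
                         CalleeNoBuiltins.begin(),
                         CalleeNoBuiltins.end());
  }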

Reviewers: hfinkel, gchatelet, chandlerc, davidxl

Subscribers: arsenm, jvesely, nhaehnle, mehdi_amini, eraman, hiraditya, haicheng, dexonsmith, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74162
2020-02-28 07:34:14 -08:00
Simon Pilgrim b6e80864b6 Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFCI. 2020-02-28 15:23:37 +00:00
Krzysztof Parzyszek c8bfed05e2 Reland 7691790dfd with a MSAN fix
In some cases when HexagonTargetLowering::allowsMemoryAccess returned
true, it did not set the "Fast" argument, leaving it uninitialized.
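
The bug class, in miniature (an illustrative standalone function, not
the Hexagon code itself):

  // Returning true while leaving *Fast uninitialized lets the caller
  // read indeterminate memory, which is what MSAN flagged. Initialize
  // the out-parameter on every path.
  bool allowsAccess(bool Supported, bool IsFast, bool *Fast) {
    if (Fast)
      *Fast = false;    // set a default before any early return
    if (!Supported)
      return false;
    if (Fast)
      *Fast = IsFast;
    return true;
  }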

[Hexagon] Improve casting of boolean HVX vectors to scalars

- Mark memory access for bool vectors as disallowed in target lowering.
  This will prevent combining bitcasts of bool vectors with stores.
- Replace the actual bitcasting code with a faster version.
- Handle casting of v16i1 to i16.
2020-02-28 08:32:58 -06:00
David Green e2a2f3f7fc [ARM] MVE VMLAS
This adds extra patterns for the VMLAS MVE instruction, which performs
Qda = Qda * Qn + Rm, a similar pattern to the existing VMLA. The sinking
of splat(Rm) into the loop is already performed, meaning we just need
extra Pats in TableGen.
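
Per lane, the new patterns correspond to this computation (a scalar
model; the loop is illustrative, not from the patch):

  #include <stdint.h>

  // Qda = Qda * Qn + Rm element-wise, with the scalar Rm broadcast to
  // every lane.
  void vmlas_model(int32_t *qda, const int32_t *qn, int32_t rm, int n) {
    for (int i = 0; i < n; ++i)
      qda[i] = qda[i] * qn[i] + rm;
  }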

Differential Revision: https://reviews.llvm.org/D75115
2020-02-28 14:27:21 +00:00
Jay Foad 970558df94 [AMDGPU] Mark the scheduling model as complete 2020-02-28 13:35:55 +00:00
Jay Foad addcbc401c [AMDGPU] Update a comment missed in 74e2974ac6 2020-02-28 13:35:55 +00:00
Simon Cook ca950a6bb1 [RISCV] Compress instructions based on function features
When running under LTO, it is common to not specify the architecture
spec, which is used for setting up the target machine, and instead rely
on features specified in each function to generate the correct
instructions.

This works for the code generator, but the RISC-V backend uses the
AsmPrinter to do instruction compression, which does not see these
features but instead uses a MCSubtargetInfo object to see whether
compression is enabled. Since this is configured based on the
TargetMachine at startup, it will result in compressed instructions
not being emitted when the TargetMachine has not been given the 'c'
TargetFeature, even though the function has it.

This changes the RISCVAsmPrinter to re-initialize the STI feature set
based on the current MachineFunction, such that compressed instructions
are now correctly emitted regardless of the method used to enable them.
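
The re-initialization looks roughly like this (a sketch that assumes
the printer caches an MCSubtargetInfo pointer used for emission; not
the patch verbatim):

  bool RISCVAsmPrinter::runOnMachineFunction(MachineFunction &MF) {
    // Rebuild the MC subtarget info from this function's feature bits
    // so that instruction compression sees the per-function 'c' bit.
    MCSubtargetInfo &NewSTI =
        OutStreamer->getContext().getSubtargetCopy(
            *TM.getMCSubtargetInfo());
    NewSTI.setFeatureBits(MF.getSubtarget().getFeatureBits());
    STI = &NewSTI;
    return AsmPrinter::runOnMachineFunction(MF);
  }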

Differential revision: https://reviews.llvm.org/D73339
2020-02-28 11:52:55 +00:00
Peter Smith 2a92fc9b8e [MC][ELF][ARM] Add relocations for some pc-relative fixups
Add ELF relocations for the following fixups:
fixup_thumb_adr_pcrel_10 -> R_ARM_THM_PC8
fixup_thumb_cp -> R_ARM_THM_PC8
fixup_t2_adr_pcrel_12 -> R_ARM_THM_PREL_11_0
fixup_t2_ldst_pcrel_12 -> R_ARM_THM_PC12

While these relocations are short-ranged, they are supported by the
open-source ELF linkers in binutils, and support is coming soon to
LLD. MC will no longer resolve pc-relative fixups to global symbols
due to interpositioning concerns; we can handle these at link time by
implementing the relocations.

The R_ARM_THM_PC8 has some extra encoding rules for addends that llvm-mc
sidesteps by not supporting addends for these instructions, using the wide
Thumb 2 instruction if it is available. I think that this is a reasonable
compromise given that these are rare.

This partially reverts D72892; the Thumb fixups no longer need to be
evaluated at assembly time.

Differential Revision: https://reviews.llvm.org/D75039
2020-02-28 11:29:29 +00:00
Sam Parker bf61421a02 [RDA] Track implicit-defs
Ensure that we're recording implicit defs, as well as visiting implicit
uses and implicit defs when we're walking through operands.

Differential Revision: https://reviews.llvm.org/D75185
2020-02-28 11:14:42 +00:00
Stefan Agner 2f95d5f103 [ARM][Thumb2] support .w assembler qualifier for dmb/dsb/isb
Support the explicit wide assembler qualifier for the dmb/dsb/isb synchronization barrier instructions.

Differential revision: https://reviews.llvm.org/D75143
2020-02-28 11:08:24 +00:00
Stefan Agner b4207e705b [ARM][Thumb2] Support .w assembler qualifier for pld/pldw/pli
Accept the explicit wide assembler qualifier for the pld/pldw/pli instructions.

Differential revision: https://reviews.llvm.org/D75144
2020-02-28 11:08:24 +00:00
Stanislav Mekhanoshin 6b813f2762 [AMDGPU] Enable runtime unroll for LDS
We want to unroll loops that access LDS even when the trip count is
only known at runtime, so that LDS operations can be combined.

Differential Revision: https://reviews.llvm.org/D75293
2020-02-27 12:59:35 -08:00
Sanjay Patel 90fd859f51 [x86] use instruction-level fast-math-flags to drive MachineCombiner
The code changes here are hopefully straightforward:

1. Use MachineInstruction flags to decide if FP ops can be reassociated
   (use both "reassoc" and "nsz" to be consistent with IR transforms;
   we probably don't need "nsz", but that's a safer interpretation of
   the FMF; see the sketch after this list).
2. Check that both nodes allow reassociation to change instructions.
   This is a stronger requirement than we've usually implemented in
   IR/DAG, but this is needed to solve the motivating bug (see below),
   and it seems unlikely to impede optimization at this late stage.
3. Intersect/propagate MachineIR flags to enable further reassociation
   in MachineCombiner.
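
The per-instruction gate from item 1 above, sketched (flag names from
MachineInstr; the exact predicate in the patch may differ):

  // Both FMF-derived flags must be present on the MachineInstr for it
  // to participate in reassociation.
  bool canReassociate(const MachineInstr &Inst) {
    return Inst.getFlag(MachineInstr::MIFlag::FmReassoc) &&
           Inst.getFlag(MachineInstr::MIFlag::FmNsz);
  }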

We managed to make MachineCombiner flexible enough that no changes are
needed to that pass itself. So this patch should only affect x86
(assuming no other targets have implemented the hooks using MachineIR
flags yet).

The motivating example in PR43609 is another case of fast-math transforms
interacting badly with special FP ops created during lowering:
https://bugs.llvm.org/show_bug.cgi?id=43609
The special fadd ops used for converting int to FP assume that they will
not be altered, so those are created without FMF.

However, the MachineCombiner pass was being enabled for FP ops using the
global/function-level TargetOption for "UnsafeFPMath". We managed to run
instruction/node-level FMF all the way down to MachineIR sometime in the
last 1-2 years though, so we can do better now.

The test diffs require some explanation:

1. llvm/test/CodeGen/X86/fmf-flags.ll - no target option for unsafe math was
   specified here, so MachineCombiner kicks in where it did not previously;
   to make it behave consistently, we need to specify a CPU schedule model,
   so use the default model, and there are no code diffs.
2. llvm/test/CodeGen/X86/machine-combiner.ll - replace the target option for
   unsafe math with the equivalent IR-level flags, and there are no code diffs;
   we can't remove the NaN/nsz options because those are still used to drive
   x86 fmin/fmax codegen (special SDAG opcodes).
3. llvm/test/CodeGen/X86/pow.ll - similar to #1
4. llvm/test/CodeGen/X86/sqrt-fastmath.ll - similar to #1, but MachineCombiner
   does some reassociation of the estimate sequence ops; presumably these are
   perf wins based on latency/throughput (and we get some reduction of move
   instructions too); I'm not sure how it affects numerical accuracy, but the
   test reflects reality better now because we would expect MachineCombiner to
   be enabled if the IR was generated via something like "-ffast-math" with clang.
5. llvm/test/CodeGen/X86/vec_int_to_fp.ll - this is the test added to model PR43609;
   the fadds are not reassociated now, so we should get the expected results.
6. llvm/test/CodeGen/X86/vector-reduce-fadd-fast.ll - similar to #1
7. llvm/test/CodeGen/X86/vector-reduce-fmul-fast.ll - similar to #1

Differential Revision: https://reviews.llvm.org/D74851
2020-02-27 15:19:37 -05:00
Simon Pilgrim 168a44a70e [CostModel][X86] Improve extract/insert element costs (PR43605)
This tries to improve the accuracy of extract/insert element costs by accounting for subvector extraction/insertion for >128-bit vectors and the shuffling of elements to/from the 0'th index.

It also adds INSERTPS for f32 types and PINSR/PEXTR costs for integer types (at the moment we assume the same cost as MOVD/MOVQ - which isn't always true).

Differential Revision: https://reviews.llvm.org/D74976
2020-02-27 15:54:13 +00:00
Sam Parker 965ba4291a Revert "[ARM] Add CPSR as an implicit use of t2IT"
This reverts commit e58229fded.

Differential Revision: https://reviews.llvm.org/D75186
2020-02-27 15:43:44 +00:00
Simon Pilgrim f90cc633de Fix cppcheck definition/declaration arg mismatch warnings. NFCI. 2020-02-27 14:35:20 +00:00