Commit Graph

47199 Commits

Author SHA1 Message Date
Craig Topper 784f1bbf5e [X86] Remove SRAs from v16i8 multiply lowering on sse2 targets
Previously we unpacked the even bytes of each input into the high byte of 16-bit elements then did an v8i16 arithmetic shift right by 8 bits to fill the upper bits of each word with sign bits. Then we did the v8i16 multiply and then masked to zero the upper 8-bits of each result. The similar was done for all the odd bytes. The results are then packed together with packuswb

Since we are masking each multiply result element to 8-bits, and those 8-bits are determined only by the lower 8-bits of each of the inputs, we don't need to fill the upper bits with sign bits. So we can just unpack into the low byte of each element and treat the upper bits as garbage. This is what gcc also does.

Differential Revision: https://reviews.llvm.org/D44267

llvm-svn: 327093
2018-03-09 01:22:31 +00:00
Simon Pilgrim c286680032 [X86][AVX] Pull out variable permute creation from LowerBUILD_VECTORAsVariablePermute. NFCI.
This will make it easier to handle more complex cases than basic scaling or index masks.

llvm-svn: 327054
2018-03-08 20:07:06 +00:00
Krzysztof Parzyszek 480ab2bbc4 [Hexagon] Ignore indexed loads when handling unaligned loads
llvm-svn: 327037
2018-03-08 18:15:13 +00:00
Stefan Pintilie 235fb927b0 [Power9] Add more missing instructions to the Power 9 scheduler
With this patch we should be able to mark the Power 9 model as complete.

llvm-svn: 327021
2018-03-08 16:24:33 +00:00
Matt Arsenault c3fe46bbcf AMDGPU/GlobalISel: Pass subtarget + TM to LegalizerInfo
These are the parameters x86 already uses.

llvm-svn: 327020
2018-03-08 16:24:16 +00:00
Craig Topper a406796f5f [X86] Change X86::PMULDQ/PMULUDQ opcodes to take vXi64 type as input instead of vXi32.
This instruction can be thought of as reading either the even elements of a vXi32 input or the lower half of each element of a vXi64 input. We currently use the vXi32 interpretation, but vXi64 matches better with its broadcast behavior in EVEX.

I'm looking at moving MULDQ/MULUDQ creation to a DAG combine so we can do it when AVX512DQ is enabled without having to go through Custom lowering. But in some of the test cases we failed to use a broadcast load due to the size difference. This should help with that.

I'm also wondering if we can model these instructions in native IR and remove the intrinsics and I think using a vXi64 type will work better with that.

llvm-svn: 326991
2018-03-08 08:02:52 +00:00
Heejin Ahn 0de587296e [WebAssembly] Add except_ref as a first-class type
Summary: Add except_ref as a first-class type, according to the [[https://github.com/WebAssembly/exception-handling/blob/master/proposals/Level-1.md | Level 1 exception handling proposal ]].

Reviewers: dschuff

Subscribers: jfb, sbc100, llvm-commits

Differential Revision: https://reviews.llvm.org/D43706

llvm-svn: 326985
2018-03-08 04:05:37 +00:00
Weiming Zhao a4259cd3a6 [AArch64] Fix UB about shift amount exceeds data bit-width
Summary:
Fixes an UB caught by sanitizer. The shift amount might be larger than 32 so the operand should be 1ULL.
In this patch,  we replace the original expression with  existing API with uint64_t type.

Reviewers: eli.friedman, rengolin

Reviewed By: rengolin

Subscribers: rengolin, javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D44234

llvm-svn: 326969
2018-03-08 00:28:25 +00:00
Craig Topper 7ff9779768 [X86] Fix some isel patterns that used aligned vector load instructions with unaligned predicates.
These patterns weren't checking the alignment of the load, but were using the aligned instructions. This will cause a GP fault if the data isn't aligned.

I believe these were introduced in r312450.

llvm-svn: 326967
2018-03-08 00:21:17 +00:00
Rafael Espindola 06c064824e Delete code that is probably dead since r249303.
With r249303 the expression evaluation should expand variables that
are not in sections (and so don't have an atom).

llvm-svn: 326966
2018-03-08 00:17:13 +00:00
Simon Pilgrim 68594ee24a [X86][SSE] LowerBUILD_VECTORAsVariablePermute - reorder permute types. NFCI.
Reorder into 128/256/512 bit vector size groupings.

NFCI commit before some new features.

llvm-svn: 326963
2018-03-07 23:56:42 +00:00
Evandro Menezes f9bd871d32 [AArch64] Adjust the cost of integer vector division
Since there is no instruction for integer vector division, factor in the
cost of singling out each element to be used with the scalar division
instruction.

Differential revision: https://reviews.llvm.org/D43974

llvm-svn: 326955
2018-03-07 22:35:32 +00:00
Sebastian Pop 33bdb3e0e6 [AArch64] add missing pattern for insert_subvector undef
The attached testcase started failing after the patch to define
isExtractSubvectorCheap with the following pattern mismatch:

ISEL: Starting pattern match
  Initial Opcode index to 85068
    Match failed at index 85076
    LLVM ERROR: Cannot select: t47: v8i16 = insert_subvector undef:v8i16, t43, Constant:i64<0>

The code generated from llvm/lib/Target/AArch64/AArch64InstrInfo.td

def : Pat<(insert_subvector undef, (v4i16 FPR64:$src), (i32 0)),
          (INSERT_SUBREG (v8i16 (IMPLICIT_DEF)), FPR64:$src, dsub)>;

is in ninja/lib/Target/AArch64/AArch64GenDAGISel.inc
At the location of the error it is:
/* 85076*/    OPC_CheckChild2Type, MVT::i32,

And it failed to match the type of operand 2.
Adding another def-pat for i64 fixes the failed def-pat error:

def : Pat<(insert_subvector undef, (v4i16 FPR64:$src), (i64 0)),
          (INSERT_SUBREG (v8i16 (IMPLICIT_DEF)), FPR64:$src, dsub)>;

llvm-svn: 326949
2018-03-07 22:07:13 +00:00
Craig Topper 80ec0c3106 [X86] Remove unused function argument. NFC
llvm-svn: 326939
2018-03-07 19:45:45 +00:00
Craig Topper c3c15dd640 [X86] Make the MUL->VPMADDWD work before op legalization on AVX1 targets. Simplify feature checks by using isTypeLegal.
The v8i32 conversion on AVX1 targets was only working after LowerMUL splits 256-bit vectors.

While I was there I've also made it so we don't have to check for AVX2 and BWI directly and instead just ask if the type is legal.

Differential Revision: https://reviews.llvm.org/D44190

llvm-svn: 326917
2018-03-07 17:53:18 +00:00
Krzysztof Parzyszek 2c3edf0567 [Hexagon] Rewrite non-HVX unaligned loads as pairs of aligned ones
This is a follow-up to r325169, this time for all types, not just HVX
vector types.

Disable this by default, since it's not always safe. 

llvm-svn: 326915
2018-03-07 17:27:18 +00:00
Farhana Aleen 89196642f7 [AMDGPU] Increased vector length for global/constant loads.
Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache;
         loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords.

Author: FarhanaAleen

Reviewed By: rampitec

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D44179

llvm-svn: 326910
2018-03-07 17:09:18 +00:00
Farhana Aleen 347d12b4ce Revert "[AMDGPU] Widened vector length for global/constant address space."
This reverts commit ce988cc100dc65e7c6c727aff31ceb99231cab03.

llvm-svn: 326907
2018-03-07 16:55:27 +00:00
Stefan Pintilie f8438e8e59 [PowerPC] LSR tunings for PowerPC
The purpose of this patch is to have LSR generate better code on Power.
This is done by overriding isLSRCostLess.

Differential Revision: https://reviews.llvm.org/D40855

llvm-svn: 326906
2018-03-07 16:53:09 +00:00
Farhana Aleen 0d03d0588d [AMDGPU] Widened vector length for global/constant address space.
llvm-svn: 326904
2018-03-07 16:29:05 +00:00
Simon Dardis 52ae4f078e [mips] Correct the definition of m(f|t)c(0|2)
These instructions are defined as taking a GPR register and a
coprocessor register for ISAs up to MIPS32. MIPS32 extended the
definition to allow a selector--a value from 0 to 32--to access
another register.

These instructions are now internally defined as being MIPS-I
instructions, but are rejected for pre-MIPS32 ISA's if they have
an explicit selector which is non-zero. This deviates slightly from
GAS's behaviour which rejects assembly instructions with an
explicit selector for pre-MIPS32 ISAs.

E.g:

mfc0 $4, $5, 0
is rejected by GAS for MIPS-I to MIPS-V but will be accepted
with this patch for MIPS-I to MIPS-V.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D41662

llvm-svn: 326890
2018-03-07 11:39:48 +00:00
Sjoerd Meijer af30f06d5c [ARM] Fix for PR36577
Don't PerformSHLSimplify if the given node is used by a node that also uses a
constant because we may get stuck in an infinite combine loop.

bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36577

Patch by Sam Parker.

Differential Revision: https://reviews.llvm.org/D44097

llvm-svn: 326882
2018-03-07 09:10:44 +00:00
Jonas Paulsson 91c853a79d [SystemZ] NFC refactoring in SystemZHazardRecognizer.
Use Reset() after emitting a call.

Review: Ulrich Weigand
llvm-svn: 326881
2018-03-07 08:57:09 +00:00
Jonas Paulsson 9b0f28f009 [SystemZ] Improve getCurrCycleIdx() in SystemZHazardRecognizer.
getCurrCycleIdx() returns the decoder cycle index which the next candidate SU
will be placed on.

This patch improves this method by passing the candidate SU to it so that if
SU will begin a new group, the index of that group is returned instead.

Review: Ulrich Weigand
llvm-svn: 326880
2018-03-07 08:54:32 +00:00
Jonas Paulsson e18dbeb24f [SystemZ] NFC refactoring in SystemZHazardRecognizer.
Handle the not-taken branch in emitInstruction() where the TakenBranch
argument is available. This is cleaner than relying on EmitInstruction().

Review: Ulrich Weigand
llvm-svn: 326879
2018-03-07 08:45:09 +00:00
Jonas Paulsson 61fbcf5825 [SystemZ] Improved debug dumping during post-RA scheduling.
Review: Ulrich Weigand
llvm-svn: 326878
2018-03-07 08:39:00 +00:00
Clement Courbet 327fac4d75 [X86] Add IMUL scheduling info on sandybridge, fix it on >=haswell.
Summary:
Only IMUL16rri uses an extra P0156. IMUL32* and IMUL16rr only use
P1.
This was computed using https://github.com/google/EXEgesis/blob/master/exegesis/tools/compute_itineraries.cc

This can easily be validated by running perf on the following code:

```
int main(int argc, char**argv) {
  int a = argc;
  int b = argc;
  int c = argc;
  int d = argc;

  for (int i = 0; i < LOOP_ITERATIONS; ++i) {
    asm volatile(
      R"(
        .rept 10000
        imull $0x2, %%edx, %%eax
        imull $0x2, %%ecx, %%ebx
        imull $0x2, %%eax, %%edx
        imull $0x2, %%ebx, %%ecx
        .endr
      )"
      : "+a"(a), "+b"(b), "+c"(c), "+d"(d)
      :
      :);
  }
  return a+b+c+d;
}
```
-> test.cc

perf stat -x, -e cycles --pfm-events=uops_executed_port:port_0:u,uops_executed_port:port_1:u,uops_executed_port:port_2:u,uops_executed_port:port_3:u,uops_executed_port:port_4:u,uops_executed_port:port_5:u,uops_executed_port:port_6:u,uops_executed_port:port_7:u test

Reviewers: craig.topper, RKSimon, gadi.haber

Subscribers: llvm-commits, gchatelet, chandlerc

Differential Revision: https://reviews.llvm.org/D43460

llvm-svn: 326877
2018-03-07 08:14:02 +00:00
Craig Topper 80d3bb3b4b [TargetLowering] Rename DAGCombinerInfo::isAfterLegalizeVectorOps to DAGCombiner::isAfterLegalizeDAG since that's what it checks. NFC
The code checks Level == AfterLegalizeDAG which is the fourth and last of the possible DAG combine stages that we have.

There is a Level called AfterLegalVectorOps, but that's the third DAG combine and it doesn't always run.

A function called isAfterLegalVectorOps should imply it returns true in either of the DAG combines that runs after the legalize vector ops stage, but that's not what this function does.

llvm-svn: 326832
2018-03-06 19:44:52 +00:00
Craig Topper 274e08dd81 [X86] Reject registers that require a REX prefix in inline asm constraints in 32-bit mode
We don't currently reject r8-r15 or xmm8-32 or bpl/spl/sil/dil in 32-bit mode.

Differential Revision: https://reviews.llvm.org/D44031

llvm-svn: 326826
2018-03-06 18:56:33 +00:00
Stanislav Mekhanoshin 0f72225433 [AMDGPU] Add default ISA version targets
In case if -mattr used to modify feature set bits in llvm-mc call
getIsaVersion can fail to identify specific ISA due to test mismatch.
Adding default fallback tests which will always correctly report at
least major version.

Differential Revision: https://reviews.llvm.org/D44163

llvm-svn: 326825
2018-03-06 18:33:55 +00:00
Sebastian Pop 41073e8046 [AArch64] define isExtractSubvectorCheap
Following the ARM-neon backend, define isExtractSubvectorCheap to return true
when extracting low and high part of a neon register.

The patch disables a test in llvm/test/CodeGen/AArch64/arm64-ext.ll This
testcase is fragile in the sense that it requires a BUILD_VECTOR to "survive"
all DAG transforms until ISelLowering. The testcase is supposed to check that
AArch64TargetLowering::ReconstructShuffle() works, and for that we need a
BUILD_VECTOR in ISelLowering. As we now transform the BUILD_VECTOR earlier into
an VEXT + vector_shuffle, we don't have the BUILD_VECTOR pattern when we get to
ISelLowering. As there is no way to disable the combiner to only exercise the
code in ISelLowering, the patch disables the testcase.

Differential revision: https://reviews.llvm.org/D43973

llvm-svn: 326811
2018-03-06 16:54:55 +00:00
Yaxun Liu 46439e8d4a [AMDGPU] Fix lowering OpenCL enqueue_kernel
One addrspacecast disappeared in clang emitted IR for
block invoke function due to adoption of the new
addr space mapping.

Differential Revision: https://reviews.llvm.org/D43785

llvm-svn: 326806
2018-03-06 16:04:39 +00:00
Simi Pallipurath 75c6bfeac9 [ARM]Decoding MSR with unpredictable destination register causes an assert
This patch handling:

    Enable parsing of raw encodings of system registers .
    Allows UNPREDICTABLE sysregs to be decoded to a raw number in the same way that disasslib does, rather than llvm crashing.
    Disassemble msr/mrs with unpredictable sysregs as SoftFail.
    Fix regression due to SoftFailing some encodings.

Patch by Chris Ryder

Differential revision:https://reviews.llvm.org/D43374

llvm-svn: 326803
2018-03-06 15:21:19 +00:00
Dylan McKay 8f46486c65 [AVR] Remove the earlyclobber flag from LDDWRdYQ
Before I started maintaining the AVR backend, this instruction
never originally used to have an earlyclobber flag.

Some time afterwards (years ago), I must've added it back in, not realising that it
was left out for a reason.

This pseudo instrction exists solely to work around a long standing bug
in the register allocator.

Before this commit, the LDDWRdYQ pseudo was not actually working around
any bug. With the earlyclobber flag removed again, the LDDWRdYQ pseudo
now correctly works around PR13375 again.

llvm-svn: 326774
2018-03-06 11:20:25 +00:00
Martin Storsjo a7adc3185b [X86] Handle EAX being live when calling chkstk for x86_64
EAX can turn out to be alive here, when shrink wrapping is done
(which is allowed when using dwarf exceptions, contrary to the
normal case with WinCFI).

This fixes PR36487.

Differential Revision: https://reviews.llvm.org/D43968

llvm-svn: 326764
2018-03-06 06:00:13 +00:00
Simon Pilgrim 53ff5ae8a1 [X86] cvttpd2dq lowering has been supported for some time
Tests in vec_fp_to_int.ll 

llvm-svn: 326751
2018-03-05 23:00:39 +00:00
Nemanja Ivanovic 6cc31ca814 [PowerPC] Do not emit record-form rotates when record-form andi suffices
Up until Power9, the performance profile for rlwinm., rldicl. and andi. looked
more or less equivalent. However with Power9, the rotates are still 2-way
cracked whereas the and-immediate is not.

This patch just ensures that we don't emit record-form rotates when an andi.
is adequate.

As first pointed out by Carrot in https://bugs.llvm.org/show_bug.cgi?id=30833
(this patch is a fix for that PR).

Differential Revision: https://reviews.llvm.org/D43977

llvm-svn: 326736
2018-03-05 19:27:16 +00:00
Sebastian Pop ac0bfb5938 fix PR36582
The error occurs when reading i16 elements (as in the testcase) from a v8i8
with a pattern of <0,2,4,6>. As all the data in the vector is accessed, the
operation is not a VUZP. The patch stops the pattern recognition of VUZP when
EXTRACT_VECTOR_ELT has a different element type than BUILD_VECTOR.

llvm-svn: 326722
2018-03-05 17:35:49 +00:00
Evandro Menezes cd855f70c5 [AArch64] Improve code generation of constant vectors
Use the whole gammut of constant immediates available to set up a vector.
Instead of using, for example, `mov w0, #0xffff; dup v0.4s, w0`, which
transfers between register files, use the more efficient `movi v0.4s, #-1`
instead.  Not limited to just a few values, but any immediate value that can
be encoded by all the variants of `FMOV`, `MOVI`, `MVNI`, thus eliminating
the need to there be patterns to optimize special cases.

Differential revision: https://reviews.llvm.org/D42133

llvm-svn: 326718
2018-03-05 17:02:47 +00:00
Matt Arsenault e31ab94e97 AMDGPU/GlobalISel: Add InstrMapping for G_EXTRACT
llvm-svn: 326715
2018-03-05 16:25:18 +00:00
Matt Arsenault 71272e6d4e AMDGPU/GlobalISel: Make some G_EXTRACTs legal
As far as I can tell legalization of weird sizes for the
output type isn't implemented.

llvm-svn: 326714
2018-03-05 16:25:15 +00:00
Matt Arsenault 4cc0b85276 AMDGPU: Fix build warning about override
llvm-svn: 326713
2018-03-05 16:25:10 +00:00
Alexander Timofeev 2e5eeceeb7 Pass Divergence Analysis data to Selection DAG to drive divergence
dependent instruction selection.

Differential revision: https://reviews.llvm.org/D35267

llvm-svn: 326703
2018-03-05 15:12:21 +00:00
Stefan Pintilie d45db612c6 [Power9] Add more missing instructions to the Power 9 scheduler
Adding more instructions using InstRW so that we can move away from ItinRW
and ultimately have a complete Power 9 scheduler.

llvm-svn: 326701
2018-03-05 14:34:59 +00:00
Oliver Stannard f20222a83c [ARM][Asm] VMOVSRR and VMOVRRS need sequential S registers
These instructions require that the two S registers are adjacent (but not the R
registers), because only the first register is included in the encoding, but we
were not checking this in the assembler.

Differential revision: https://reviews.llvm.org/D44084

llvm-svn: 326696
2018-03-05 13:27:26 +00:00
Thomas Preud'homme c699eaa311 Fix location of comment in EmitPopInst
Comment about folding return in LDM was not moved along with the
corresponding code in r242714. This commit fixes that.

llvm-svn: 326690
2018-03-05 11:49:00 +00:00
Craig Topper f546b2c06f [X86] Replace usages of X86Subtarget::hasFp256 with hasAVX. Remove hasFP256.
Almost none of these usages were FP specific. And we had no clear guideliness on when to use hasAVX vs hasFP256.

I might also remove hasInt256 too since its an alias for hasAVX2.

llvm-svn: 326682
2018-03-05 00:13:35 +00:00
Craig Topper f2aae62228 [X86] Add a DAG combine to turn stores of vXi1 constants into scalar stores.
llvm-svn: 326679
2018-03-04 19:33:15 +00:00
Simon Pilgrim 3a76bc29d7 [X86][MMX] Remove completed _mm_cvtsi32_si64 todo
rL322525 - mmx zero constant support
rL322553 - mmx i32 zero extended value
rL326497 - mmx i64 general constant handling

Not all constants are folded, we generate some on the GPRs (similar to SSE build vector) where appropriate

llvm-svn: 326673
2018-03-04 14:57:26 +00:00
Craig Topper 12c35e1940 [X86] Fix unused variable in release builds.
llvm-svn: 326672
2018-03-04 02:14:16 +00:00
Craig Topper a476026f70 [X86] Combine (store (v1i1 (scalar_to_vector (i8 X)))) -> (store (i8 X)).
llvm-svn: 326670
2018-03-04 01:48:02 +00:00
Craig Topper be31585be8 [X86] Lower v1i1/v2i1/v4i1/v8i1 load/stores to i8 load/store during op legalization if AVX512DQ is not supported.
We were previously doing this with isel patterns. Moving it to op legalization gives us chance to see the required bitcast earlier. And it lets us remove some isel patterns.

llvm-svn: 326669
2018-03-04 01:48:00 +00:00
Simon Pilgrim ad403a483a [X86] This bit-test TODO has been moved in PR36551
llvm-svn: 326658
2018-03-03 16:31:17 +00:00
Craig Topper d4b6601662 [X86] Remove 'else' after return. NFC
llvm-svn: 326642
2018-03-03 05:18:21 +00:00
Krzysztof Parzyszek e3e963236a [Hexagon] Generate valignb for shifting shuffles (instead of vdelta)
llvm-svn: 326627
2018-03-02 22:22:19 +00:00
Sameer AbuAsal 2646a41e54 [RISCV] Implement MC relaxations for compressed instructions.
Summary:
     This patch implements relaxation for RISCV in the MC layer.
      The following relaxations are currently handled:
      1) Relax C_BEQZ to BEQ and C_BNEZ to BNEZ in RISCV.
      2) Relax and C_J $imm  to JAL x0, $imm  and CJAL to JAL ra, $imm.

Reviewers: asb, llvm-commits, efriedma

Reviewed By: asb

Subscribers: shiva0217

Differential Revision: https://reviews.llvm.org/D43055

llvm-svn: 326626
2018-03-02 22:04:12 +00:00
Sam Clegg bd1716aed1 [WebAssembly] Avoid cast ExprType to wasm::ValType
This cast was causing invalid signatures to be written
for libcall functions.

Add an MC test which includes a call to builtin memcpy.

Differential Revision: https://reviews.llvm.org/D44037

llvm-svn: 326618
2018-03-02 21:33:14 +00:00
Heejin Ahn 0c69a3e3d9 Reland "[WebAssembly] More uses of uint8_t for single byte values"
Summary:
Original change was D43991 (rL326541) and was reverted by rL326571 and
rL326572. This adds also the necessary MCCodeEmitter patch.

Reviewers: sbc100

Subscribers: jfb, dschuff, sbc100, jgravelle-google, sunfish, llvm-commits, ncw

Differential Revision: https://reviews.llvm.org/D44034

llvm-svn: 326614
2018-03-02 20:52:59 +00:00
Ulrich Weigand db16beed8a [SystemZ] Allow LRV/STRV with volatile memory accesses
The byte-swapping loads and stores do not actually perform multiple
accesses to their memory operand, so they are OK to use with volatile
memory operands as well.  Remove overly cautious check.

llvm-svn: 326613
2018-03-02 20:51:59 +00:00
Ulrich Weigand 8b19be46c7 [SystemZ] Add support for anyregcc calling convention
This adds back-end support for the anyregcc calling convention
for use with patchpoints.

Since all registers are considered call-saved with anyregcc
(except for 0 and 1 which may still be clobbered by PLT stubs
and the like), this required adding support for saving and
restoring vector registers in prologue/epilogue code for the
first time.  This is not used by any other calling convention.

llvm-svn: 326612
2018-03-02 20:40:11 +00:00
Ulrich Weigand 5eb64110d2 [SystemZ] Support stackmaps and patchpoints
This adds back-end support for the @llvm.experimental.stackmap and
@llvm.experimental.patchpoint intrinsics.

llvm-svn: 326611
2018-03-02 20:39:30 +00:00
Ulrich Weigand 3206388870 [SystemZ] Fix common-code users of stack size
On SystemZ we need to provide a register save area of 160 bytes to
any called function.  This size needs to be added when allocating
stack in the function prologue.  However, it was not accounted for
as part of MachineFrameInfo::getStackSize(); instead the back-end
used a private routine getAllocatedStackSize().

This is OK for code-gen purposes, but it breaks other users of
the getStackSize() routine, in particular it breaks the recently-
added -stack-size-section feature.

Fix this by updating the main stack size tracked by common code
(in emitPrologue) instead of using the private routine.

No change in code generation intended.

llvm-svn: 326610
2018-03-02 20:38:41 +00:00
Ulrich Weigand 18f6930fef [SystemZ] Support vector registers in inline asm
This adds support for specifying vector registers for use with inline
asm statements, either via the 'v' constraint or by explicit register
names (v0 ... v31).

llvm-svn: 326609
2018-03-02 20:36:34 +00:00
Krzysztof Parzyszek f608812bde [Hexagon] Handle VACOPY in isel lowering
llvm-svn: 326599
2018-03-02 18:35:57 +00:00
Simon Pilgrim 8cbc1d232b [X86][BTVER2] Fix throughput of YMM bitwise instructions
These instructions are double-pumped, split into 2 128-bit ops and then passing through either FPU pipe.

Found while testing llvm-mca (D43951)

llvm-svn: 326597
2018-03-02 18:20:35 +00:00
Craig Topper 6b1419b547 [X86] Reject xmm16-31 in inline asm constraints when AVX512 is disabled
Fixes PR36532

Differential Revision: https://reviews.llvm.org/D43960

llvm-svn: 326596
2018-03-02 18:19:40 +00:00
Derek Schuff 57feeed307 [X86][x32] Save callee-save register used as base pointer for x32 ABI
For the x32 ABI, since the base pointer register (EBX) is a callee save register
it should be saved before use.

This fixes https://bugs.llvm.org/show_bug.cgi?id=36011

Differential Revision: https://reviews.llvm.org/D42358

Patch by Pratik Bhatu

llvm-svn: 326593
2018-03-02 17:46:39 +00:00
Benjamin Kramer 4925653555 [ARM] Fold variable into assert.
Avoids unused variable warnings in Release mode.

llvm-svn: 326592
2018-03-02 17:39:20 +00:00
Matt Arsenault b9699c009d AMDGPU/GlobalISel: InstrMapping for G_ZEXT
llvm-svn: 326589
2018-03-02 16:55:37 +00:00
Matt Arsenault 1c1aab99ae AMDGPU/GlobalISel: InstrMapping for G_TRUNC
llvm-svn: 326588
2018-03-02 16:55:33 +00:00
Matt Arsenault ef8db767d7 AMDGPU/GlobalISel: Define InstrMappings for G_FCMP
Patch by Tom Stellard

llvm-svn: 326587
2018-03-02 16:53:15 +00:00
Matt Arsenault 2607dc60de AMDGPU/GlobalISel: Define instruction mapping for @llvm.minnum
Patch by Tom Stellard

llvm-svn: 326586
2018-03-02 16:40:17 +00:00
Momchil Velikov 505614bb4f [ARM] Fix access to stack arguments when re-aligning SP in Armv6m
When an Armv6m function dynamically re-aligns the stack, access to incoming
stack arguments (and to stack area, allocated for register varargs) is done via
SP, which is incorrect, as the SP is offset by an unknown amount relative to the
value of SP upon function entry.

This patch fixes it, by making access to "fixed" frame objects be done via FP
when the function needs stack re-alignment.  It also changes the access to
"fixed" frame objects be done via FP (instead of using R6/BP) also for the case
when the stack frame contains variable sized objects. This should allow more
objects to fit within the immediate offset of the load instruction.

All of the above via a small refactoring to reuse the existing
`ARMFrameLowering::ResolveFrameIndexReference.`

Differential Revision: https://reviews.llvm.org/D43566

llvm-svn: 326584
2018-03-02 15:47:14 +00:00
Stefan Pintilie b5a9440a80 [Power9] Add missing instructions to the Power 9 scheduler
Adding more instructions using InstRW so that we can move away from ItinRW
and ultimately have a complete Power 9 scheduler.

llvm-svn: 326578
2018-03-02 14:41:38 +00:00
David Stenberg 3fb8c324b3 Test commit: Remove an extraneous space. NFC
Test commit access.

llvm-svn: 326573
2018-03-02 14:28:56 +00:00
Nicholas Wilson be28e61a03 Revert "[WebAssembly] More uses of uint8_t" and "[WebAssembly] Update tests"
This reverts commits r326541 and r326571.

The tests were correct, and were updated with incorrect expectations.
The original commit was broken and should be reverted to get things back
to a working state.

llvm-svn: 326572
2018-03-02 14:07:39 +00:00
Florian Hahn 9deef20b6c [ARM] Fix codegen for VLD3/VLD4/VST3/VST4 with WB
Code generation of VLD3, VLD4, VST3 and VST4 with register writeback is
broken due to 2 separate bugs:

1) VLD1d64TPseudoWB_register and VLD1d64QPseudoWB_register are missing
   rules to expand them to non pseudo MIR. These are selected for
   ARMISD::VLD3_UPD/VLD4_UPD with v1i64 vectors in SelectVLD.

2) Selection of the right VLD/VST instruction is broken for load and
   store of 3 and 4 v1i64 vectors. SelectVLD and SelectVST are called
   with MIR opcode for fixed writeback (ie increment is access size)
   and call getVLDSTRegisterUpdateOpcode() to select an opcode with
   register writeback if base register update is of a different size.
   Since getVLDSTRegisterUpdateOpcode() only knows about
   VLD1/VLD2/VST1/VST2 the call is currently conditional on the number
   of element in the vector.

   However, VLD1/VST1 is selected by SelectVLD/SelectVST's caller for
   load and stores of 3 or 4 v1i64 vectors. Therefore the opcode is not
   updated which later lead to a fixed writeback instruction being
   constructed with an extra operand for the register writeback.

This patch addresses the two issues as follows:
- it adds the necessary mapping from VLD1d64TPseudoWB_register and
  VLD1d64QPseudoWB_register to VLD1d64Twb_register and
  VLD1d64Qwb_register respectively. Like for the existing _fixed
  variants, the cost of these is bumped for unaligned access.
- it changes the logic in SelectVLD and SelectVSD to call isVLDfixed
  and isVSTfixed respectively to decide whether the opcode should be
  updated. It also reworks the logic and comments for pushing the
  writeback offset operand and r0 operand to clarify the logic:
  writeback offset needs to be pushed if it's a register writeback,
  r0 needs to be pushed if not and the instruction is a
  VLD1/VLD2/VST1/VST2.

Reviewers: rengolin, t.p.northover, samparker

Reviewed By: samparker

Patch by Thomas Preud'homme <thomas.preudhomme@arm.com>

Differential Revision: https://reviews.llvm.org/D42970

llvm-svn: 326570
2018-03-02 13:02:55 +00:00
Matt Arsenault b46c191c49 AMDGPU/GlobalISel: Define instruction mapping for @llvm.maxnum
Patch by Tom Stellard

llvm-svn: 326567
2018-03-02 12:23:00 +00:00
Simon Pilgrim c879aa7eab [X86] Remove old UNIMPLEMENTED list
All of these are implemented and have appropriate test coverage

llvm-svn: 326553
2018-03-02 11:59:37 +00:00
Heejin Ahn d684cb57f4 [WebAssembly] More uses of uint8_t for single byte values
Summary: It looks like this was missing from D43921.

Reviewers: sbc100

Subscribers: jfb, dschuff, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D43991

llvm-svn: 326541
2018-03-02 06:51:35 +00:00
Jan Vesely b283ea0f0f AMDGPU/GCN: Promote i16 ctpop
i16 capable ASICs do not support i16 operands for this instruction.
Add tablegen pattern to merge chained i16 additions.

Differential Revision: https://reviews.llvm.org/D43985

llvm-svn: 326535
2018-03-02 02:50:22 +00:00
Matt Arsenault 41d2e3d98e AMDGPU/GlobalISel: Define instruction mapping for G_FPTOSI
Patch by Tom Stellard

llvm-svn: 326534
2018-03-02 02:19:16 +00:00
Matt Arsenault b23041ad4d AMDGPU/GlobalISel: Define instruction mapping for G_FPTOUI
Patch by Tom Stellard

llvm-svn: 326533
2018-03-02 02:19:11 +00:00
Matt Arsenault 327d5fb2e5 AMDGPU/GlobalISel: Define instruction mapping for G_FMUL
llvm-svn: 326532
2018-03-02 02:17:01 +00:00
Matt Arsenault 5a9e834eac AMDGPU/GlobalISel: Define instruction mapping for G_FADD
Patch by Tom Stellard

llvm-svn: 326526
2018-03-02 01:22:13 +00:00
Matt Arsenault d99317f1b3 AMDGPU/GlobalISel: Define instruction mapping for G_SHL
Patch by Tom Stellard

llvm-svn: 326525
2018-03-02 01:22:10 +00:00
Matt Arsenault 3c7a123ccc AMDGPU/GlobalISel: Define instruction mapping for G_XOR
llvm-svn: 326524
2018-03-02 01:22:06 +00:00
Matt Arsenault c0f34c9e36 AMDGPU/GlobalISel: Define instruction mapping for G_AND
Patch by Tom Stellard

llvm-svn: 326523
2018-03-02 01:22:01 +00:00
Heejin Ahn e4a8deea84 [WebAssembly] Gather EH instructions in one place. NFC.
Summary:
- Gather EH instructions in one place for easy tracking (more will be
  added later)
- Variable name change

Reviewers: dschuff

Subscribers: jfb, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D43742

llvm-svn: 326522
2018-03-02 01:03:40 +00:00
Yonghong Song 03e1c8b8f9 bpf: introduce -mattr=dwarfris to disable DwarfUsesRelocationsAcrossSections
Commit e4507fb8c94b ("bpf: disable DwarfUsesRelocationsAcrossSections")
disables MCAsmInfo DwarfUsesRelocationsAcrossSections unconditionally
so that dwarf will not use cross section (between dwarf and symbol table)
relocations. This new debug format enables pahole to dump structures
correctly as libdwarves.so does not have BPF backend support yet.

This new debug format, however, breaks bcc (https://github.com/iovisor/bcc)
source debug output as llvm in-memory Dwarf support has some issues to
handle it. More specifically, with DwarfUsesRelocationsAcrossSections
disabled, JIT compiler does not generate .debug_abbrev and Dwarf
DIE (debug info entry) processing is not happy about this.

This patch introduces a new flag -mattr=dwarfris
(dwarf relocation in section) to disable DwarfUsesRelocationsAcrossSections.
DwarfUsesRelocationsAcrossSections is true by default.

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 326505
2018-03-01 23:04:59 +00:00
Simon Pilgrim 90fd0622b6 [X86][MMX] Improve handling of 64-bit MMX constants
64-bit MMX constant generation usually ends up lowering into SSE instructions before being spilled/reloaded as a MMX type.

This patch bitcasts the constant to a double value to allow correct loading directly to the MMX register.

I've added MMX constant asm comment support to improve testing, it's better to always print the double values as hex constants as MMX is mainly an integer unit (and even with 3DNow! its just floats).

Differential Revision: https://reviews.llvm.org/D43616

llvm-svn: 326497
2018-03-01 22:22:31 +00:00
Krzysztof Parzyszek c5e0ed109d [Hexagon] Add trap1 instruction
llvm-svn: 326492
2018-03-01 21:54:08 +00:00
Matt Arsenault 364f12e8f9 AMDGPU/GlobalISel: Define instruction mapping for @llvm.amdgcn.cvt.pkrtz
Patch by Tom Stellard

llvm-svn: 326490
2018-03-01 21:25:30 +00:00
Matt Arsenault 5320ee4a05 AMDGPU/GlobalISel: Define instruction mapping for G_OR
Patch by Tom Stellard

llvm-svn: 326489
2018-03-01 21:25:25 +00:00
Matt Arsenault e65404f5c5 AMDGPU/GlobalISel: Remove default register mapping
This crashes for some opcodes, which prevents the SelectionDAG
fallback from working.

Patch by Tom Stellard

llvm-svn: 326487
2018-03-01 21:20:44 +00:00
Evandro Menezes 2bbb4a7c93 [AArch64] Clean up code (NFC)
Clean up a couple of functions in `AArch64TargetLowering` by removing
redundant statements.

llvm-svn: 326486
2018-03-01 21:17:36 +00:00
Matt Arsenault 1422a19a88 AMDGPU/GlobalISel: Use a more correct getValueMapping
This was finding the wrong size registers for anything with
more than 2 components.

Patch by Tom Stellard

llvm-svn: 326483
2018-03-01 21:08:51 +00:00
Matt Arsenault 62669ede94 AMDGPU/GlobalISel: Define instruction mapping for G_BITCAST
Patch by Tom Stellard

llvm-svn: 326482
2018-03-01 20:59:44 +00:00
Matt Arsenault 0529a8e2de AMDGPU/GlobalISel: Mark i32->i64 zext as legal
llvm-svn: 326481
2018-03-01 20:56:21 +00:00
Martin Storsjo c61ff3bef1 [AArch64] Add support for secrel add/load/store relocations for COFF
Differential Revision: https://reviews.llvm.org/D43288

llvm-svn: 326480
2018-03-01 20:42:28 +00:00
Matt Arsenault 36b99e1937 AMDGPU/GlobalISel: InstrMapping for llvm.amdgcn.exp.compr
Patch by Tom Stellard

llvm-svn: 326479
2018-03-01 20:40:55 +00:00
Matt Arsenault 8931bbf8df AMDGPU/GlobalISel: Define instruction mapping for @llvm.amdgcn.exp
Patch by Tom Stellard

llvm-svn: 326477
2018-03-01 20:24:37 +00:00
Matt Arsenault 50721ab325 AMDGPU/GlobalISel: Define InstrMappings for G_ICMP
Patch by Tom Stellard

llvm-svn: 326472
2018-03-01 19:27:10 +00:00
Matt Arsenault dc14ec05d4 AMDGPU/GlobalISel: Make i32 mul legal
llvm-svn: 326471
2018-03-01 19:22:05 +00:00
Matt Arsenault 06cbb27a79 AMDGPU/GlobalISel: Define instruction mapping for G_IMPLICIT_DEF
Patch by Tom Stellard

llvm-svn: 326470
2018-03-01 19:16:52 +00:00
Matt Arsenault e3d9ecf2b9 AMDGPU/GlobalISel: Define instruction mapping for G_FCONSTANT
Patch by Tom Stellard

llvm-svn: 326468
2018-03-01 19:13:30 +00:00
Matt Arsenault 51b0b20023 AMDGPU/GlobalISel: Add copyCost for VGPR->SGPR copies
Patch by Tom Stellard

llvm-svn: 326467
2018-03-01 19:09:25 +00:00
Matt Arsenault 3f6a204eaa AMDGPU/GlobalISel: Make i32 xor legal
llvm-svn: 326466
2018-03-01 19:09:21 +00:00
Matt Arsenault 8e80a5fbca AMDGPU/GlobalISel: Mark 32/64-bit G_FCMP as legal
Patch by Tom Stellard

llvm-svn: 326465
2018-03-01 19:09:16 +00:00
Matt Arsenault dd022ce064 AMDGPU/GlobalISel: Mark 32-bit G_FPTOSI as legal
Patch by Tom Stellard

llvm-svn: 326464
2018-03-01 19:04:25 +00:00
Sam Clegg 503fdea3cb [WebAssembly] Fix broken gcc build after rL326454
The gcc builders were broken by rL326454
See: https://reviews.llvm.org/D43921

llvm-svn: 326460
2018-03-01 18:48:08 +00:00
Artem Belevich 8c9749b1dc [NVPTX] use pattern matching to lower int_nvvm_match_all_sync*.
Now that patterns can handle intrinsics returning multiple results,
use tablegen'ed pattern matching instead of custom lowering.

Differential Revision: https://reviews.llvm.org/D43890

llvm-svn: 326457
2018-03-01 18:28:45 +00:00
Sam Clegg 03e101f1b0 [WebAssembly] Use uint8_t for single byte values to match the spec
The original BinaryEncoding.md document used to specify that
these values were `varint7`, but the official spec lists them
explicitly as single byte values and not LEB.

A similar change for wabt is in flight:
 https://github.com/WebAssembly/wabt/pull/782

Differential Revision: https://reviews.llvm.org/D43921

llvm-svn: 326454
2018-03-01 18:06:21 +00:00
Alexander Timofeev 0081d23fd8 [AMDGPU] : fix for the crash in SIRegisterInfo when the regiser class not found
Differential revision: https://reviews.llvm.org./D43334

llvm-svn: 326451
2018-03-01 17:36:43 +00:00
Krzysztof Parzyszek 22a21d4c5d [Hexagon] Add guest registers
llvm-svn: 326450
2018-03-01 17:03:26 +00:00
Stefan Pintilie e894e0ff6f [Power9] Add missing instructions to the Power 9 scheduler
Adding more instructions using InstRW so that we can move away from ItinRW
and ultimately have a complete Power 9 scheduler.

Differential Revision: https://reviews.llvm.org/D43899

llvm-svn: 326447
2018-03-01 16:16:08 +00:00
Sebastian Pop c33af715d7 [AArch64] generate vuzp instead of mov
when a BUILD_VECTOR is created out of a sequence of EXTRACT_VECTOR_ELT with a
specific pattern sequence, either <0, 2, 4, ...> or <1, 3, 5, ...>, replace the
BUILD_VECTOR with either vuzp1 or vuzp2.

With this patch LLVM generates the following code for the first function fun1 in the testcase:
adrp x8, .LCPI0_0
ldr  q0, [x8, :lo12:.LCPI0_0]
tbl  v0.16b, { v0.16b }, v0.16b
ext  v1.16b, v0.16b, v0.16b, #8
uzp1 v0.8b, v0.8b, v1.8b
str  d0, [x8]
ret

Without this patch LLVM currently generates this code:
adrp    x8, .LCPI0_0
ldr     q0, [x8, :lo12:.LCPI0_0]
tbl     v0.16b, { v0.16b }, v0.16b
mov     v1.16b, v0.16b
mov     v1.b[1], v0.b[2]
mov     v1.b[2], v0.b[4]
mov     v1.b[3], v0.b[6]
mov     v1.b[4], v0.b[8]
mov     v1.b[5], v0.b[10]
mov     v1.b[6], v0.b[12]
mov     v1.b[7], v0.b[14]
str     d1, [x8]
ret

llvm-svn: 326443
2018-03-01 15:47:39 +00:00
Craig Topper cb7881c649 [X86] Stop passing two arguments by reference. NFC
I think these used to be out parameters, but they haven't been for a while.

llvm-svn: 326417
2018-03-01 06:25:13 +00:00
Craig Topper ccfa5257a6 [X86] Make sure we don't combine (fneg (fma X, Y, Z)) to a target specific node when there are no FMA instructions.
This would cause a 'cannot select' error at isel when we should have emitted a lib call and an xor.

Fixes PR36553.

llvm-svn: 326393
2018-03-01 00:08:38 +00:00
Justin Lebar faaf2d298e [NVPTX] Lower loads from global constants using ld.global.nc (aka LDG).
Summary:
After D43914, loads from global variables in addrspace(1) happen with
ld.global.  But since they're constants, even better would be to use
ld.global.nc, aka ldg.

Reviewers: tra

Subscribers: jholewinski, sanjoy, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D43915

llvm-svn: 326390
2018-02-28 23:58:05 +00:00
Justin Lebar 5a7de898d2 [NVPTX] Use addrspacecast instead of target-specific intrinsics in NVPTXGenericToNVVM.
Summary:
NVPTXGenericToNVVM was using target-specific intrinsics to do address
space casts.  Using the addrspacecast instruction is (a lot) simpler.
But it also has the advantage of being understandable to other passes.
In particular, InferAddrSpaces is able to understand these address space
casts and remove them in most cases.

Reviewers: tra

Subscribers: jholewinski, sanjoy, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D43914

llvm-svn: 326389
2018-02-28 23:57:48 +00:00
Craig Topper e31b9d1e5f [X86] Lower extract_element from k-registers by bitcasting from v16i1 to i16 and extending/truncating.
This is equivalent to what isel was doing anyway but by canonicalizing earlier we can remove some patterns.

llvm-svn: 326375
2018-02-28 22:23:55 +00:00
Simon Pilgrim 72b86586b0 [X86][AVX512] Improve support for signed saturation truncation stores
Matches what we already manage for unsigned saturation truncation stores

Differential Revision: https://reviews.llvm.org/D43629

llvm-svn: 326372
2018-02-28 21:42:19 +00:00
Krzysztof Parzyszek b1cdb60e75 [Hexagon] Implement target feature +reserved-r19
llvm-svn: 326364
2018-02-28 20:29:36 +00:00
Tim Renouf 2a99fa2c08 [AMDGPU] added writelane intrinsic
Summary:
For use by LLPC SPV_AMD_shader_ballot extension.

The v_writelane instruction was already implemented for use by SGPR
spilling, but I had to add an extra dummy operand tied to the
destination, to represent that all lanes except the selected one keep
the old value of the destination register.

.ll test changes were due to schedule changes caused by that new
operand.

Differential Revision: https://reviews.llvm.org/D42838

llvm-svn: 326353
2018-02-28 19:10:32 +00:00
Artem Belevich 18a7c51520 [NVPTX] Removed always-true predicates in NVPTX.
NVPTX stopped supporting GPUs older than sm_20 (Fermi) quite a while back.
Removal of support of pre-Fermi GPUs made a lot of predicates in the NVPTX
backend pointless as they can't ever be false any more.
It's time to retire them. NFC intended.

Differential Revision: https://reviews.llvm.org/D43843

llvm-svn: 326349
2018-02-28 18:51:22 +00:00
Chih-Hung Hsieh 9f9e4681ac [TLS] use emulated TLS if the target supports only this mode
Emulated TLS is enabled by llc flag -emulated-tls,
which is passed by clang driver.
When llc is called explicitly or from other drivers like LTO,
missing -emulated-tls flag would generate wrong TLS code for targets
that supports only this mode.
Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether
emulated TLS code should be generated.
Unit tests are modified to run with and without the -emulated-tls flag.

Differential Revision: https://reviews.llvm.org/D42999

llvm-svn: 326341
2018-02-28 17:48:55 +00:00
Pablo Barrio 512f7ee315 [ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations
Summary:
Expressions of the form x < 0 ? 0 :  x; and x < -1 ? -1 : x can be lowered using bit-operations instead of branching or conditional moves

In thumb-mode this results in a two-instruction sequence, a shift followed by a bic or or while in ARM/thumb2 mode that has flexible second operand the shift can be folded into a single bic/or instructions. In most cases this results in smaller code and possibly less branches, and in no case larger than before.

Patch by Martin Svanfeldt

Reviewers: fhahn, pbarrio, rogfer01

Reviewed By: pbarrio, rogfer01

Subscribers: chrib, yroux, eugenis, efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42574

llvm-svn: 326333
2018-02-28 17:13:07 +00:00
Simon Dardis 4529aac2de [mips] Begin reworking instruction predicates for ISAs/encodings (1/N)
The MIPS backend has inconsistent usage of instruction predicates
for assembly and code generation. The issue arises from supporting three
encodings, two (MIPS and microMIPS) of which have a near 1:1 instruction
mapping across ISA revisions and a third encoding with a more restricted
set of instructions (MIPS16e).

To enforce consistent usage, each of the ISA_* adjectives has (or will
have) the relevant encoding attached to it along the relevant ISA revision
where the instruction is defined.

Each instruction, pattern or alias will then have the correct ISA adjective
attached to it, and the base instruction description classes will have any
predicates relating to ISA encoding or revision removed.

Pseudo instructions will also be guarded for the encoding or ABI that they are
supported in.

Finally, the hasStandardEncoding() / inMicroMipsMode() / inMips16Mode() methods
of MipsSubtarget will be changed such that only one can be true at any one time.

The result of this is that code generation and assembly will produce the
correct encoding up front, while code generated from pseudo instructions
and other inserted sequences of instructions will be able to rely on the mapping
tables to produce the correct encoding. This should fix numerous bugs where
the result 'happens' to be correct but has edge cases where microMIPS and MIPS
have subtle differences (e.g. microMIPSR6 using 'j', 'jal' instructions.)

This patch starts the process by changing most of the ISA adjectives to make
use of the EncodingPredicate member of PredicateControl. Follow on patches
will annotate instructions with their correct ISA adjective and eliminate
the usage of "let Predicates = [..]", "let AdditionalPredicates = [..]" and
"isCodeGenOnly = 1" in the cases where it was used to control instruction
availability.

Contributions from Nitesh Jain.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D41434

llvm-svn: 326322
2018-02-28 13:02:44 +00:00
Alexander Ivchenko c01f750480 [GlobalIsel][X86] Support G_INTTOPTR instruction.
Add legalization/selection for x86/x86_64 and
corresponding tests.

Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D43622

llvm-svn: 326320
2018-02-28 12:11:53 +00:00
Alexander Ivchenko 46e07e3623 [GlobalIsel][X86] Support G_PTRTOINT instruction.
Add legalization/selection for x86/x86_64 and
corresponding tests.

Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D43617

llvm-svn: 326311
2018-02-28 09:18:47 +00:00
Craig Topper 48d5ed265c [X86] Don't use EXTRACT_ELEMENT from v1i1 with i8/i32 result type when we need to guarantee zeroes in the upper bits of return.
An extract_element where the result type is larger than the scalar element type is semantically an any_extend of from the scalar element type to the result type. If we expect zeroes in the upper bits of the i8/i32 we need to mae sure those zeroes are explicit in the DAG.

For these cases the best way to accomplish this is use an insert_subvector to pad zeroes to the upper bits of the v1i1 first. We extend to either v16i1(for i32) or v8i1(for i8). Then bitcast that to a scalar and finish with a zero_extend up to i32 if necessary. We can't extend past v16i1 because that's the largest mask size on KNL. But isel is smarter enough to know that a zext of a bitcast from v16i1 to i16 can use a KMOVW instruction. The insert_subvectors will be dropped during isel because we can determine that the producing instruction already zeroed the upper bits of the k-register.

llvm-svn: 326308
2018-02-28 08:14:28 +00:00
Craig Topper ac799b05d4 [X86] Change the masked FPCLASS implementation to use AND instead of OR to combine the mask results.
While the description for the instruction does mention OR, its talking about how the individual classification test results are ORed together.

The incoming mask is used as a zeroing write mask. If the bit is 1 the classification is written to the output. The bit is 0 the output is 0. This equivalent to an AND.

Here is pseudocode from the intrinsics guide

FOR j := 0 to 1
        i := j*64
        IF k1[j]
                k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0])
        ELSE
                k[j] := 0
        FI
ENDFOR
k[MAX:2] := 0

llvm-svn: 326306
2018-02-28 06:19:55 +00:00
Andrew Zhogin f8e88af11d [ARM] Cortex-A57 scheduler fix for ARM backend (missed 16-bit, v8.1/v8.2/v8.3, thumb and pseudo instructions)
Added missed scheduling info for ARM Cortex A57 (AArch32) to have CompleteModel with this checkCompleteness fix: https://reviews.llvm.org/D43235.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D43808

llvm-svn: 326304
2018-02-28 05:53:18 +00:00
Krzysztof Parzyszek 2373f8fcf3 [Hexagon] Recognize more sign-extensions as inputs to 32x32-bit multiply
llvm-svn: 326263
2018-02-27 22:44:41 +00:00
Konstantin Zhuravlyov 40b09e86b9 AMDGPU: Add fast fmaf feature to gfx702
Differential Revision: https://reviews.llvm.org/D43790

llvm-svn: 326252
2018-02-27 21:46:15 +00:00
Sjoerd Meijer fc0d02cbbf [ARM] Another f16 litpool fix
We were always setting the block alignment to 2 bytes in Thumb mode
and 4-bytes in ARM mode (r325754, and r325012), but this could cause 
reducing the block alignment when it already had been aligned (e.g. 
in Thumb mode when the block is a CPE that was already 4-byte aligned).

Patch by Momchil Velikov, I've only added a test.

Differential Revision: https://reviews.llvm.org/D43777

llvm-svn: 326232
2018-02-27 19:26:02 +00:00
Craig Topper 688d1eb919 Revert r326225 "[X86] Move the load folding tables to a separate .inc file"
The bots don't seem to like the .inc file. I must be missing some cmake incantation.

llvm-svn: 326228
2018-02-27 19:15:40 +00:00
Peter Collingbourne e8436e8631 ARM: Don't rewrite add reg, $sp, 0 -> mov reg, $sp if the add defines CPSR.
Differential Revision: https://reviews.llvm.org/D43807

llvm-svn: 326226
2018-02-27 19:00:59 +00:00
Craig Topper c0a1291478 [X86] Move the load folding tables to a separate .inc file
These tables add 3000 lines to X86InstrInfo.cpp. And if we ever manage to auto generate them they'll be a separate file anyway.

Differential Revision: https://reviews.llvm.org/D43806

llvm-svn: 326225
2018-02-27 18:46:11 +00:00
Krzysztof Parzyszek d70f5a0eb4 [Hexagon] Add patterns for compares of i1 values
llvm-svn: 326220
2018-02-27 18:31:46 +00:00
Simon Pilgrim ba43ec8702 [X86][AVX] combineLoopMAddPattern - support 256-bit cases on AVX1 via SplitBinaryOpsAndApply
llvm-svn: 326189
2018-02-27 12:20:37 +00:00
Jonas Paulsson f268cd0aad [SystemZ] Make sure SelectCode() is not called on a target opcode.
Since getNode() might not always return the requsted opcode, for instance if
called with (ISD::AND, -1) arguments, there should be a check so that
SelectCode() is only called when appropriate.

Review: Ulrich Weigand
llvm-svn: 326178
2018-02-27 07:53:23 +00:00
Craig Topper 264707bae4 [X86] Simplify if condition. NFC
SSE2 implies SSE1 and we already covered f32 in the SSE1 check so we don't need to check f32 in the SSE2 check.

llvm-svn: 326170
2018-02-27 06:00:38 +00:00
Craig Topper fcaa0323ec [X86] Replace an impossible if condition with an assert.
llvm-svn: 326167
2018-02-27 03:50:00 +00:00
Aditya Nandakumar 599990530e [GISel]: Don't assert when constraining RegisterOperands which are uses.
Currently we assert that only non target specific opcodes can have
missing RegisterClass constraints in the MCDesc. The backend can have
instructions with register operands but don't have RegisterClass
constraints (say using unknown_class) in which case the instruction
defining the register will constrain it.
Change the assert to only fire if a def has no regclass.

https://reviews.llvm.org/D43409

llvm-svn: 326142
2018-02-26 22:56:21 +00:00
Simon Pilgrim 9929f90740 [X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280)
Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark.

Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch.

Differential Revision: https://reviews.llvm.org/D43733

llvm-svn: 326133
2018-02-26 22:10:17 +00:00
Craig Topper e5d39e42b9 [X86] Add constant folding to combineMOVMSK.
There's still some shortcoming in our ability to combine binops of constants with different sizes separated by an extend. I'll try to look at that next.

llvm-svn: 326128
2018-02-26 21:17:33 +00:00
Craig Topper 5e0ceb8865 [X86] Add a custom legalization for (i16 (bitcast v16i1)) and (i32 (bitcast v32i1)) without AVX512 to prevent scalarization
Summary:
We have an early DAG combine to turn these patterns into MOVMSK, but that combine doesn't work if the vXi1 type has more elements than the widest legal vXi8 type. Type legalization will eventually split it down to v16i1 or v32i1 and then the bitcast gets legalized to a truncstore and a scalar load. The truncstore will get lowered to a series of extracts and bit math.

This patch adds a custom legalization to use a sign extend and MOVMSK instead. This prevents the eventual scalarization.

Reviewers: spatel, RKSimon, zvi

Reviewed By: RKSimon

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D43593

llvm-svn: 326119
2018-02-26 20:32:27 +00:00
Simon Pilgrim db0ed7d724 [X86][AVX] createPSADBW - support 256-bit cases on AVX1 via SplitBinaryOpsAndApply
llvm-svn: 326104
2018-02-26 18:17:25 +00:00
Matt Arsenault 2a26a286db AMDGPU/GlobalISel: Make f64 constants legal
llvm-svn: 326101
2018-02-26 17:20:43 +00:00
Tim Renouf 832f90fa0c [AMDGPU] Scratch setup fix on AMDPAL gfx9+ merge shader
Summary:
With OS type AMDPAL, the scratch descriptor is hardwired to be loaded
from offset 0 of the global information table, whose low pointer is
passed in s0. For a merge shader on gfx9+, it needs to be s8 instead, as
the hardware reserves s0-s7.

Reviewers: kzhuravl

Subscribers: arsenm, nhaehnle, dstuttard, llvm-commits, t-tye, yaxunl, wdng, kzhuravl

Differential Revision: https://reviews.llvm.org/D42203

llvm-svn: 326088
2018-02-26 14:46:43 +00:00
Benjamin Kramer b84e158df7 [WebAssembly] Relax constexpr for old standard libraries.
This will still be constexpr when the standard library supports it, but
doesn't force constexpr. Old libraries will get a global constructor,
which is not too bad.

llvm-svn: 326080
2018-02-26 11:07:25 +00:00
Jonas Paulsson b1e81479e9 [XCore] Return true in enableMultipleCopyHints().
Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: Robert Lytton
llvm-svn: 326069
2018-02-26 08:03:32 +00:00
Craig Topper 5c980eba47 [X86] Don't use getZExtValue when we have no idea how large the input elements are.
llvm-svn: 326066
2018-02-26 04:43:24 +00:00
Craig Topper 2286058f46 [X86] Use SelectionDAG::SplitVectorOperand to simplify some code. NFC
llvm-svn: 326065
2018-02-26 02:16:34 +00:00
Craig Topper 2bf8e3e0e1 [X86] Simplify the ReplaceNodeResults code for X86ISD::AVG.
This code seemed to try to widen to 128, 256, or 512 bit vectors, but we only create X86ISD::AVG with a power of 2 number of elements. This means the only nodes that need to be legalized are less than 128-bits and need to be widened up to 128 bits.

llvm-svn: 326064
2018-02-26 02:16:33 +00:00
Craig Topper 79d189f597 [X86] Remove VT.isSimple() check from detectAVGPattern.
Which types are considered 'simple' is a function of the requirements of all targets that LLVM supports. That shouldn't directly affect what types we are able to handle. The remainder of this code checks that the number of elements is a power of 2 and takes care of splitting down to a legal size.

llvm-svn: 326063
2018-02-26 02:16:31 +00:00
Craig Topper 6694df14e6 [X86] Use SDNode instead of SDPatternOperator. NFC
llvm-svn: 326048
2018-02-25 06:21:04 +00:00
Craig Topper 81c0eaf4c8 [X86] Allow int_x86_sse2_cvtps2dq and int_x86_avx_cvt_ps2dq_256 to select EVEX encoded instructions.
llvm-svn: 326041
2018-02-24 18:58:07 +00:00
Simon Pilgrim a4fb569483 [X86][SSE] combineSubToSubus - support v8i64 handling from SSSE3
Our UMIN/UMAX, vector truncation and shuffle combining is good enough to efficiently handle v8i64 with the number of leading zeros that are necessary for PSUBUS.

llvm-svn: 326034
2018-02-24 14:06:39 +00:00
Simon Pilgrim 8ad91261e8 [X86][SSE] combineSubToSubus - support v8i32 handling from SSSE3 (not SSE41)
Now that UMIN etc are Legal/Custom for SSE2+, we can efficiently match SUBUS v8i32 cases from SSSE3 which can perform efficient truncation with PSHUFB.

llvm-svn: 326033
2018-02-24 13:39:13 +00:00
Simon Pilgrim 744f008a75 [X86][SSE] combineSubToSubus - begun generalizing to work with any type sizes with SplitBinaryOpsAndApply
llvm-svn: 326030
2018-02-24 12:44:12 +00:00
Simon Pilgrim 51ce2ed367 Fix spelling in comment. NFCI.
llvm-svn: 326029
2018-02-24 12:27:02 +00:00
Jonas Paulsson 8ff0773b13 [Sparc] Return true in enableMultipleCopyHints().
Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: James Y Knight
llvm-svn: 326028
2018-02-24 08:24:31 +00:00
Craig Topper 161c805da4 [X86] Use SelectionDAG::getNot instead of implementing manually. NFC
llvm-svn: 326020
2018-02-24 03:15:54 +00:00
Stanislav Mekhanoshin fa48c496e2 [AMDGPU] Shrinking V_SUBBREV_U32
V_SUBBREV_U32 is a commute opcode for V_SUBB_U32. However, when
we try to commute V_SUBB_U32 in order to shrink it we do not then
process V_SUBBREV_U32 and it stay VOP3. This is fixed.

Differential Revision: https://reviews.llvm.org/D43699

llvm-svn: 326011
2018-02-24 01:32:32 +00:00
Heejin Ahn 9386bde11b [WebAssembly] Add exception handling option and feature
Summary:
Add a llc command line option and WebAssembly architecture feature for
exception handling.

Reviewers: dschuff

Subscribers: jfb, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D43683

llvm-svn: 326004
2018-02-24 00:40:50 +00:00
Craig Topper 7bcac492d4 [X86] Remove checks for '(scalar_to_vector (i8 (trunc GR32:)))' from scalar masked move patterns.
This portion can be matched by other patterns. We don't need it to make the larger pattern valid. It's sufficient to have a v1i1 mask input without caring where it came from.

llvm-svn: 325999
2018-02-24 00:15:05 +00:00
Yonghong Song 60fed1fef0 bpf: New optimization pass for eliminating unnecessary i32 promotions
This pass performs peephole optimizations to cleanup ugly code sequences at
MachineInstruction layer.

Currently, the only optimization in this pass is to eliminate type
promotion
sequences for zero extending 32-bit subregisters to 64-bit registers.

If the compiler could prove the zero extended source come from 32-bit
subregistere then it is safe to erase those promotion sequece, because the
upper half of the underlying 64-bit registers were zeroed implicitly
already.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325991
2018-02-23 23:49:32 +00:00
Yonghong Song ae961bb061 bpf: New decoder namespace for 32-bit subregister load/store
When -mattr=+alu32 passed to the disassembler, use decoder namespace for
32-bit subregister.

This is to disassemble load and store instructions in preferred B format
as described in previous commit:

      w = *(u8 *) (r + off) // BPF_LDX | BPF_B
      w = *(u16 *)(r + off) // BPF_LDX | BPF_H
      w = *(u32 *)(r + off) // BPF_LDX | BPF_W

      *(u8 *) (r + off) = w // BPF_STX | BPF_B
      *(u16 *)(r + off) = w // BPF_STX | BPF_H
      *(u32 *)(r + off) = w // BPF_STX | BPF_W

NOTE: all other instructions should still use the default decoder
      namespace.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325990
2018-02-23 23:49:31 +00:00
Yonghong Song ca31c3bb3f bpf: Enable 32-bit subregister support for -mattr=+alu32
After all those preparation patches, now we could enable 32-bit subregister
support once -mattr=+alu32 specified.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325989
2018-02-23 23:49:30 +00:00
Yonghong Song fcd1e0f625 bpf: Support 32-bit subregister in various InstrInfo hooks
This patch support 32-bit subregister in three InstrInfo hooks, i.e.
copyPhysReg, loadRegFromStackSlot and storeRegToStackSlot,

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325988
2018-02-23 23:49:29 +00:00
Yonghong Song b1a52bd756 bpf: New instruction patterns for 32-bit subregister load and store
The instruction mapping between eBPF/arm64/x86_64 are:

         eBPF              arm64        x86_64
LD1   BPF_LDX | BPF_B       ldrb        movzbl
LD2   BPF_LDX | BPF_H       ldrh        movzwl
LD4   BPF_LDX | BPF_W       ldr         movl

movzbl/movzwl/movl on x86_64 accept 32-bit sub-register, for example %eax,
the same for ldrb/ldrh on arm64 which accept 32-bit "w" register. And
actually these instructions only accept sub-registers. There is no point
to have LD1/2/4 (unsigned) for 64-bit register, because on these arches,
upper 32-bits are guaranteed to be zeroed by hardware or VM, so load into
the smallest available register class is the best choice for maintaining
type information.

For eBPF we should adopt the same philosophy, to change current
format (A):

  r = *(u8 *) (r + off) // BPF_LDX | BPF_B
  r = *(u16 *)(r + off) // BPF_LDX | BPF_H
  r = *(u32 *)(r + off) // BPF_LDX | BPF_W

  *(u8 *) (r + off) = r // BPF_STX | BPF_B
  *(u16 *)(r + off) = r // BPF_STX | BPF_H
  *(u32 *)(r + off) = r // BPF_STX | BPF_W

into B:

  w = *(u8 *) (r + off) // BPF_LDX | BPF_B
  w = *(u16 *)(r + off) // BPF_LDX | BPF_H
  w = *(u32 *)(r + off) // BPF_LDX | BPF_W

  *(u8 *) (r + off) = w // BPF_STX | BPF_B
  *(u16 *)(r + off) = w // BPF_STX | BPF_H
  *(u32 *)(r + off) = w // BPF_STX | BPF_W

There is no change on encoding nor how should they be interpreted,
everything is as it is, load the specified length, write into low bits of
the register then zeroing all remaining high bits.

The only change is their associated register class and how compiler view
them.

Format A still need to be kept, because eBPF LLVM backend doesn't support
sub-registers at default, but once 32-bit subregister is enabled, it should
use format B.

This patch implemented this together with all those necessary extended load
and truncated store patterns.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325987
2018-02-23 23:49:28 +00:00
Yonghong Song 63cf273f55 bpf: Support i32 in getScalarShiftAmountTy method
getScalarShiftAmount method should be implemented for eBPF backend to make
sure shift amount could still get correct type once 32-bit subregisters
support are enabled.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325986
2018-02-23 23:49:26 +00:00
Yonghong Song 59fc805c7e bpf: Support condition comparison on i32
We need to support condition comparison on i32. All these comparisons are
supposed to be combined into BPF_J* instructions which only support i64.

For ISD::BR_CC we need to promote it to i64 first, then do custom lowering.

For ISD::SET_CC, just expand to SELECT_CC like what's been done for i64.

For ISD::SELECT_CC, we also want to do custom lower for i32. However, after
32-bit subregister support enabled, it is possible the comparison operands
are i32 while the selected value are i64, or the comparison operands are
i64 while the selected value are i32. We need to define extra instruction
pattern and support them in custom instruction inserter.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325985
2018-02-23 23:49:25 +00:00
Yonghong Song 219156cff0 bpf: Handle i32 for ALU operations without ISA support
There is no eBPF ISA support for BSWAP, ROTR, ROTL, SREM, SDIVREM, MULHU,
ADDC, ADDE etc on i32.

They could be emulated by other basic BPF_ALU operations, we'd set their
lowering action the same as i64.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325984
2018-02-23 23:49:24 +00:00
Yonghong Song 07a7a41753 bpf: New calling convention for 32-bit subregisters
This patch add new calling conventions to allow GPR32RegClass as valid
register class for arguments and return types.

New calling convention will only be choosen when -mattr=+alu32 specified.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325983
2018-02-23 23:49:23 +00:00
Yonghong Song 42389377d8 bpf: New target attribute "alu32" for 32-bit subregister support
This new attribute aims to control the enablement of 32-bit subregister
support on eBPF backend.

Name the interface as "alu32" is because we in particular want to enable
the generation of BPF_ALU32 instructions by enable subregister support.

This attribute could be used in the following format with llc:

  llc -mtriple=bpf -mattr=[+|-]alu32

It is disabled at default.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325982
2018-02-23 23:49:22 +00:00
Yonghong Song 0252f35362 bpf: Define instruction patterns for extensions and truncations between i32 to i64
For transformations between i32 and i64, if it is explicit signed extension:
  - first cast the operand to i64
  - then use SLL + SRA to finish the extension.

if it is explicit zero extension:
  - first cast the operand to i64
  - then use SLL + SRL to finish the extension.

if it is explicit any extension:
  - just refer to 64-bit register.

if it is explicit truncation:
  - just refer to 32-bit subregister.

NOTE: Some of the zero extension sequences might be unnecessary, they will be
removed by an peephole pass on MachineInstruction layer.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325981
2018-02-23 23:49:21 +00:00
Yonghong Song 3a564a8f6e bpf: Tighten the immediate predication for 32-bit alu instructions
These 32-bit ALU insn patterns which takes immediate as one operand were
initially added to enable AsmParser support, and the AsmMatcher uses "ins"
and "outs" fields to deduct the operand constraint.

However, the instruction selector doesn't work the same as AsmMatcher. The
selector will use the "pattern" field for which we are not setting the
predication for immediate operands correctly.

Without this patch, i32 would eventually means all i32 operands are valid,
both imm and gpr, while these patterns should allow imm only.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325980
2018-02-23 23:49:19 +00:00
Yonghong Song ec84e2f1b0 bpf: Use markSuperRegs to mark reserved registers
markSuperRegs is the canonical helper function used to mark reserved
registers. It could mark any overlapping sub-registers automatically.

Reviewed-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 325979
2018-02-23 23:49:18 +00:00
Nemanja Ivanovic bcc82c9a78 [PowerPC] Disable shrink-wrapping when getting PC address through the LR
The instruction sequence used to get the address of the PC into a GPR requires
that we clobber the link register. Doing so without having first saved it in
the prologue leaves the function unable to return. Currently, this sequence is
emitted into the entry block. To ensure the prologue is inserted before this
sequence, disable shrink-wrapping.

This fixes PR33547.

Differential Revision: https://reviews.llvm.org/D43677

llvm-svn: 325972
2018-02-23 23:08:34 +00:00
Eric Christopher a70ec1308a Sink the verification code around the assert where it's handled and wrap in NDEBUG.
This has the advantage of making release only builds more warning
free and there's no need to make this routine a class function if
it isn't using class members anyhow.

llvm-svn: 325967
2018-02-23 22:32:05 +00:00
Sriraman Tallam 609f8c013c Intrinsics calls should avoid the PLT when "RtLibUseGOT" metadata is present.
Differential Revision: https://reviews.llvm.org/D42216

llvm-svn: 325962
2018-02-23 21:32:06 +00:00
Craig Topper 16b20245ba [X86] Add assembler/disassembler support for blendm with zero masking and broacast.
Fixes PR31617

llvm-svn: 325957
2018-02-23 20:48:44 +00:00
Stefan Pintilie 626b651016 [Power9] Add missing instructions to the Power 9 scheduler
This is the first in a series of patches that will define more
instructions using InstRW so that we can move away from ItinRW
and ultimately have a complete Power 9 scheduler.

Differential Revision: https://reviews.llvm.org/D43635

llvm-svn: 325956
2018-02-23 20:37:10 +00:00
Krzysztof Parzyszek 96690ceceb [Hexagon] Recognize non-immediate constants in HexagonConstPropagation
llvm-svn: 325954
2018-02-23 20:33:26 +00:00
Simon Pilgrim 69b8fa8391 Fixed unused variable warning. NFCI.
llvm-svn: 325950
2018-02-23 20:16:18 +00:00
Craig Topper 61d6ddbf0a [X86] Add DAG combine to remove (and X, 1) from in front of a v1i1 scalar to vector.
These can be created by type legalization promoting the inputs to select to match scalar boolean contents.

We were trying to pattern match them away during isel, but its better to just remove them from the DAG.

I've cleaned up some patterns to not check for this 'and' anymore. But I suspect this has also opened up opportunities for pattern removal.

llvm-svn: 325949
2018-02-23 20:13:42 +00:00
Benjamin Kramer ae87f86ec4 [WebAssembly] Fix macro metaprogram to not duplicate code as much.
No functionality change intended.

llvm-svn: 325947
2018-02-23 20:13:03 +00:00
Simon Pilgrim 425965be0f [X86][SSE] Generalize x > C-1 ? x+-C : 0 --> subus x, C combine for non-uniform constants
llvm-svn: 325944
2018-02-23 19:58:44 +00:00
Evandro Menezes 1afffac05b [PATCH] [AArch64] Add new target feature to fuse conditional select
This feature enables the fusion of the comparison and the conditional select
instructions together.

Differential revision: https://reviews.llvm.org/D42392

llvm-svn: 325939
2018-02-23 19:27:43 +00:00
Geoff Berry d6ba3dbbbd Fix compiler warning introduced in r325931. NFC.
llvm-svn: 325938
2018-02-23 19:11:33 +00:00
Craig Topper 11704dcc72 [X86] Custom split v32i16/v64i8 bitcasts when AVX512F is available, but BWI is not.
The test changes you can see are related to the changes in ReplaceNodeResults. Though shuffle-vs-trunc-512.ll does have a test that exercises the code in LowerBITCAST. Looks like the test output didn't change because DAG combining is able to clean up the resulting type legalization. Adding the custom hook just makes type legalization work less hard.

Differential Revision: https://reviews.llvm.org/D43447

llvm-svn: 325933
2018-02-23 18:43:36 +00:00
Geoff Berry f8bf2ec0a8 [MachineOperand][Target] MachineOperand::isRenamable semantics changes
Summary:
Add a target option AllowRegisterRenaming that is used to opt in to
post-register-allocation renaming of registers.  This is set to 0 by
default, which causes the hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq
fields of all opcodes to be set to 1, causing
MachineOperand::isRenamable to always return false.

Set the AllowRegisterRenaming flag to 1 for all in-tree targets that
have lit tests that were effected by enabling COPY forwarding in
MachineCopyPropagation (AArch64, AMDGPU, ARM, Hexagon, Mips, PowerPC,
RISCV, Sparc, SystemZ and X86).

Add some more comments describing the semantics of the
MachineOperand::isRenamable function and how it is set and maintained.

Change isRenamable to check the operand's opcode
hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq bit directly instead of
relying on it being consistently reflected in the IsRenamable bit
setting.

Clear the IsRenamable bit when changing an operand's register value.

Remove target code that was clearing the IsRenamable bit when changing
registers/opcodes now that this is done conservatively by default.

Change setting of hasExtraSrcRegAllocReq in AMDGPU target to be done in
one place covering all opcodes that have constant pipe read limit
restrictions.

Reviewers: qcolombet, MatzeB

Subscribers: aemerson, arsenm, jyknight, mcrosier, sdardis, nhaehnle, javed.absar, tpr, arichardson, kristof.beyls, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, escha, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D43042

llvm-svn: 325931
2018-02-23 18:25:08 +00:00
Stefan Pintilie 15e6b10ee0 [PowerPC] Code cleanup. Remove instructions that were withdrawn from Power 9.
The following set of instructions was originally planned to be added for Power 9
and so code was added to support them. However, a decision was made later on to
withdraw support for these instructions in the hardware.
xscmpnedp
xvcmpnesp
xvcmpnedp
This patch removes support for the instructions that were not added.

Differential Revision: https://reviews.llvm.org/D43641

llvm-svn: 325918
2018-02-23 15:55:16 +00:00
Petar Jovanovic a7bd36e63e [mips] finish removal of unused fields in MipsInstructionSelector
r325916 missed to remove calls in constructor.

llvm-svn: 325917
2018-02-23 15:47:05 +00:00
Petar Jovanovic f49c5ce3a6 [mips] remove unused fields in MipsInstructionSelector
Unused fields cause buildbreak if -Werror,-Wunused-private-field is passed.

llvm-svn: 325916
2018-02-23 15:34:02 +00:00
Hans Wennborg 89c35fc44d Support for the mno-stack-arg-probe flag
Adds support for this flag. There is also another piece for clang
(separate review). More info:
https://bugs.llvm.org/show_bug.cgi?id=36221

By Ruslan Nikolaev!

Differential Revision: https://reviews.llvm.org/D43107

llvm-svn: 325900
2018-02-23 13:46:25 +00:00
Amaury Sechet 893a6b89ff [DAGCOmbine] Ensure that (brcond (setcc ...)) is handled in a canonical manner.
Summary:
There are transformation that change setcc into other constructs, and transform that try to reconstruct a setcc from the brcond condition. Depending on what order these transform are done, the end result differs.

Most of the time, it is preferable to get a setcc as a brcond argument (and this is why brcond try to recreate the setcc in the first place) so we ensure this is done every time by also doing it at the setcc level when the only user is a brcond.

Reviewers: spatel, hfinkel, niravd, craig.topper

Subscribers: nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D41235

llvm-svn: 325892
2018-02-23 11:50:42 +00:00
Petar Jovanovic fac93e28f0 [MIPS GlobalISel] Adding GlobalISel
Add GlobalISel infrastructure up to the point where we can select a ret
void.

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D43583

llvm-svn: 325888
2018-02-23 11:06:40 +00:00
Nicolai Haehnle 6cf306deca AMDGPU: Track physreg uses in SILoadStoreOptimizer
Summary:
This handles def-after-use of physregs, and allows us to merge loads and
stores even across some physreg defs (typically M0 defs).

Change-Id: I076484b2bda27c2cf46013c845a0380c5b89b67b

Reviewers: arsenm, mareko, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D42647

llvm-svn: 325882
2018-02-23 10:45:56 +00:00
Jonas Paulsson 07d6aea61a [Mips] Return true in enableMultipleCopyHints().
Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: Simon Dardis
llvm-svn: 325870
2018-02-23 08:30:15 +00:00
Sam Clegg 6c899ba6de [WebAssembly] Add first claass symbol table to wasm objects
This is combination of two patches by Nicholas Wilson:
  1. https://reviews.llvm.org/D41954
  2. https://reviews.llvm.org/D42495

Along with a few local modifications:
- One change I made was to add the UNDEFINED bit to the binary format
  to avoid the extra byte used when writing data symbols.  Although this
  bit is redundant for other symbols types (i.e. undefined can be
  implied if a function or global is a wasm import)
- I prefer to be explicit and consistent and not have derived flags.
- Some field renaming.
- Some reverting of unrelated minor changes.
- No test output differences.

Differential Revision: https://reviews.llvm.org/D43147

llvm-svn: 325860
2018-02-23 05:08:34 +00:00
Richard Smith ade53736b0 Revert r325128 ("[X86] Reduce Store Forward Block issues in HW").
This is causing miscompiles in some situations. See the llvm-commits thread for the commit for details.

llvm-svn: 325852
2018-02-23 01:43:46 +00:00
Craig Topper 0dcc88a500 [X86] Turn setne X, signedmax into setgt signedmax, X in LowerVSETCC to avoid an invert
We won't be able to fold the constant pool load, but its still better than materialing ones and xoring for the invert if we used PCMPEQ.

This will fix another regression from D42948.

llvm-svn: 325845
2018-02-23 00:21:39 +00:00
Evandro Menezes 5c986b010b [AArch64] Refactor macro fusion (NFC)
Move checks for each fusion case into separate functions for better
legibility and maintainability.

Differential revision: https://reviews.llvm.org/D43649

llvm-svn: 325844
2018-02-23 00:14:39 +00:00
Rafael Espindola ba02f3f242 Fix grammar. NFC.
Thank to Eric Christopher for noticing.

llvm-svn: 325842
2018-02-22 23:59:46 +00:00
Craig Topper d2fab30827 [X86] Turn setne X, signedmin into setgt X, signedmin in LowerVSETCC to avoid an invert
This will fix one of the regressions from D42948.

Differential Revision: https://reviews.llvm.org/D43531

llvm-svn: 325840
2018-02-22 23:46:28 +00:00
Benjamin Kramer a01e97d748 Fix the build of the wasm backend.
toString conflicts with llvm::toString here. Yay for overly generic
function names.

llvm-svn: 325833
2018-02-22 22:29:27 +00:00
Craig Topper 1aed540ea2 [X86] Make the subus special case in LowerVSETCC self contained
Previously this code overrode the flags and opcode used by the later code in LowerVSETCC. This makes the code difficult to read and follow.

This patch moves all the SUBUS code into its own function and makes it responsible for creating its own SDNodes on success.

Differential Revision: https://reviews.llvm.org/D43530

llvm-svn: 325827
2018-02-22 20:24:18 +00:00
Nicolai Haehnle 40b140fef1 AMDGPU: Stop using .NAME in .td files
Summary:
.NAME is a bit of an odd duck, in that we should really treat it like
a template argument, but we currently don't, and so when and where
NAME is initialized and how is pretty inconsistent. Best to just avoid
using it as a field of already instantiated records, and use cast to
string instead.

Change-Id: I5a0c202401cede3d5c3827ab9c7858ea48b29108

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D43551

llvm-svn: 325794
2018-02-22 15:25:11 +00:00
Shiva Chen 7c17242b92 [RISCV] Implement c.lui immediate operand constraint
Implement c.lui immediate constraint to [1, 31] and [0xfffe0, 0xfffff].
The RISC-V ISA describes the constraint as [1, 63], with that value
being loaded in to bits 17-12 of the destination register and sign extended
from bit 17. Therefore, this 6-bit immediate can represent values in the
ranges [1, 31] and [0xfffe0, 0xfffff].

Differential Revision: https://reviews.llvm.org/D42834

llvm-svn: 325792
2018-02-22 15:02:28 +00:00
Stefan Maksimovic ed797a3049 [mips] Generate memory dependencies for byVal arguments
There were no memory dependencies made between stores generated
when lowering formal arguments and loads generated when
call lowering byVal arguments which made the Post-RA scheduler
place a load before a matching store.

Make the fixed object stored to mutable so that the load
instructions can have their memory dependencies added

Set the frame object as isAliased which clears the underlying
objects vector in ScheduleDAGInstrs::buildSchedGraph().
This results in addition of all stores as dependenies for loads.

This problem appeared when passing a byVal parameter
coupled with a fastcc function call.

Differential Revision: https://reviews.llvm.org/D37515

llvm-svn: 325782
2018-02-22 13:40:42 +00:00
Alex Bradbury 8d8d0a733f [RISCV][NFC] Make logic in RISCVMCCodeEmitter::getImmOpValue more defensive
As pointed out by @sabuasal in a comment on D23568, the logic in  
RISCVMCCodeEmitter::getImmOpValue could be more defensive. Although with the  
current instruction definitions it is always the case that `VK_RISCV_LO` is  
always used with either an I- or S-format instruction, this may not always be  
the case in the future. Add a check to ensure we will get an assertion in  
debug builds if that changes.

llvm-svn: 325775
2018-02-22 13:24:25 +00:00
Sjoerd Meijer d31a8c0595 Recommit: [ARM] f16 constant pool fix
This recommits r325754; the modified and failing test case
actually didn't need any modifications.

llvm-svn: 325765
2018-02-22 10:43:57 +00:00
David Green 01e0f25a9f [ARM] Fix issue with large xor constants.
Fixup to rL325573 for large xor constants.

Thanks to Eli Friedman for the catch.

Differential revision: https://reviews.llvm.org/D43549

llvm-svn: 325761
2018-02-22 09:38:57 +00:00
Sjoerd Meijer 9a25247f80 Revert r325754 and r325755 (f16 literal pool) because buildbots were unhappy.
llvm-svn: 325756
2018-02-22 08:41:55 +00:00
Sjoerd Meijer 7d5909eb0f [ARM] f16 constant pool fix
This is a follow up of r325012, that allowed half types in constant pools.
Proper alignment was enforced when a big basic block was split up, but not when
a CPE was placed before/after a block; the successor block had the wrong
alignment.

Differential Revision: https://reviews.llvm.org/D43580

llvm-svn: 325754
2018-02-22 08:16:05 +00:00
Hiroshi Inoue 7f9f92f8b6 [NFC] fix trivial typos in comments
"a a" -> "a"

llvm-svn: 325752
2018-02-22 07:48:29 +00:00
Nemanja Ivanovic e54a9ee8ac [PowerPC] Do not produce invalid CTR loop with an FRem
An FRem instruction inside a loop should prevent the loop from being converted
into a CTR loop since this is not an operation that is legal on any PPC
subtarget. This will always be a call to a library function which means the
loop will be invalid if this instruction is in the body.

Fixes PR36292.

llvm-svn: 325739
2018-02-22 03:02:41 +00:00
Simon Pilgrim 55b7e01116 [X86][MMX] Generlize MMX_MOVD64rr combines to accept v4i16/v8i8 build vectors as well as v2i32
Also handle both cases where the lower 32-bits of the MMX is undef or zero extended.

llvm-svn: 325736
2018-02-21 23:07:30 +00:00
Yonghong Song 9fdd139b41 bpf: disable DwarfUsesRelocationsAcrossSections
The pahole does not work with BPF backend properly:

  -bash-4.2$ cat test.c
  struct test_t {
    int a;
    int b;
  };
  int test(struct test_t *s) {
    return s->a;
  }
  -bash-4.2$ clang -g -O2 -target bpf -c test.c
  -bash-4.2$ pahole test.o
  struct clang version 7.0.0 (trunk 325446) (llvm/trunk 325464) {
          clang version 7.0.0 (trunk 325446) (llvm/trunk 325464) clang version 7.0.0 (trunk 325446) (llvm/trunk 325464); /*     0     4 */
          clang version 7.0.0 (trunk 325446) (llvm/trunk 325464) clang version 7.0.0 (trunk 325446) (llvm/trunk 325464); /*     4     4 */

          /* size: 8, cachelines: 1, members: 2 */
          /* last cacheline: 8 bytes */
  };
  -bash-4.2$

The reason is that BPF backend is not yet implemented in elfutils backend
  https://github.com/threatstack/elfutils/tree/master/backends
and pahole depends on elfutils for dwarf parsing and resolving relocation.

More specifically, the unsupported relocation in .debug_info for type/member name
against symbol table caused the incorrect result above. The following is
the raw .rel.debug_info for the above example,
  Hex dump of section '.rel.debug_info':
    0x00000000 06000000 00000000 0a000000 0b000000 ................
    0x00000010 0c000000 00000000 0a000000 01000000 ................
    0x00000020 12000000 00000000 0a000000 02000000 ................
    0x00000030 16000000 00000000 0a000000 0e000000 ................
    0x00000040 1a000000 00000000 0a000000 03000000 ................
               ----------------- -------- --------
                reloc location     type   symtab index

  Hex dump of section '.debug_info':
    0x00000000 7b000000 04000000 00000801 00000000 {...............
    0x00000010 0c000000 00000000 00000000 00000000 ................
    0x00000020 00000000 00001000 00000200 00000000 ................

Based on "type", the proper value will be extracted from symbol table
and filled in .debug_info so later on .debug_info can be properly
resolved against debug strings.

There are two ways to fix this problem. One is to fix elfutils by adding
BPF support which is desirable. This could take a long time and won't work
with already deployed pahole. For a short term workaround, we can disable
dwarf cross-section relation which specifically avoids debug_info and
symbol table cross relocation. This should help any dwarf-related tool
which has not implement BPF specific relocations yet.

Now .rel.debug_info does not have any relocation for symbol table and
.debug_info itself contains necessary relocation information by itself.
  Hex dump of section '.debug_info':
    0x00000000 7b000000 04000000 00000801 00000000 {...............
    0x00000010 0c003700 00000000 00003e00 00000000 ..7.......>.....
    0x00000020 00000000 00001000 00000200 00000000 ................
  location 0xc has 0, 0x12 has 0x37, 0x1a has 0x3e in place which
  will be used in relocation resolution. Here, the values of 0, 0x37 and 0x3e
  are offset in .debug_str section.
Please note the difference between two above .debug_info dumps.

With the fix, pahole works properly with BPF backend:
  -bash-4.2$ clang -O2 -g -target bpf -c test.c
  -bash-4.2$ pahole test.o
  struct test_t {
          int                        a;                    /*     0     4 */
          int                        b;                    /*     4     4 */

          /* size: 8, cachelines: 1, members: 2 */
          /* last cacheline: 8 bytes */
  };

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325735
2018-02-21 22:59:14 +00:00
Tobias Edler von Koch ba7a1f08da [Hexagon] Add TargetRegisterInfo::getPointerRegClass() override
llvm-svn: 325731
2018-02-21 22:27:07 +00:00
Jonas Paulsson 77cdf3881c [Hexagon] Return true in enableMultipleCopyHints().
Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: Krzysztof Parzyszek
llvm-svn: 325697
2018-02-21 16:37:45 +00:00
Simon Pilgrim 82d33b7c44 [X86] LowerBITCAST - pull out repeated calls to getOperand(0). NFCI.
llvm-svn: 325695
2018-02-21 16:35:40 +00:00
Jonas Devlieghere e0af7c390d [Sparc] Include __tls_get_addr in symbol table for TLS calls to it
Global Dynamic and Local Dynamic call relocations only implicitly
reference __tls_get_addr; there is no connection in the ELF file between
the relocations and the symbol other than the specification for the
relocations' semantics. However, it still needs to be in the symbol
table despite the lack of explicit references to the symbol table entry,
since it needs to be bound at link time for these relocations, otherwise
any objects will fail to link.

For details, see https://sourceware.org/bugzilla/show_bug.cgi?id=22832.

Path by: James Clarke (jrtc27)

Differential revision: https://reviews.llvm.org/D43271

llvm-svn: 325688
2018-02-21 15:25:26 +00:00
Nicolai Haehnle 770397f4cd AMDGPU: Do not combine loads/store across physreg defs
Summary:
Since this pass operates on machine SSA form, this should only really
affect M0 in practice.

Fixes various piglit variable-indexing/vs-varying-array-mat4-index-*

Change-Id: Ib2a1dc3a8d7b08225a8da49a86f533faa0986aa8
Fixes: r317751 ("AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4")

Reviewers: arsenm, mareko, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D40343

llvm-svn: 325677
2018-02-21 13:31:35 +00:00
Dmitry Preobrazhensky d6e1a9404d [AMDGPU][MC] Added lds support for MUBUF instructions
See bug 28234: https://bugs.llvm.org/show_bug.cgi?id=28234

Differential Revision: https://reviews.llvm.org/D43472

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 325676
2018-02-21 13:13:48 +00:00
Craig Topper d710adac2d [X86] Disable CLWB for Cannon Lake
Cannon Lake does not support CLWB, therefore it
does not include all features listed under SKX anymore.

Instead, enumerate all SKX features with the exception of CLWB.

Patch by Gabor Buella

Differential Revision: https://reviews.llvm.org/D43380

llvm-svn: 325654
2018-02-21 00:15:48 +00:00
Simon Dardis 7bc8ad5849 [mips] Spectre variant two mitigation for MIPSR2
This patch provides mitigation for CVE-2017-5715, Spectre variant two,
which affects the P5600 and P6600. It implements the LLVM part of
-mindirect-jump=hazard. It is _not_ enabled by default for the P5600.

The migitation strategy suggested by MIPS for these processors is to use
hazard barrier instructions. 'jalr.hb' and 'jr.hb' are hazard
barrier variants of the 'jalr' and 'jr' instructions respectively.

These instructions impede the execution of instruction stream until
architecturally defined hazards (changes to the instruction stream,
privileged registers which may affect execution) are cleared. These
instructions in MIPS' designs are not speculated past.

These instructions are used with the attribute +use-indirect-jump-hazard
when branching indirectly and for indirect function calls.

These instructions are defined by the MIPS32R2 ISA, so this mitigation
method is not compatible with processors which implement an earlier
revision of the MIPS ISA.

Performance benchmarking of this option with -fpic and lld using
-z hazardplt shows a difference of overall 10%~ time increase
for the LLVM testsuite. Certain benchmarks such as methcall show a
substantially larger increase in time due to their nature.

Reviewers: atanasyan, zoran.jovanovic

Differential Revision: https://reviews.llvm.org/D43486

llvm-svn: 325653
2018-02-21 00:06:53 +00:00
Konstantin Zhuravlyov 5c1237a1fd Revert "[AMDGPU] Increased vector length for global/constant loads."
https://reviews.llvm.org/rL325518

It breaks following OpenCL conformance tests:
  - Basic - parameter_types
  - Basic - vload_private

llvm-svn: 325643
2018-02-20 23:30:21 +00:00
Evandro Menezes 72f3983633 [AArch64] Refactor instructions using SIMD immediates
Get rid of icky goto loops and make the code easier to maintain.  Otherwise,
NFC.

Restore r324903 and fix PR36369.

Differentail revision: https://reviews.llvm.org/D43364

llvm-svn: 325621
2018-02-20 20:31:45 +00:00
Sjoerd Meijer 4d5c40492a [ARM] Lower BR_CC for f16
This case wasn't handled yet.

Differential Revision: https://reviews.llvm.org/D43508

llvm-svn: 325616
2018-02-20 19:28:05 +00:00
Krzysztof Parzyszek f9f2005f94 [Hexagon] Handle *Low8 register classes in early if-conversion
llvm-svn: 325606
2018-02-20 18:19:17 +00:00
Craig Topper df0c22fcd3 [X86] Correct SHRUNKBLEND creation to work correctly when there are multiple uses of the condition.
SimplifyDemandedBits forces the demanded mask to all 1s if the node has multiple uses, unless the AssumeSingleUse flag is set.

So previously we were only really likely to simplify something if the condition had a single use. And on the off chance we did simplify with multiple uses the demanded mask being used was all ones so there was no reason to create a shrunkblend.

This patch now checks that the condition is only used by selects first, and then sets the AssumeSingleUse flag for the simplifcation. Then we convert the selects to shrunkblend, and finally replace condition.

Differential Revision: https://reviews.llvm.org/D43446

llvm-svn: 325604
2018-02-20 17:58:17 +00:00
Craig Topper 010ae8dcbb [X86] Promote 16-bit cmovs to 32-bits
This allows us to avoid an opsize prefix. And forcing some move immediates to i32 avoids a length changing prefix on those instructions.

This mostly replaces the existing combine we had for zext/sext+cmov of constants. I left in a case for sign extending a 32 bit cmov of constants to 64 bits.

Differential Revision: https://reviews.llvm.org/D43327

llvm-svn: 325601
2018-02-20 17:41:00 +00:00
Simon Dardis d3860e6670 [mips] Correct the definition of cvt.d.w
An upcoming patch D41434, changes the ordering of the matcher table
for assembly. This patch corrects the definition of the normal MIPS
cvt.d.w not to be available in microMIPS.

llvm-svn: 325589
2018-02-20 15:55:17 +00:00
Lei Huang dfd41552f4 [PowerPC] Reduce stack frame for fastcc functions by only allocating parameter save area when needed
Current implementation always allocates the parameter save area conservatively
for fastcc functions. There is no reason to allocate the parameter save area if
all the parameters can be passed via registers.

Differential Revision: https://reviews.llvm.org/D42602

llvm-svn: 325581
2018-02-20 15:09:45 +00:00
Krzysztof Parzyszek b404fae9e3 [Hexagon] Fix alignment calculation of stack objects in Hexagon bit tracker
llvm-svn: 325580
2018-02-20 14:29:43 +00:00
David Green 056476497e [ARM] Mark -1 as cheap in xor's for thumb1
We can always convert xor %a, -1 into MVN, even in thumb 1 where the -1
would not otherwise be considered a cheap constant. This prevents the
-1's from being pulled out into constants and potentially hoisted.

Differential Revision: https://reviews.llvm.org/D43451

llvm-svn: 325573
2018-02-20 11:07:35 +00:00
George Rimar da4f43a4b4 [llvm-mc] - Produce R_X86_64_PLT32 for "call/jmp foo".
For instructions like call foo and jmp foo patch changes
relocation produced from R_X86_64_PC32 to R_X86_64_PLT32.
Relocation can be used as a marker for 32-bit PC-relative branches.
Linker will reduce PLT32 relocation to PC32 if function is defined locally.

Differential revision: https://reviews.llvm.org/D43383

llvm-svn: 325569
2018-02-20 10:17:57 +00:00
Tim Renouf 8234b4893a [AMDGPU] stop buffer_store being moved illegally
Summary:
The machine instruction scheduler was illegally moving a buffer store
past a buffer load with the same descriptor and offset. Fixed by marking
buffer ops as mayAlias and isAliased. This may be overly conservative,
and we may need to revisit.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D43332

Change-Id: Iff3173d9e0653e830474546276ab9d30318b8ef7
llvm-svn: 325567
2018-02-20 10:03:38 +00:00
Craig Topper 9256ac1a58 [X86] Add 512-bit unmasked pmulhrsw/pmulhw/pmulhuw intrinsics. Remove and auto upgrade 128/256/512 bit masked pmulhrsw/pmulhw/pmulhuw intrinsics.
The 128 and 256 bit versions were already not used by clang. This adds an equivalent unmasked 512 bit version. Then autoupgrades all sizes to use unmasked intrinsics plus select.

llvm-svn: 325559
2018-02-20 07:28:14 +00:00
Amara Emerson db211892ed [AArch64][GlobalISel] When copying from a gpr32 to an fpr16 reg, convert to fpr32 first.
This is a follow on commit to r[x] where we fix the other direction of copy.
For this case, after converting the source from gpr32 -> fpr32, we use a
subregister copy, which is essentially what EXTRACT_SUBREG does in SDAG land.

https://reviews.llvm.org/D43444

llvm-svn: 325550
2018-02-20 05:11:57 +00:00
Craig Topper a05ed17316 [X86] Make XOP VPCOM instructions commutable to fold loads during isel.
llvm-svn: 325547
2018-02-20 03:58:13 +00:00
Craig Topper 9b64bf54b9 [X86] Make a helper function for commuting AVX512 VPCMP immediates since we do it in two places.
llvm-svn: 325546
2018-02-20 03:58:11 +00:00
Craig Topper b195ed8ce3 [X86] Use vpmovq2m/vpmovd2m for truncate to vXi1 when possible.
Previously we used vptestmd, but the scheduling data for SKX says vpmovq2m/vpmovd2m is lower latency. We already used vpmovb2m/vpmovw2m for byte/word truncates. So this is more consistent anyway.

llvm-svn: 325534
2018-02-19 22:07:31 +00:00
Craig Topper e60f1472f1 [X86] Stop swapping the operands of AVX512 setge.
We swapped the operands and used setle, but I don't see any reason to do that. I think this is a holdover from SSE where we swap and the invert to use pcmpgt. But with AVX512 we don't want an invert so we won't use pcmpgt. So there's no need to swap.

llvm-svn: 325527
2018-02-19 19:23:35 +00:00
Craig Topper 9471a7c898 [X86] Reduce the number of isel pattern variations needed for VPTESTM/VPTESTNM matching.
Canonicalize EQ/NE PCMPM to have build vector all zeros on the RHS so we don't have to pattern match it in both locations. This significantly reduces the number of isel patterns needed since we also had to multiply it out with loads being in either operand of the 'and' input node and in the 'and' masking node.

This removes over 24000 bytes from the isel table.

llvm-svn: 325526
2018-02-19 19:23:31 +00:00
Mark Searles 65207923f6 [AMDGPU] Make note of existing waitcnt instrs; this is add-on work related to suppression of redundant waitcnt instrs. It is necessary to make note of these existing waitcnt instrs so that we do not fall into an infinite loop when handling loops. Also, [NFC] some minor code clean-up.
llvm-svn: 325524
2018-02-19 19:19:59 +00:00
Mark Searles 419bdab759 [AMDGPU] Increased vector length for global/constant loads.
Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords.

Author: FarhanaAleen

Reviewed By: rampitec

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D43275

llvm-svn: 325518
2018-02-19 16:42:49 +00:00
Rafael Espindola c7e51805ff Bring back r323297.
It was reverted because it broke the grub build. The reason the grub
build broke is because grub does its own relocation processing and was
not handing R_386_PLT32. Since grub has no dynamic linker, the fix is
trivial: handle R_386_PLT32 exactly like R_386_PC32.

On the report it was noted that they are using
-fno-integrated-assembler. The upstream GAS (starting with
451875b4f976a527395e9303224c7881b65e12ed) will already be producing a
R_386_PLT32 anyway, so they have to update their code one way or the
other

Original message:

Don't assume a null GV is local for ELF and MachO.

This is already a simplification, and should help with avoiding a plt
reference when calling an intrinsic with -fno-plt.

With this change we return false for null GVs, so the caller only
needs to check the new metadata to decide if it should use foo@plt or
*foo@got.

llvm-svn: 325514
2018-02-19 16:02:38 +00:00
Francis Visoiu Mistrih 68ced40a23 Revert "[CodeGen] Move printing '\n' from MachineInstr::print to MachineBasicBlock::print"
This reverts commit r324681.

llvm-svn: 325505
2018-02-19 15:08:49 +00:00
Simon Pilgrim c302a581a0 [X86][SSE] combineTruncateWithSat - use truncateVectorWithPACK down to 64-bit subvectors
Add support for chaining PACKSS/PACKUS down to 64-bit vectors by using only a single 128-bit input.

llvm-svn: 325494
2018-02-19 13:29:20 +00:00
Dylan McKay 9a2a996c1c [AVR] Set the program address space in the data layout
This adds the program memory address space setting to the AVR data
layout.

This setting was very recently added under r325479.

At the moment, there are no uses of this setting. In the future, things
such as switch lookup tables should reside there.

llvm-svn: 325481
2018-02-19 10:40:59 +00:00
Dylan McKay 05d3e41076 [AVR] Fix a lowering bug in AVRISelLowering.cpp
The parseFunctionArgs() method was directly reading the
arguments from a Function object, but is should have used the
arguments supplied by the SelectionDAGBuilder.

This was causing
the lowering code to only lower one argument, not two in some cases.

Thanks to @brainlag on GitHub for coming up with the working fix!

Patch-by: @brainlag on GitHub
llvm-svn: 325474
2018-02-19 08:28:38 +00:00
Eric Christopher 8fad26e5f3 Add LanaiMCTargetDesc.h to LanaiInstrInfo.h to make it self contained
with instruction enum definitions.

llvm-svn: 325473
2018-02-19 05:26:49 +00:00
Craig Topper 9cf812e1ed [X86] Correct a typo I made in combineToExtendCMOV recently.
We're accidentally checking that the same node is a constant twice instead of checking the other node.

This isn't a functional problem since we didn't do anything below that explicitly requires constants. It just means we may have introduced a sign_extend or zero_extend that won't fold out.

llvm-svn: 325469
2018-02-18 20:41:25 +00:00
Amara Emerson 242efdb54b Fix unused assertion variable warning.
llvm-svn: 325464
2018-02-18 17:28:34 +00:00
Amara Emerson 7e9f348b2d [AArch64][GlobalISel] Fix an assert fail/miscompile when fp16 types are copied
to gpr register banks.

PR36345.

rdar://36478867

Differential Revision: https://reviews.llvm.org/D43310

llvm-svn: 325463
2018-02-18 17:10:49 +00:00
Amara Emerson bc03baef77 [AArch64][GlobalISel] Support G_INSERT/G_EXTRACT of types < s32 bits.
These are needed for operations on fp16 types in a later patch.

llvm-svn: 325462
2018-02-18 17:03:02 +00:00
Haicheng Wu aed6e52b3c [AArch64] Coalesce Copy Zero during instruction selection
Add special case for copy of zero to avoid a double copy.

Differential Revision: https://reviews.llvm.org/D36104

llvm-svn: 325459
2018-02-18 13:51:33 +00:00
Jonas Paulsson 891789c299 [BPF] Return true in enableMultipleCopyHints().
Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: Yonghong Song
llvm-svn: 325457
2018-02-18 10:09:54 +00:00
Craig Topper 1040f236a3 [X86] Make masked pcmpeq commutable during isel so we can fold loads in other operand to the shorter encoding.
Previously we used the immediate encoding if the load was in operand 0 and the short encoding if the load was in operand 1.

This added an insane number of bytes to the size of the isel table. I'm wondering if we should always use the immediate form during isel and change to the short form during emission. This would remove the need to pattern match every combination for both the immediate form and the short form during isel. We could do the same with vpcmpgt

llvm-svn: 325456
2018-02-18 02:37:33 +00:00
Simon Pilgrim 386b8ddd5f [MIPS][MSA] Convert vector integer min/max opcodes to use generic implementation
Found while investigating D43338

Simon^3 - the LLVM project needs more Simons.

Differential Revision: https://reviews.llvm.org/D43433

llvm-svn: 325447
2018-02-17 21:29:45 +00:00
Martin Storsjo a63a5b993e [AArch64] Implement dynamic stack probing for windows
This makes sure that alloca() function calls properly probe the
stack as needed.

Differential Revision: https://reviews.llvm.org/D42356

llvm-svn: 325433
2018-02-17 14:26:32 +00:00
Simon Pilgrim 63db669013 Fix unused variable warning. NFCI.
We were casting to AArch64InstrInfo but only using it for static methods which some compilers complain about.

llvm-svn: 325432
2018-02-17 13:48:23 +00:00
Jonas Paulsson b51a9bc358 [AMDGPU] Return true in enableMultipleCopyHints().
Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: Stanislav Mekhanoshin, Tom Stellard.
llvm-svn: 325425
2018-02-17 10:00:28 +00:00
Chandler Carruth a1d6107b14 [DAG, X86] Revert r324797, r324491, and r324359.
Sadly, r324359 caused at least PR36312. There is a patch out for review
but it seems to be taking a bit and we've already had these crashers in
tree for too long. We're hitting this PR in real code now and are
blocked on shipping new compilers as a consequence so I'm reverting us
back to green.

Sorry for the churn due to the stacked changes that I had to revert. =/

llvm-svn: 325420
2018-02-17 02:26:25 +00:00
Craig Topper 0bcdd399e7 [X86] Turn selects with constant condition into vector shuffles during DAG combine
Summary:
Currently we convert to shuffles during lowering. This moves it to DAG combine so hopefully we can get it done before type legalization has to extend the condition.

I believe in some cases we're creating SHRUNKBLENDs that end up with constant conditions because we see the extended on the condition and think its a dynamic selelect before DAG combine gets a chance to constant fold the extend. We could add combines to turn SHRUNKBLENDs with constant condition back to vselect. But it seemed like it might be better to just send them to shuffles as early as possible so they never get a chance to become SHRUNKBLENDs. This the reason some tests went from blends controlled by a constant pool load to just move.

Some of the constant pool entries changed because the sign_extend introduced by type legalization turned undef elements in select condition into 0s. While the select->shuffle used -1 in the shuffle mask. So now the shuffle lowering can do what it wants with them.

I'll remove the lowering code as a follow up. We might be able to simplify some of the pre-checks for SHRUNKBLEND as the FIXME there says.

Reviewers: spatel, RKSimon, efriedma, zvi, andreadb

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43367

llvm-svn: 325417
2018-02-17 00:30:30 +00:00
Konstantin Zhuravlyov ef9aafcddc AMDGPU: Remove unused private member of AMDGPUTargetELFStreamer
llvm-svn: 325408
2018-02-16 23:04:11 +00:00
Eric Christopher 8ceddb0ecd Remove an unused function.
llvm-svn: 325403
2018-02-16 22:46:47 +00:00
Konstantin Zhuravlyov 9122a63143 AMDGPU: Bring elf flags in sync with the spec
- Add MACH flags
- Add XNACK flag
- Add reserved flags
- Minor cleanups in docs

Differential Revision: https://reviews.llvm.org/D43356

llvm-svn: 325399
2018-02-16 22:33:59 +00:00
Craig Topper 27b9ac2372 [X86] In lowerVSELECTtoVectorShuffle, don't map undef select condition to undef in shuffle mask.
Undef in select condition means we should pick the element from one side or the other. An undef in a shuffle mask means pick any element from either source or worse.

I suspect by the time we get here most of the undefs in a constant vector have been removed by other things, but doing this for safety.

llvm-svn: 325394
2018-02-16 21:36:29 +00:00
Konstantin Zhuravlyov 331f97e171 AMDGPU: Bring processors and features in sync with the spec
- Remove gfx800
- Make iceland gfx802
- Add xnack to gfx902

Differential Revision: https://reviews.llvm.org/D43355

llvm-svn: 325393
2018-02-16 21:26:25 +00:00
Evandro Menezes 10ae20d80c [AArch64] Fix BITCAST lowering crash
The data type is assumed to be a vector, but sometimes it is not, leading
to an assertion.

Add simple test-case to verify this.

Differential revision: https://reviews.llvm.org/D42599

llvm-svn: 325378
2018-02-16 20:00:57 +00:00
Changpeng Fang ba92059ca9 AMDGPU/SI: Extend promoting alloca to vector to arrays of up to 16 elements
Summary:
  This patch extends the promotion of alloca to vector to the arrays of up to 16 elements. Also we introduce
an option, -disable-promote-alloca-to-vector, to switch promotion to vector off, if needed.

Reviewers:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D33559

llvm-svn: 325372
2018-02-16 19:14:17 +00:00
Craig Topper de565fc73e [X86] Only reorder srl/and on last DAG combiner run
This seems to interfere with a target independent brcond combine that looks for the (srl (and X, C1), C2) pattern to enable TEST instructions. Once we flip, that combine doesn't fire and we end up exposing it to the X86 specific BT combine which causes us to emit a BT instruction. BT has lower throughput than TEST.

We could try to make the brcond combine aware of the alternate pattern, but since the flip was just a code size reduction and not likely to enable other combines, it seemed easier to just delay it until after lowering.

Differential Revision: https://reviews.llvm.org/D43201

llvm-svn: 325371
2018-02-16 18:51:09 +00:00
Craig Topper 79bd39db80 [X86] Remove call to ShrinkDemandedCosntant from the SHRUNKBLEND creation code.
We only run this code if know the condition isn't a constant vector. ShrinkDemandedConstant isn't going to find any different.

llvm-svn: 325368
2018-02-16 18:34:46 +00:00
Sam Clegg b7a5469c7e [WebAssembly] MC: Make explicit our current lack of support for relocations against unnamed temporary symbols.
Add an explicit check before looking up symbol in SymbolIndices.
This was previously silently succeeding and returning zero for such
unnamed temporaries.

Differential Revision: https://reviews.llvm.org/D43365

llvm-svn: 325367
2018-02-16 18:06:05 +00:00
Changpeng Fang da38b5fd49 AMDGPU/SI: Turn off GPR Indexing Mode immediately after the interested instruction.
Summary:
  In the current implementation of GPR Indexing Mode when the index is of non-uniform, the s_set_gpr_idx_off instruction
is incorrectly inserted after the loop. This will lead the instructions with vgpr operands (v_readfirstlane for example) to read incorrect
vgpr.
 In this patch, we fix the issue by inserting s_set_gpr_idx_on/off immediately around the interested instruction.

Reviewers:
  rampitec

Differential Revision:
  https://reviews.llvm.org/D43297

llvm-svn: 325355
2018-02-16 16:31:30 +00:00
Simon Pilgrim 4e2f757dc1 [X86][SSE] Allow float domain crossing if we are merging 2 or more shuffles and the root started as a float domain shuffle
llvm-svn: 325349
2018-02-16 14:57:25 +00:00
Nemanja Ivanovic 6cf41b028d [PowerPC] Fix transform in table gen file causing UB
Running a bootstrap build with UBSan produces a number of instances where
we have signed integer overflow due to this transform. Change the type to
long to prevent this UB on 64-bit build machines.

llvm-svn: 325347
2018-02-16 14:49:01 +00:00
Simon Dardis b8ae30ecec [mips] Remove codegen support from some 16 bit instructions
These instructions conflict with their full length variants
for the purposes of FastISel as they cannot be distingushed
based on the number and type of operands and predicates.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D41285

llvm-svn: 325341
2018-02-16 13:34:23 +00:00
Jonas Paulsson 995ba6e42c [ARM] Return true in enableMultipleCopyHints().
Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: Eli Friedman
llvm-svn: 325327
2018-02-16 09:51:01 +00:00
Roger Ferrer Ibanez d41059a9f6 [ARM] Materialise some boolean values to avoid a branch
This patch combines some cases of ARMISD::CMOV for integers that arise in comparisons of the form

  a != b ? x : 0
  a == b ? 0 : x

and that currently (e.g. in Thumb1) are emitted as branches.

Differential Revision: https://reviews.llvm.org/D34515

llvm-svn: 325323
2018-02-16 09:23:59 +00:00
Craig Topper 2e4b838c06 [X86] Allow CMOVs of constants to be sign extended from i32.
Sign extending i32 constants only requires a REX prefix as does widening the CMOV. This is cheaper than the explicit sign extend op.

llvm-svn: 325318
2018-02-16 07:16:15 +00:00
Craig Topper 5d9e301042 [X86] Don't zero_extend cmov up to i64, stop at i32.
Zero extend from i32 to i64 is free. So extend from i16 to i32, and then use a free zero extend to finish.

llvm-svn: 325317
2018-02-16 06:52:43 +00:00
Stanislav Mekhanoshin ff2763a658 [AMDGPU] Combine adjacent waitcounts in a single strongest wait
Differential Revision: https://reviews.llvm.org/D43350

llvm-svn: 325299
2018-02-15 22:03:55 +00:00
Rafael Auler de9ad4ba84 [X86][3DNOW] Teach decoder about AMD 3DNow! instrs
Summary:
This patch makes the decoder understand old AMD 3DNow!
instructions that have never been properly supported in the X86
disassembler, despite being supported in other subsystems. Hopefully
this should make the X86 decoder more complete with respect to binaries
containing legacy code.

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits, maksfb, bruno

Differential Revision: https://reviews.llvm.org/D43311

llvm-svn: 325295
2018-02-15 21:20:31 +00:00
Craig Topper f3f35efe5c [X86] Enable BT to be used in place of TEST for single bit checks under optsize
We already do this for 64-bit when it won't fit into a 64-bit AND/TEST's immediate field. This adds an additional qualifier to do it for any single bit constant larger than 8-bits under optsize

Differential Revision: https://reviews.llvm.org/D43346

llvm-svn: 325290
2018-02-15 20:27:30 +00:00
Craig Topper 2b2d8c5eb2 [X86] Use btc/btr/bts to implement xor/and/or that affects a single bit in the upper 32-bits of a 64-bit operation.
We can't fold a large immediate into a 64-bit operation. But if we know we're only operating on a single bit we can use the bit instructions.

For now only do this for optsize.

Differential Revision: https://reviews.llvm.org/D37418

llvm-svn: 325287
2018-02-15 19:57:35 +00:00
Krzysztof Parzyszek e0d7de7d7b Recommit [Hexagon] Make the vararg handling a bit more robust
Use the FunctionType of the callee when it's available. It may not be
available for synthetic calls to functions specified by external symbols.

llvm-svn: 325269
2018-02-15 17:20:07 +00:00
Yonghong Song 920df52a93 bpf: fix a bug in dag2dag optimization for loads from readonly section
The reference '&' is missing in the function parameter. If there are
back-to-back optimizations in terms of dag node list like below:
  t29: i64,ch = load<LD4[bitcast (%struct.test_t* @test.t to i8*)+12](dereferenceable), zext from i32> t3, t43, undef:i64
  t34: i64,ch = load<LD4[bitcast (%struct.test_t* @test.t to i8*)](dereferenceable), zext from i32> t3, t41, undef:i64
The bug will trigger a segfault for the added test case remove_truncate_5.ll:
  LLVMSymbolizer: error reading file: No such file or directory
  #0 0x000000000241c4d9 (llc+0x241c4d9)
  #1 0x000000000241c56a (llc+0x241c56a)
  #2 0x000000000241aa50 (llc+0x241aa50)
  ...
  #22 0x0000000000fd5edf (llc+0xfd5edf)
  #23 0x00007f0fe03bec05 __libc_start_main (/lib64/libc.so.6+0x21c05)
  #24 0x0000000000fd3e69 (llc+0xfd3e69)
  ...
  Segmentation fault

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325267
2018-02-15 17:06:45 +00:00
Krzysztof Parzyszek 8a9eff6b87 Revert "[Hexagon] Make the vararg handling a bit more robust"
This is breaking lit tests.

llvm-svn: 325266
2018-02-15 16:57:44 +00:00
Krzysztof Parzyszek 568107275d [Hexagon] Make the vararg handling a bit more robust
The FunctionType of the callee is always available, even if the Function
of the callee is not. Use that to get the number of fixed parameters.

llvm-svn: 325259
2018-02-15 16:24:30 +00:00
Krzysztof Parzyszek 18e0d2a1f8 [Hexagon] Fix lowering of formal arguments after r324737
Lowering of formal arguments needs to be aware of vararg functions.

llvm-svn: 325255
2018-02-15 15:47:53 +00:00
Pablo Barrio e28cb8399a [ARM] Allow 64- and 128-bit types with 't' inline asm constraint
Summary:
In LLVM, 't' selects a floating-point/SIMD register and only supports
32-bit values. This is appropriately documented in the LLVM Language
Reference Manual. However, this behaviour diverges from that of GCC, where
't' selects the s0-s31 registers and its qX and dX variants depending on
additional operand modifiers (q/P).

For example, the following C code:

#include <arm_neon.h>
float32x4_t a, b, x;
asm("vadd.f32 %0, %1, %2" : "=t" (x) : "t" (a), "t" (b))

results in the following assembly if compiled with GCC:

vadd.f32 s0, s0, s1

whereas LLVM will show "error: couldn't allocate output register for
constraint 't'", since a, b, x are 128-bit variables, not 32-bit.

This patch extends the use of 't' to mean that of GCC, thus allowing
selection of the lower Q vector regs and their D/S variants. For example,
the earlier code will now compile as:

vadd.f32 q0, q0, q1

This behaviour still differs from that of GCC but I think it is actually
more correct, since LLVM picks up the right register type based on the
datatype of x, while GCC would need an extra operand modifier to achieve
the same result, as follows:

asm("vadd.f32 %q0, %q1, %q2" : "=t" (x) : "t" (a), "t" (b))

Since this is only an extension of functionality, existing code should not
be affected by this change. Note that operand modifiers q/P are already
supported by LLVM, so this patch should suffice to support inline
assembly with constraint 't' originally built for GCC.

Reviewers: grosbach, rengolin

Reviewed By: rengolin

Subscribers: rogfer01, efriedma, olista01, aemerson, javed.absar, eraman, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42962

llvm-svn: 325244
2018-02-15 14:44:22 +00:00
Simon Pilgrim 17bb6f0755 [X86][SSE] combineTruncateWithSat - use truncateVectorWithPACK to chain PACKUS vXi32-vXi8 saturated truncation
We can use PACKSS/PACKUS to saturate each stage of the chain: PACKSSDW down to [-32768,32767] and then PACKUSWB to [0,255].

llvm-svn: 325243
2018-02-15 14:37:59 +00:00
Simon Pilgrim 908f833e57 [X86][SSE] combineTruncateWithSat - use truncateVectorWithPACK to chain PACKSS vXi32-vXi8 saturated truncation
We can use PACKSS to saturate each stage of the chain: PACKSSDW down to [-32768,32767] and then PACKSSWB to [-128,127].

PACKUS is a little trickier and will be handled in a separate patch.

llvm-svn: 325235
2018-02-15 13:33:15 +00:00
Sjoerd Meijer 9430c8cd1c [ARM] f16 vcmp fixes
This adds f16 VCMP match rules and fixes the test cases.

Differential Revision: https://reviews.llvm.org/D43291

llvm-svn: 325228
2018-02-15 10:33:07 +00:00
Craig Topper 386cfa08a8 [X86] Change 32 and 64 bit versions of LSL instruction have a 16-bit memory operand.
This matches the Intel and AMD documentation and is consistent with the LAR instruction.

llvm-svn: 325197
2018-02-15 01:21:53 +00:00
Craig Topper bab7b0a466 [X86] Dont' allow 'outs' and 'ins' in at&t syntax without suffixes.
The match would be ambiguous, but at&t asm parsing doesn't support ambiguous matches and will just return the first.

llvm-svn: 325192
2018-02-14 23:53:26 +00:00
Craig Topper a08f83bac3 [X86] Reverse the operand order of invlpga in at&t syntax to match gas.
llvm-svn: 325190
2018-02-14 23:53:21 +00:00
Craig Topper 675752166d [X86] Don't swap argument on BOUND instruction in at&t syntax.
The bound instruction does not have reversed operands in gas.

Fixes PR27653.

Patch by Maya Madhavan.

Differential Revision: https://reviews.llvm.org/D43243

llvm-svn: 325178
2018-02-14 21:54:58 +00:00
Krzysztof Parzyszek ad83ce4cb4 [Hexagon] Split HVX vector pair loads/stores, expand unaligned loads
llvm-svn: 325169
2018-02-14 20:46:06 +00:00
Simon Pilgrim 2ec8373633 [X86][SSE] truncateVectorWithPACK - Use src type instead of dst to select between PACK*SDW/PACK*SWB
Try to keep PACK*SDW/PACK*SWB as wide as possible, this helps ComputeNumSignBits as it can only peek through bitcasts to wider types, pre-AVX2 codegen was already doing this as it could peek through bitcasts/subvectors more easily than AVX2 could through shuffles.

This shouldn't affect existing results as calls to truncateVectorWithPACK ensure we have enough sign bits to pack to the same value, but it should make it possible to use truncateVectorWithPACK chains to perform saturation in combineTruncateWithSat with a future patch.

llvm-svn: 325149
2018-02-14 18:23:58 +00:00
Stanislav Mekhanoshin c078ca92eb [AMDGPU] Remove non-temporal flag from argument loads
Kernel arguments likely read by all workitems and should not bypass
cache. Fixes performance hit in sub-dword argument loads.

Differential Revision: https://reviews.llvm.org/D43249

llvm-svn: 325146
2018-02-14 18:05:14 +00:00
Paul Robinson ee88ed6753 [DWARF] Fix incorrect prologue end line record.
The prologue-end line record must be emitted after the last
instruction that is part of the function frame setup code and before
the instruction that marks the beginning of the function body.

Patch by Carlos Alberto Enciso!

Differential Revision: https://reviews.llvm.org/D41762

llvm-svn: 325143
2018-02-14 17:35:52 +00:00
Sjoerd Meijer 3b4294edd2 [ARM] f16 stack spill/reloads
This adds support for handling f16 stack spills/reloads.

Differential Revision: https://reviews.llvm.org/D43280

llvm-svn: 325130
2018-02-14 15:09:09 +00:00
Simon Pilgrim ded6e7a263 Fix GCC -Wlogical-op-parentheses warning. NFCI.
llvm-svn: 325129
2018-02-14 15:07:36 +00:00
Lama Saba fe1016c485 [X86] Reduce Store Forward Block issues in HW - Recommit after fixing Bug 36346
If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load.
The estimated penalty for a store forward block is ~13 cycles.

This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence
of a load and a store.

The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies.
breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.

Change-Id: Ic41aa9ade6512e0478db66e07e2fde41b4fb35f9
llvm-svn: 325128
2018-02-14 14:58:53 +00:00
Simon Pilgrim 86d15bff68 [X86][SSE] Relax type legality for combineTruncateWithSat PACKSS/PACKUS truncation
While the AVX512 VTRUNCS/VTRUNCUS instructions require legal types, truncateVectorWithPACK handles cases with multiples of legal types through splitting/concatenation. So we just need to ensure that the src/dst scalar types are correct and leave truncateVectorWithPACK to handle the rest of it.

llvm-svn: 325127
2018-02-14 14:14:29 +00:00
Reid Kleckner 1631ac1696 [X86] Remove dead code from retpoline thunk generation
Follow-up to r325049

llvm-svn: 325085
2018-02-14 00:24:29 +00:00
Daniel Sanders 7fc87360e9 [globalisel][legalizerinfo] Follow up on post-commit review comments after r323681
* Document most API's
* Delete a useless function call
* Fix a discrepancy between the single and multi-opcode variants of
  getActionDefinitions().
  The multi-opcode variant now requires that more than one opcode is requested.
  Previously it acted much like the single-opcode form but unnecessarily
  enforced the requirements of the multi-opcode form.

llvm-svn: 325067
2018-02-13 23:02:44 +00:00
Reid Kleckner 91e11a83fc [X86] Use EDI for retpoline when no scratch regs are left
Summary:
Instead of solving the hard problem of how to pass the callee to the indirect
jump thunk without a register, just use a CSR. At a call boundary, there's
nothing stopping us from using a CSR to hold the callee as long as we save and
restore it in the prologue.

Also, add tests for this mregparm=3 case. I wrote execution tests for
__llvm_retpoline_push, but they never got committed as lit tests, either
because I never rewrote them or because they got lost in merge conflicts.

Reviewers: chandlerc, dwmw2

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D43214

llvm-svn: 325049
2018-02-13 20:47:49 +00:00
Hans Wennborg f381e94ac8 Revert r324903 "[AArch64] Refactor identification of SIMD immediates"
It caused "Cannot select: t33: f64 = AArch64ISD::FMOV Constant:i32<0>"
in Chromium builds. See PR36369.

> Get rid of icky goto loops and make the code easier to maintain (NFC).
>
> Differential revision: https://reviews.llvm.org/D42723

llvm-svn: 325034
2018-02-13 18:14:38 +00:00
Yaxun Liu 0124b5484c [AMDGPU] Change constant addr space to 4
Differential Revision: https://reviews.llvm.org/D43170

llvm-svn: 325030
2018-02-13 18:00:25 +00:00
Craig Topper 036789a7e8 [X86] Add combine to shrink 64-bit ands when one input is an any_extend and the other input guarantees upper 32 bits are 0.
Summary: This gets the shift case from PR35792.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43222

llvm-svn: 325018
2018-02-13 16:25:25 +00:00
Krzysztof Parzyszek cfbe6ba20c [Hexagon] Simplify some code, NFC
llvm-svn: 325014
2018-02-13 15:35:07 +00:00
Krzysztof Parzyszek 080bf219c2 [Hexagon] Remove unnecessary check
llvm-svn: 325013
2018-02-13 15:34:29 +00:00
Sjoerd Meijer f4a7fa7bbe [ARM] Allow half types in ConstantPool
Change ARMConstantIslandPass to:
- accept f16 literals as litpool entries,
- if the litpool needs to be inserted in the middle of a big block, then we
  need to 4-byte align the next instruction in ARM mode.

Differential Revision: https://reviews.llvm.org/D42784

llvm-svn: 325012
2018-02-13 15:34:09 +00:00
Andre Vieira f00234c0bf [ARM] Don't print "Requires NEON" error message for M-profile
Differential Revision: https://reviews.llvm.org/D43125

llvm-svn: 325000
2018-02-13 11:46:38 +00:00
Sjoerd Meijer 101ee43072 [Thumb] Handle addressing mode AddrMode5FP16
This addressing mode wasn't checked, so we were running in an assert.

Differential Revision: https://reviews.llvm.org/D43179

llvm-svn: 324996
2018-02-13 10:29:03 +00:00
Craig Topper df99baa4df [X86] Teach EVEX->VEX pass to turn VRNDSCALE into VROUND when bits 7:4 of the immediate are 0 and the regular EVEX->VEX checks pass.
Bits 7:4 control the scale part of the operation. If the scale is 0 the behavior is equivalent to VROUND.

Fixes PR36246

llvm-svn: 324985
2018-02-13 04:19:26 +00:00
Craig Topper 5ce6db93c1 [X86] Use getTypeAction in most places that were checking ExperimentalVectorWideningLegalization.
This will allow more flexibility in what types we legalize via widening or not. This should help with a couple lines in D41062.

llvm-svn: 324980
2018-02-13 01:49:58 +00:00
Jacob Gravelle ca358da5e7 [WebAssembly] Fix casting MCSymbol to MCSymbolWasm on ELF
Summary:
wasm32-unknown-unknown-elf has MCSymbols that are not MCSymbolWasms, so
we need a non-asserting cast here.

Reviewers: dschuff, sunfish

Subscribers: jfb, sbc100, aheejin, llvm-commits

Differential Revision: https://reviews.llvm.org/D43205

llvm-svn: 324942
2018-02-12 21:41:12 +00:00
Craig Topper 88939fefe8 [X86] Simplify X86DAGToDAGISel::matchBEXTRFromAnd by creating an X86ISD::BEXTR node and calling Select. Add isel patterns to recognize this node.
This removes a bunch of special case code for selecting the immediate and folding loads.

llvm-svn: 324939
2018-02-12 21:18:11 +00:00
Craig Topper efe3923514 [X86] Remove unused multiclass argument. NFC
llvm-svn: 324938
2018-02-12 21:18:09 +00:00
Abderrazek Zaafrani e72d99261f [AArch64] Fixes for ARMv8.2-A FP16 scalar intrinsic - llvm portion
https://reviews.llvm.org/D42993

llvm-svn: 324912
2018-02-12 17:35:42 +00:00
Simon Pilgrim 0efe9bc953 [X86] Add missing scheduling class tag for i64 absolute address moves
Expand existing SchedRW to encompass these like it did for the other memory offset movs - added comments to closing braces to keep track of def scopes.

We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639).

llvm-svn: 324910
2018-02-12 17:21:28 +00:00
Oliver Stannard 02f08c9d1f [AArch64] Improve v8.1-A code-gen for atomic load-and
Armv8.1-A added an atomic load-clear instruction (which performs bitwise
and with the complement of it's operand), but not a load-and
instruction. Our current code-generation for atomic load-and always
inserts an MVN instruction to invert its argument, even if it could be
folded into a constant or another instruction.

This adds lowering early in selection DAG to convert a load-and
operation into an xor with -1 and a load-clear, allowing the normal DAG
optimisations to work on it.

To do this, I've had to add a new ISD opcode, ATOMIC_LOAD_CLR. I don't
see any easy way to do this with an AArch64-specific ISD node, because
the code-generation for atomic operations assumes the SDNodes are of
type AtomicSDNode.

I've left the old tablegen patterns in because they are still needed for
global isel.

Differential revision: https://reviews.llvm.org/D42478

llvm-svn: 324908
2018-02-12 17:03:11 +00:00
Simon Pilgrim 07e1337c2a [X86][AVX512] Add missing scheduling class tag for KMOVB/KMOVW/KMOVD/KMOVQ moves/loads/stores.
We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639).

llvm-svn: 324905
2018-02-12 16:59:04 +00:00
Evandro Menezes 7dc0f1ec45 [AArch64] Refactor identification of SIMD immediates
Get rid of icky goto loops and make the code easier to maintain (NFC).

Differential revision: https://reviews.llvm.org/D42723

llvm-svn: 324903
2018-02-12 16:41:41 +00:00
Simon Pilgrim 369e59d4d1 [X86][AVX512] Add missing scheduling class tag for VMOVQ/VMOVHLPS/VMOVLHPS/VMOVHPD/VMOVHPS/VMOVLPD/VMOVLPS
Tag AVX512 variants to match SSE/AVX originals.

We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639).

llvm-svn: 324901
2018-02-12 16:18:36 +00:00
Simon Pilgrim b941f5dc5f [X86] Tag CET-IBT instruction scheduler classes
llvm-svn: 324898
2018-02-12 15:57:00 +00:00
Simon Pilgrim d0693a6501 [X86][MMX] Add missing scheduling class tag for EMMS/FEMMS
We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639).

AMD targets can perform these a lot quicker than WriteMicrocoded so will need an override in the models.

llvm-svn: 324897
2018-02-12 15:52:59 +00:00
Oliver Stannard 4269917304 [AArch64] Improve v8.1-A code-gen for atomic load-subtract
Armv8.1-A added an atomic load-add instruction, but not a load-subtract
instruction. Our current code-generation for atomic load-subtract always
inserts a NEG instruction to negate it's argument, even if it could be
folded into a constant or another instruction.

This adds lowering early in selection DAG to convert a load-subtract
operation into a subtract and a load-add, allowing the normal DAG
optimisations to work on it.

I've left the old tablegen patterns in because they are still needed for
global isel.

Some of the tests in this patch are copied from D35375 by Chad Rosier (which
was abandoned).

Differential revision: https://reviews.llvm.org/D42477

llvm-svn: 324892
2018-02-12 14:22:03 +00:00
Hans Wennborg 7e19dfc45f Revert r324835 "[X86] Reduce Store Forward Block issues in HW"
It asserts building Chromium; see PR36346.

(This also reverts the follow-up r324836.)

> If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
> A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load.
> The estimated penalty for a store forward block is ~13 cycles.
>
> This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence
> of a load and a store.
>
> The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies.
> breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.

llvm-svn: 324887
2018-02-12 12:43:39 +00:00
Simon Atanasyan 0874cf5e62 [mips] Fix 'l' constraint handling for types smaller than 32 bits
In case of correct using of the 'l' constraint llvm now generates valid
code; otherwise it shows an error message. Initially these triggers an
assertion.

This commit is the same as r324869 with fixed the test's file name.

llvm-svn: 324885
2018-02-12 12:21:55 +00:00
Simon Atanasyan dc4ed35ea6 [mips] Revert rL324869
This commit adds inlineasm-cnstrnt-bad-l.ll which is clashing
with inlineasm-cnstrnt-bad-L.ll on case insensitive file systems.

llvm-svn: 324882
2018-02-12 11:15:37 +00:00
David Green 6d9f8c9817 [CodeGen] Add a -trap-unreachable option for debugging
Add a common -trap-unreachable option, similar to the target
specific hexagon equivalent, which has been replaced. This
turns unreachable instructions into traps, which is useful for
debugging.

Differential Revision: https://reviews.llvm.org/D42965

llvm-svn: 324880
2018-02-12 11:06:27 +00:00
Simon Atanasyan e08f2a19d4 [mips] Fix 'l' constraint handling for types smaller than 32 bits
In case of correct using of the 'l' constraint llvm now generates valid
code; otherwise it shows an error message. Initially these triggers an
assertion.

llvm-svn: 324869
2018-02-12 07:51:21 +00:00
Craig Topper b424fafa9f [X86] Don't look for TEST instruction shrinking opportunities when the root node is a X86ISD::SUB.
I don't believe we ever create an X86ISD::SUB with a 0 constant which is what the TEST handling needs. The ternary operator at the end of this code shows up as only going one way in the llvm-cov report from the bots.

llvm-svn: 324865
2018-02-12 03:02:02 +00:00
Craig Topper 3ccbd3f32f [X86] Remove check for X86ISD::AND with no flag users from the TEST instruction immediate shrinking code.
We turn X86ISD::AND with no flag users back to ISD::AND in PreprocessISelDAG.

llvm-svn: 324864
2018-02-12 03:02:01 +00:00
Craig Topper 98ae8f833f [X86] Change some compare patterns to use loadi8/loadi16/loadi32/loadi64 helper fragments.
This enables CMP8mi to fold zextloadi8i1 which in all tests allows us to avoid creating a TEST8rr that peephole can't fold.

llvm-svn: 324863
2018-02-12 02:48:42 +00:00
Craig Topper 3ce035acf3 [X86] Add KADD X86ISD opcode instead of reusing ISD::ADD.
ISD::ADD implies individual vector element addition with no carries between elements. But for a vXi1 type that would be the same as XOR. And we already turn ISD::ADD into ISD::XOR for all vXi1 types during lowering. So the ISD::ADD pattern would never be able to match anyway.

KADD is different, it adds the elements but also propagates a carry between them. This just a way of doing an add in k-register without bitcasting to the scalar domain. There's still no way to match the pattern, but at least its not obviously wrong.

llvm-svn: 324861
2018-02-12 01:33:38 +00:00
Craig Topper dfc322ddf4 [X86] Allow zextload/extload i1->i8 to be folded into instructions during isel
Previously we just emitted this as a MOV8rm which would likely get folded during the peephole pass anyway. This just makes it explicit earlier.

The gpr-to-mask.ll test changed because the kaddb instruction has no memory form.

llvm-svn: 324860
2018-02-12 01:33:36 +00:00
Craig Topper 363e099446 [X86] Remove MASK_BINOP intrinsic type. NFC
llvm-svn: 324858
2018-02-11 22:32:30 +00:00
Craig Topper 38d61c38a2 [X86] Remove dead code from getMaskNode that looked for a i64 mask with a maskVT that wasn't v64i1. NFC
llvm-svn: 324857
2018-02-11 22:32:29 +00:00
Craig Topper a7ac028a6b [X86] Remove LowerBoolVSETCC_AVX512, we get this with a target independent DAG combine now. NFC
llvm-svn: 324856
2018-02-11 22:32:27 +00:00
Simon Pilgrim 0d8c4bfc2a [X86][SSE] Use SplitBinaryOpsAndApply to recognise PSUBUS patterns before they're split on AVX1
This needs to be generalised further to support AVX512BW cases but I want to add non-uniform constants first.

llvm-svn: 324844
2018-02-11 17:29:42 +00:00
Craig Topper ca5a340171 [X86] Use min/max for vector ult/ugt compares if avoids a sign flip.
Summary:
Currently we only use min/max to help with ule/uge compares because it removes an invert of the result that would otherwise be needed. But we can also use it for ult/ugt compares if it will prevent the need for a sign bit flip needed to use pcmpgt at the cost of requiring an invert after the compare.

I also refactored the code so that the max/min code is self contained and does its own return instead of setting up a flag to manipulate the rest of the function's behavior.

Most of the test cases look ok with this. I did notice that we added instructions when one of the operands being sign flipped is a constant vector that we were able to constant fold the flip into.

I also noticed that sometimes the SSE min/max clobbers a register that is needed after the compare. This resulted in an extra move being inserted before the min/max to preserve the register. We could try to detect this and switch from min to max and change the compare operands to use the operand that gets reused in the compare.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42935

llvm-svn: 324842
2018-02-11 17:11:40 +00:00
Simon Pilgrim c2544c572a [X86][SSE] Moved SplitBinaryOpsAndApply earlier so more methods can use it. NFCI.
llvm-svn: 324841
2018-02-11 17:01:43 +00:00
Simon Pilgrim 0be5567a89 [X86][SSE] Enable SMIN/SMAX/UMIN/UMAX custom lowering for all legal types
This allows us to recognise more saturation patterns and also simplify some MINMAX codegen that was failing to combine CMPGE comparisons to a legal CMPGT.

Differential Revision: https://reviews.llvm.org/D43014

llvm-svn: 324837
2018-02-11 10:52:37 +00:00
Lama Saba c2ba6c387e [X86] Reduce Store Forward Block issues in HW
If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load.
The estimated penalty for a store forward block is ~13 cycles.

This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence
of a load and a store.

The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies.
breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.

Change-Id: I620b6dc91583ad9a1444591e3ddc00dd25d81748
llvm-svn: 324835
2018-02-11 09:34:12 +00:00
Craig Topper 24d3b28d93 [X86] Don't make 512-bit vectors legal when preferred vector width is 256 bits and 512 bits aren't required
This patch adds a new function attribute "required-vector-width" that can be set by the frontend to indicate the maximum vector width present in the original source code. The idea is that this would be set based on ABI requirements, intrinsics or explicit vector types being used, maybe simd pragmas, etc. The backend will then use this information to determine if its save to make 512-bit vectors illegal when the preference is for 256-bit vectors.

For code that has no vectors in it originally and only get vectors through the loop and slp vectorizers this allows us to generate code largely similar to our AVX2 only output while still enabling AVX512 features like mask registers and gather/scatter. The loop vectorizer doesn't always obey TTI and will create oversized vectors with the expectation the backend will legalize it. In order to avoid changing the vectorizer and potentially harm our AVX2 codegen this patch tries to make the legalizer behavior similar.

This is restricted to CPUs that support AVX512F and AVX512VL so that we have good fallback options to use 128 and 256-bit vectors and still get masking.

I've qualified every place I could find in X86ISelLowering.cpp and added tests cases for many of them with 2 different values for the attribute to see the codegen differences.

We still need to do frontend work for the attribute and teach the inliner how to merge it, etc. But this gets the codegen layer ready for it.

Differential Revision: https://reviews.llvm.org/D42724

llvm-svn: 324834
2018-02-11 08:06:27 +00:00
Craig Topper a4bf9b8d51 [X86] Remove setOperationAction lines for promoting vXi1 SINT_TO_FP/UINT_TO_FP.
We promote these via a DAG combine now before lowering gets the chance.

Also remove the v2i1 custom handling since it will no longer be triggered.

llvm-svn: 324833
2018-02-11 07:44:33 +00:00
Craig Topper ba5ad55965 [X86] Remove some redundant qualifications from the setOperationAction blocks. NFC
These were added as part of the refactoring for prefer vector width. At the time I thought the hasAVX512 here would be replaced with "allow 512 bit vectors" so that it would read "allow 512 bit vectors OR VLX". But now the plan is to only give the option of disabling 512 bit vectors when VLX is enabled. So we don't need this qualification at all

llvm-svn: 324831
2018-02-11 03:07:19 +00:00
Craig Topper 4dccffc84a [X86] Change signatures of avx512 packed fp compare intrinsics to return a vXi1 mask type to be closer to an fcmp.
Summary:
This patch changes the signature of the avx512 packed fp compare intrinsics to return a vXi1 vector and no longer take a mask as input. The casts to scalar type will now need to be explicit in the IR. The masking node will now be an explicit and in the IR.

This makes the intrinsic look much more similar to an fcmp instruction that we wish we could use for these but can't. We already use icmp instructions for integer compares.

Previously the lowering step of isel would turn the intrinsic into an X86 specific ISD node and a emit the masking nodes as well as some bitcasts. This means DAG combines can't see the vXi1 type until somewhat late, making it more difficult to combine out gpr<->mask transition sequences. By exposing the vXi1 type explicitly in the IR and initial SelectionDAG we give earlier DAG combines and even InstCombine the chance to see it and optimize it.

This should make any issues with gpr<->mask sequences the same between integer and fp. Meaning we only have to fix them once.

Reviewers: spatel, delena, RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43137

llvm-svn: 324827
2018-02-10 23:33:55 +00:00
Simon Pilgrim cb9a02f60e [X86][SSE] Increase PMULLD costs to better match hardware
Until Skylake, most hardware could only issue a PMULLD op every other cycle

llvm-svn: 324823
2018-02-10 19:27:10 +00:00
Craig Topper 9121eb575e [X86] Custom legalize (v2i32 (setcc (v2f32))) so that we don't end up with a (v4i1 (setcc (v4f32)))
Undef VLX, getSetCCResultType returns v2i1/v4i1 for v2f32/v4f32 so default type legalization will end up changing the setcc result type back to vXi1 if it had been extended. The resulting extend gets messed up further by type legalization and is difficult to recombine back to (v4i32 (setcc (v4f32))) after legalization.

I went ahead and enabled this for SSE2 and later since its always the result we want and this helps type legalization get there in less steps.

llvm-svn: 324822
2018-02-10 19:12:58 +00:00
Craig Topper 28d3a73c81 [X86] Extend inputs with elements smaller than i32 to sint_to_fp/uint_to_fp before type legalization.
This prevents extends of masks being introduced during lowering where it become difficult to combine them out.

There are a few oddities in here.

We sometimes concatenate two k-registers produced by two compares, sign_extend the combined pair, then extract two halves. This worked better previously because the sign_extend wasn't created until after the fp_to_sint was split which led to a split sign_extend being created.

We probably also need to custom type legalize (v2i32 (sext v2i1)) via widening.

llvm-svn: 324820
2018-02-10 17:58:58 +00:00
Craig Topper b8d7b1620b [X86] Custom legalize (v2i1 (fp_to_uint/fp_to_sint v2f64)) without AVX512VL.
Strangely the code was already present, just the setOperationAction wasn't being called without VLX.

llvm-svn: 324806
2018-02-10 08:39:31 +00:00
Craig Topper c3aab4bbe1 [X86] Legalize zero extends from vXi1 to vXi16/vXi32/vXi64 using a sign extend and a shift.
This avoids a constant pool load to create 1.

The int->float are showing converts to mask and back. We probably need to widen inputs to sint_to_fp/uint_to_fp before type legalization.

llvm-svn: 324805
2018-02-10 08:06:52 +00:00
Craig Topper d34af6f636 [X86] Teach combineExtSetcc to handle ZERO_EXTEND by widening the setcc and then masking. A later DAG combine will convert to a shift.
This helps to avoid a constant pool load needed to zero extend from the mask.

llvm-svn: 324804
2018-02-10 08:06:49 +00:00
Nirav Dave c8c9d4fe35 [DAG] Make early exit hasPredecessorHelper return true. NFCI.
All uses conservatively assume in early exit case that it will be a
predecessor. Changing default removes checking code in all uses.

llvm-svn: 324797
2018-02-10 02:41:22 +00:00
Craig Topper fa6113b3d7 [X86] Teach combineInsertSubvector how to combine some k-register insert_subvectors and extract_subvector sequences to remove extra zeroing.wq
llvm-svn: 324791
2018-02-10 01:00:41 +00:00
Daniel Neilson f4fa26f5d8 [Hexagon] Update uses of deprecated IRBuilder CreateMemCpy/Move calls
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
Hexagon LoopIdiom pass to cease using the old IRBuilder createMemCpy/createMemMove
single-alignment APIs in favour of the new API that allows setting source and
destination alignments independently.

Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.

Reference
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

llvm-svn: 324784
2018-02-09 23:33:35 +00:00
Craig Topper 99db883d55 [X86] Teach lower1BitVectorShuffle to recognize shuffles that are just filling upper elements with zero. Replace with insert_subvector.
There's still some extra kshifts in one of the modified test cases here, but hopefully that's only a DAG combine away.

llvm-svn: 324782
2018-02-09 23:32:27 +00:00
Daniel Neilson 7512c3e15f [ARMFastISel] Replace deprecated calls to MemoryIntrinsic::getAlignment() (NFCI)
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes
ARMFastISel to cease using the old getAlignment() API of MemoryIntrinsic in favour of getting
source & dest specific alignments through the new API.

Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.

Reference

http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

llvm-svn: 324781
2018-02-09 23:31:37 +00:00
Dan Gohman db1916a646 [WebAssembly] Add mechanisms for specifying an explicit import module name.
This adds a wasm-import-module function attribute and a .import_module
assembler directive, for specifying module import names for WebAssembly.
Currently these may only be used for function symbols; global variables
may be considered in the future.

WebAssembly has a two-level namespace scheme for symbols, and it's
normally the linker's job to assign the module name, which is the
first-level name. The attributes here allow users to specify their
own module names explicitly, which is useful for tools generating
bindings to modules defined in other languages.

This feature is not fully usable yet. It will evolve along with the
ongoing symbol table and lld changes.

Differential Revision: https://reviews.llvm.org/D42520

llvm-svn: 324778
2018-02-09 23:13:22 +00:00
Dan Gohman 861bec2b7c [WebAssembly] Add an LLVM_FALLTHROUGH to address a warning. NFC.
llvm-svn: 324777
2018-02-09 22:59:01 +00:00
Daniel Neilson a60f4621ae [AMDGPUPromoteAlloca] Replace deprecated memory intrinsic APIs (NFCI)
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
AMDGPUPromoteAlloca pass to cease using:
1) The old getAlignment() API of MemoryIntrinsic in favour of getting source & dest specific
alignments through the new API.
2) The old IRBuilder createMemCpy/createMemMove single-alignment APIs in favour of the new
API that allows setting source and destination alignments independently.

Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, r323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.

Reference
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

llvm-svn: 324774
2018-02-09 21:56:15 +00:00
Daniel Neilson 4a58b4b52c [AArch64FastISel] Replace deprecated calls to MemoryIntrinsic::getAlignment() (NFCI)
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes
AArch64FastISel to cease using the old getAlignment() API of MemoryIntrinsic in favour of getting
source & dest specific alignments through the new API.

Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, r323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.

Reference
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

llvm-svn: 324773
2018-02-09 21:49:29 +00:00
Francis Visoiu Mistrih e67ed4c039 [X86][MC] Fix assembling rip-relative addressing + immediate displacements
In the rare case where the input contains rip-relative addressing with
immediate displacements, *and* the instruction ends with an immediate,
we encode the instruction in the wrong way:

movl $12345678, 0x400(%rdi) // all good, no rip-relative addr
movl %eax, 0x400(%rip) // all good, no immediate at the end of the instruction
movl $12345678, 0x400(%rip) // fails, encodes address as 0x3fc(%rip)

Offset is a label:

movl $12345678, foo(%rip)

we want to account for the size of the immediate (in this case,
$12345678, 4 bytes).

Offset is an immediate:

movl $12345678, 0x400(%rip)

we should not account for the size of the immediate, assuming the
immediate offset is what the user wanted.

Differential Revision: https://reviews.llvm.org/D43050

llvm-svn: 324772
2018-02-09 21:47:07 +00:00
Evandro Menezes 205c0e085e [AArch64] Adjust the cost model for Exynos M3
Fix the modeling of transfers between a generic register and a partial ASIMD
one.

llvm-svn: 324766
2018-02-09 19:26:11 +00:00
Krzysztof Parzyszek 9b48e8d233 [Hexagon] Add code to select QTRUE and QFALSE
Fixes http://llvm.org/PR36320.

llvm-svn: 324763
2018-02-09 19:10:46 +00:00
Matt Arsenault 0063ce7201 AMDGPU: Remove tied operand from si_else
llvm-svn: 324751
2018-02-09 17:18:38 +00:00
Matt Arsenault 923712b6b5 Reapply "AMDGPU: Add 32-bit constant address space"
This reverts r324494 and reapplies r324487.

llvm-svn: 324747
2018-02-09 16:57:57 +00:00
Matt Arsenault bcf7bec4b8 AMDGPU: Fix layering issue
Move utility function that depends on codegen.
Fixes build with r324487 reapplied.

llvm-svn: 324746
2018-02-09 16:57:48 +00:00
Evandro Menezes b5f12090fc [AArch64] Refactor stand alone methods (NFC)
Make stand alone methods in AArch64InstrInfo static.

llvm-svn: 324745
2018-02-09 16:14:41 +00:00
Krzysztof Parzyszek 7cfe7cbccc [Hexagon] Express calling conventions via .td file instead of hand-coding
Additionally, simplify the rest of the argument/parameter lowering code.

llvm-svn: 324737
2018-02-09 15:30:02 +00:00
Jonas Paulsson 7850601fa3 [AArch64] Return true in enableMultipleCopyHints().
Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: Martin Storsjö
llvm-svn: 324720
2018-02-09 09:22:20 +00:00
Stanislav Mekhanoshin 9c6cd0458b [AMDGPU] More descriptive names in the memory legalizer
NFC.

Differential Revision: https://reviews.llvm.org/D43054

llvm-svn: 324712
2018-02-09 06:05:33 +00:00
Craig Topper ca5841b4e4 [X86] Simplify some code in lowerV4X128VectorShuffle and lowerV2X128VectorShuffle
Previously we extracted two subvectors and concatenate. But the concatenate will be lowered to two insert subvectors. Then DAG combine will merge once of the inserts and one of the extracts back into the original vector. We might as well just directly use one extract and one insert.

llvm-svn: 324710
2018-02-09 05:54:36 +00:00
Craig Topper 28166a877d [X86] Teach shuffle lowering to recognize 128/256 bit insertions into a zero vector.
This regresses a couple cases in the shuffle combining test. But those cases use intrinsics that InstCombine knows how to turn into a generic shuffle earlier. This should give opportunities to fold this earlier in InstCombine or DAG combine.

llvm-svn: 324709
2018-02-09 05:54:34 +00:00
Francis Visoiu Mistrih fb7b14f70d [CodeGen] Unify the syntax of MBB liveins in MIR and -debug output
Instead of:

Live Ins: %r0 %r1

print:

liveins: %r0, %r1
llvm-svn: 324694
2018-02-09 01:14:44 +00:00
Francis Visoiu Mistrih d65438d0ca [CodeGen] Move printing '\n' from MachineInstr::print to MachineBasicBlock::print
MBB.print wasn't printing it, but the MIRPrinter is printing it. The
goal is to unify that as much as possible.

llvm-svn: 324681
2018-02-08 23:42:27 +00:00
Jacques Pienaar bd275c7ba7 [Lanai] Code model dictates section selection.
Always use the small section when the small code model is specified.

llvm-svn: 324679
2018-02-08 23:25:05 +00:00
Matt Arsenault 9c2f3c4852 AMDGPU: Process SDWA block at a time
Right now this loops over the entire function every time there
is a change, which is not very efficient. There's no practical
reason to track this so globally, since the code motion optimization
passes should be sinking instructions with single uses and
the pass currently will not fold with multiple uses.

llvm-svn: 324667
2018-02-08 22:46:41 +00:00
Matt Arsenault c24d5e2819 AMDGPU: Minor cleanups
Column limit, typo, unnecessary reference

llvm-svn: 324666
2018-02-08 22:46:38 +00:00
Alexander Ivchenko da9e81c462 [GlobalISel][X86] Fixing failures after https://reviews.llvm.org/D37775
The patch essentially makes sure that X86CallLowering adds proper
G_COPY/G_TRUNC and G_ANYEXT/G_COPY when we are doing lowering of
arguments/returns for floating point values passed on registers.

Tests are updated accordingly

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D42287

llvm-svn: 324665
2018-02-08 22:41:47 +00:00
Alexander Ivchenko a85c4fc029 [GlobalIsel][X86] Making {G_IMPLICIT_DEF, s128} legal
The patch is a split from D42287 and is related to
fixing failures after https://reviews.llvm.org/D37775

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D42287

llvm-svn: 324664
2018-02-08 22:40:31 +00:00
Craig Topper 9e030c9e00 [X86] Improve combineCastedMaskArithmetic to fold (bitcast (vXi1 (and/or/xor X, C)))->(vXi1 (and/or/xor (bitcast X), (bitcast C)) where C is a constant build_vector.
Most vxi1 constant build vectors have to be implemented in the scalar domain anyway so we'll probably end up with a cast there later. But by then its too late to do the combine to get rid of it.

llvm-svn: 324662
2018-02-08 22:26:39 +00:00
Craig Topper 1b5b4ccb77 [X86] Add DAG combine to constant fold a bitcast of a vXi1 constant build_vector into a scalar integer.
llvm-svn: 324661
2018-02-08 22:26:36 +00:00
Craig Topper dccf72b583 [X86] Remove kortest intrinsics and replace with native IR.
llvm-svn: 324646
2018-02-08 20:16:06 +00:00
David Woodhouse 76eb26aa92 [X86] Support 'V' register operand modifier
This allows the register name to be printed without the leading '%'.
This can be used for emitting calls to the retpoline thunks from inline
asm.

llvm-svn: 324645
2018-02-08 20:06:05 +00:00
Oliver Stannard 133b6085e8 [ARM] Re-commit r324600 with fixed LLVMBuild.txt
ARMDisassembler now depends on the banked register tables in ARMUtils, so the
LLVMBuild.txt needed updating to reflect this.

Original commit mesage:

[ARM] Fix disassembly of invalid banked register moves

When disassembling banked register move instructions, we don't have an
assembly syntax for the unallocated register numbers, so we have to
return Fail rather than SoftFail. Previously we were returning SoftFail,
then crashing in the InstPrinter as we have no way to represent these
encodings in an assembly string.

This also switches the decoder to use the table-generated list of banked
registers, removing the duplicated list of encodings.

Differential revision: https://reviews.llvm.org/D43066

llvm-svn: 324606
2018-02-08 14:31:22 +00:00
Oliver Stannard 3c11ecbbab Revert r324600 as it breaks a buildbot
The broken bot (clang-ppc64le-linux-multistage) is doign a shared-object build,
so I guess using lookupBankedRegByEncoding in the disassembler is a layering
violation?

llvm-svn: 324604
2018-02-08 14:21:28 +00:00
Oliver Stannard db982b25ff [ARM] Fix disassembly of invalid banked register moves
When disassembling banked register move instructions, we don't have an
assembly syntax for the unallocated register numbers, so we have to
return Fail rather than SoftFail. Previously we were returning SoftFail,
then crashing in the InstPrinter as we have no way to represent these
encodings in an assembly string.

This also switches the decoder to use the table-generated list of banked
registers, removing the duplicated list of encodings.

Differential revision: https://reviews.llvm.org/D43066

llvm-svn: 324600
2018-02-08 13:06:08 +00:00
Clement Courbet 1b8c08b633 [X86] Fix compilation of r324580.
@ctopper Can you check that the fix is correct ?

llvm-svn: 324586
2018-02-08 09:41:50 +00:00
Stefan Maksimovic 8989940557 Revert accidental changes that snuck in r324584
llvm-svn: 324585
2018-02-08 09:31:48 +00:00
Stefan Maksimovic b3e7ed3b94 [mips] Define certain instructions in microMIPS32r3
Instructions affected:
mthc1, mfhc1, add.d, sub.d, mul.d, div.d,
mov.d, neg.d, cvt.w.d, cvt.d.s, cvt.d.w, cvt.s.d

These instructions are now defined for
microMIPS32r3 + microMIPS32r6 in MicroMipsInstrFPU.td
since they shared their encoding with those already defined
in microMIPS32r6InstrInfo.td and have been therefore
removed from the latter file.

Some instructions present in MicroMipsInstrFPU.td which
did not have both AFGR64 and FGR64 variants defined have
been altered to do so.

Differential revision: https://reviews.llvm.org/D42738

llvm-svn: 324584
2018-02-08 09:25:17 +00:00
Sjoerd Meijer 5ea465ded7 [AArch64] Don't materialize 0 with "fmov h0, .." when FullFP16 is not supported
We were generating "fmov h0, wzr" instructions when FullFP16 is not enabled.
I've not added any tests, because the problem was visible in:
test/CodeGen/AArch64/arm64-zero-cycle-zeroing.ll,
which I had to change: I don't think Cyclone has FullFP16 enabled
by default, so it shouldn't be using this v8.2a instruction.

I've also removed these rdar tags, please shout if there are any objections.

Differential Revision: https://reviews.llvm.org/D43020

llvm-svn: 324581
2018-02-08 08:39:05 +00:00
Craig Topper 8d0c8c9be1 [X86] Support folding in a k-register OR when creating KORTEST from scalar compare of a bitcast from vXi1.
This should allow us to remove the kortest intrinsic from IR and use compare+bitcast+or in IR instead.

llvm-svn: 324580
2018-02-08 08:29:43 +00:00
Craig Topper 93505707b6 [X86] Allow KORTEST instruction to be used for testing if a mask is all ones
The KTEST instruction sets the C flag if the result of anding both operands together is all 1s. We can use this to lower (icmp eq/ne (bitcast (vXi1 X), -1)

Differential Revision: https://reviews.llvm.org/D42772

llvm-svn: 324577
2018-02-08 07:54:16 +00:00
Craig Topper f5465f98d2 [X86] Don't emit KTEST instructions unless only the Z flag is being used
Summary:
KTEST has weird flag behavior. The Z flag is set for all bits in the AND of the k-registers being 0, and the C flag is set for all bits being 1. All other flags are cleared.

We currently emit this instruction in EmitTEST and don't check the condition code. This can lead to strange things like using the S flag after a KTEST for a signed compare.

The domain reassignment pass can also transform TEST instructions into KTEST and is not protected against the flag usage either. For now I've disabled this part of the domain reassignment pass. I tried to comment out the checks in the mir test so that we could recover them later, but I couldn't figure out how to get that to work.

This patch moves the KTEST handling into LowerSETCC and now creates a ktest+x86setcc. I've chosen this approach because I'd like to add support for the C flag for all ones in a followup patch. To do that requires that I can rewrite the condition code going in the x86setcc to be different than the original SETCC condition code.

This fixes PR36182. I'll file a PR to fix domain reassignment once this goes in. Should this be merged to 6.0?

Reviewers: spatel, guyblank, RKSimon, zvi

Reviewed By: guyblank

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42770

llvm-svn: 324576
2018-02-08 07:45:55 +00:00
Peter Collingbourne 559ff1fe03 ARM: Remove dead code. NFCI.
llvm-svn: 324565
2018-02-08 05:28:39 +00:00
Yonghong Song f2075aef68 bpf: Improve expanding logic in LowerSELECT_CC
LowerSELECT_CC is not generating optimal Select_Ri pattern at the moment. It
is not guaranteed to place ConstantNode at RHS which would miss matching
Select_Ri.

A new testcase added into the existing select_ri.ll, also there is an
existing case in cmp.ll which would be improved to use Select_Ri after this
patch, it is adjusted accordingly.

Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 324560
2018-02-08 04:37:49 +00:00
Matt Arsenault b02cebf552 AMDGPU: Fix incorrect reordering when inline asm defines LDS address
Defs of operands outside of the instruction's explicit defs need
to be checked.

llvm-svn: 324554
2018-02-08 01:56:14 +00:00
Matt Arsenault c908e3f77a AMDGPU: Don't crash when trying to fold implicit operands
llvm-svn: 324550
2018-02-08 01:12:46 +00:00
Justin Lebar 321b443ef6 [NVPTX] When dying due to a bad address space value, print out the value.
llvm-svn: 324549
2018-02-08 00:50:04 +00:00
Stanislav Mekhanoshin db39b4b0b4 [AMDGPU] Fixed wait count reuse
The code reusing existing wait counts is incorrect since it keeps
adding new operands to an old instruction instead of replacing
the immediate. It was also effectively switched off by the condition
that wait count is not an AMDGPU::S_WAITCNT.

Also switched to BuildMI instead of creating instructions directly.

Differential Revision: https://reviews.llvm.org/D42997

llvm-svn: 324547
2018-02-08 00:18:35 +00:00
Chandler Carruth 0be0cfa65b [x86] Fix nasty bug in the x86 backend that is essentially impossible to
hit from IR but creates a minefield for MI passes.

The x86 backend has fairly powerful logic to try and fold loads that
feed register operands to instructions into a memory operand on the
instruction. This is almost always a good thing, but there are specific
relocated loads that are only allowed to appear in specific
instructions. Notably, R_X86_64_GOTTPOFF is only allowed in `movq` and
`addq`. This patch blocks folding of memory operands using this
relocation unless the target is in fact `addq`.

The particular relocation indicates why we simply don't hit this under
normal circumstances. This relocation is only used for TLS, and it gets
used in very specific ways in conjunction with %fs-relative addressing.
The result is that loads using this relocation are essentially never
eligible for folding into an instruction's memory operands. Unless, of
course, you have an MI pass that inserts usage of such a load. I have
exactly such an MI pass and was greeted by truly mysterious miscompiles
where the linker replaced my instruction with a completely garbage byte
sequence. Go team.

This is the only such relocation I'm aware of in x86, but there may be
others that need to be similarly restricted.

Fixes PR36165.

Differential Revision: https://reviews.llvm.org/D42732

llvm-svn: 324546
2018-02-07 23:59:14 +00:00
Craig Topper 37765ff326 [X86] Prune some unreachable 'return SDValue()' paths from LowerSIGN_EXTEND/LowerZERO_EXTEND/LowerANY_EXTEND.
We were doing a lot of whitelisting of what we handle in these routines, but setOperationAction constrains what we can get here. So just add some asserts and prune the unreachable paths.

llvm-svn: 324538
2018-02-07 22:45:38 +00:00
Craig Topper 1db5ebc016 [X86] Remove dead code from EmitTest that looked for an i1 type which should have already been type legalized away. NFC
llvm-svn: 324536
2018-02-07 22:19:26 +00:00
Craig Topper 8baa9c77e3 [X86] When doing callee save/restore for k-registers make sure we don't use KMOVQ on non-BWI targets
If we are saving/restoring k-registers, the default behavior of getMinimalRegisterClass will find the VK64 class with a spill size of 64 bits. This will cause the KMOVQ opcode to be used for save/restore. If we don't have have BWI instructions we need to constrain the class returned to give us VK16 with a 16-bit spill size. We can do this by passing the either v16i1 or v64i1 into getMinimalRegisterClass.

Also add asserts to make sure BWI is enabled anytime we use KMOVD/KMOVQ. These are what caught this bug.

Fixes PR36256

Differential Revision: https://reviews.llvm.org/D42989

llvm-svn: 324533
2018-02-07 21:41:50 +00:00
Rafael Espindola f4e3f3e31c Revert "AMDGPU: Add 32-bit constant address space"
This reverts commit r324487.

It broke clang tests.

llvm-svn: 324494
2018-02-07 18:09:35 +00:00
Marek Olsak 871c30e540 AMDGPU: Add 32-bit constant address space
Note: This is a candidate for LLVM 6.0, because it was planned to be
      in that release but was delayed due to a long review period.

Merge conflict in release_60 - resolution:
    Add "-p6:32:32" into the second (non-amdgiz) string.

Only scalar loads support 32-bit pointers. An address in a VGPR will
fail to compile. That's OK because the results of loads will only be used
in places where VGPRs are forbidden.

Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC.
The tests cover all uses cases we need for Mesa.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D41651

llvm-svn: 324487
2018-02-07 16:01:00 +00:00
Marek Olsak b2cc77985b AMDGPU: Remove the s_buffer workaround for GFX9 chips
Summary:
I checked the AMD closed source compiler and the workaround is only
needed when x3 is emulated as x4, which we don't do in LLVM.

SMEM x3 opcodes don't exist, and instead there is a possibility to use x4
with the last component being unused. If the last component is out of
buffer bounds and falls on the next 4K page, the hw hangs.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D42756

llvm-svn: 324486
2018-02-07 16:00:40 +00:00
Simon Pilgrim b4e789e8f6 [X86][AVX] Add PACKSSDW/PACKUSDW support for truncation of clamped values
SSE and shorter vector sizes will have to wait until we can add support for general SMIN/SMAX matching.

llvm-svn: 324485
2018-02-07 15:48:44 +00:00
Simon Atanasyan 70498f81de [mips] Support 'y' operand code to print exact log2 of the operand
llvm-svn: 324477
2018-02-07 12:36:39 +00:00
Simon Atanasyan 737bec38d0 [mips] Handle 'M' and 'L' operand codes for memory operands
Both operand codes now work the same way in case of register or memory
operands. It print high-order or low-order word in a double-word
register or memory location.

llvm-svn: 324476
2018-02-07 12:36:33 +00:00
Sjoerd Meijer 8c0739347c [ARM] FP16 mov imm pattern
This is a follow up of r324321, adding a match pattern for mov with a FP16
immediate (also fixing operand vfp_f16imm that wasn't even compiling).

Differential Revision: https://reviews.llvm.org/D42973

llvm-svn: 324456
2018-02-07 08:37:17 +00:00
Chandler Carruth 282ae1632a [x86/retpoline] Make the external thunk names exactly match the names
that happened to end up in GCC.

This is really unfortunate, as the names don't have much rhyme or reason
to them. Originally in the discussions it seemed fine to rely on aliases
to map different names to whatever external thunk code developers wished
to use but there are practical problems with that in the kernel it turns
out. And since we're discovering this practical problems late and since
GCC has already shipped a release with one set of names, we are forced,
yet again, to blindly match what is there.

Somewhat rushing this patch out for the Linux kernel folks to test and
so we can get it patched into our releases.

Differential Revision: https://reviews.llvm.org/D42998

llvm-svn: 324449
2018-02-07 06:16:24 +00:00
Tom Stellard 33445765dd AMDGPU/GlobalISel: Mark 32-bit G_FPTOUI as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D42152

llvm-svn: 324446
2018-02-07 04:47:59 +00:00
Mark Searles 24c92eeb83 [AMDGPU] Suppress redundant waitcnt instrs.
1. Run the memory legalizer prior to the waitcnt pass; keep the policy that the waitcnt pass does not remove any waitcnts within the incoming IR.

2. The waitcnt pass doesn't (yet) track waitcnts that exist prior to the waitcnt pass (it just skips over them); because the waitcnt pass is ignorant of them, it may insert a redundant waitcnt. To avoid this, check the prev instr. If it and the to-be-inserted waitcnt are the same, suppress the insertion. We keep the existing waitcnt under the assumption that whomever, e.g., the memory legalizer, inserted it knows what they were doing.

3. Follow-on work: teach the waitcnt pass to record the pre-existing waitcnts for better waitcnt production.

Differential Revision: https://reviews.llvm.org/D42854

llvm-svn: 324440
2018-02-07 02:21:21 +00:00
Matt Arsenault a18b3bcf51 AMDGPU: Select BFI patterns with 64-bit ints
llvm-svn: 324431
2018-02-07 00:21:34 +00:00
Craig Topper 58ecffd857 [DAGCombiner][AMDGPU][X86] Turn cttz/ctlz into cttz_zero_undef/ctlz_zero_undef if we can prove the input is never zero
X86 currently has a late DAG combine after cttz/ctlz are turned into BSR+BSF+CMOV to detect this and remove the CMOV. But we should be able to do this much earlier and avoid creating the cmov all together.

For the changed AMDGPU test case it appears that previously the i8 cttz was type legalized to i16 which introduced an OR with 256 in order to limit the result to 8 on the widened type. At this point the result is known to never be zero, but nothing checked that. Then operation legalization is told to promote all i16 cttz to i32. This introduces an extend and a truncate and another OR with 65536 to limit the result to 16. With the DAG combiner change we are able to prevent the creation of the second OR since the opcode will have been changed to cttz_zero_undef after the first OR. I the lack of the OR caused the instruction to change to v_ffbl_b32_sdwa

Differential Revision: https://reviews.llvm.org/D42985

llvm-svn: 324427
2018-02-06 23:54:37 +00:00
Eli Friedman cd07a3e2f9 Place undefined globals in .bss instead of .data
Following up on the discussion from
http://lists.llvm.org/pipermail/llvm-dev/2017-April/112305.html, undef
values are now placed in the .bss as well as null values. This prevents
undef global values taking up potentially huge amounts of space in the
.data section.

The following two lines now both generate equivalent .bss data:

@vals1 = internal unnamed_addr global [20000000 x i32] zeroinitializer, align 4
@vals2 = internal unnamed_addr global [20000000 x i32] undef, align 4 ; previously unaccounted for

This is primarily motivated by the corresponding issue in the Rust
compiler (https://github.com/rust-lang/rust/issues/41315).

Differential Revision: https://reviews.llvm.org/D41705

Patch by varkor!

llvm-svn: 324424
2018-02-06 23:22:14 +00:00
Evandro Menezes cb7959fd78 [AArch64] Adjust the cost model for Exynos M3
Fix the modeling of long division and SIMD conversion from integer and
horizontal minimum and maximum.

llvm-svn: 324417
2018-02-06 22:35:47 +00:00
Krzysztof Parzyszek 8abaf8954a [Hexagon] Extract HVX lowering and selection into HVX-specific files, NFC
llvm-svn: 324392
2018-02-06 20:22:20 +00:00
Krzysztof Parzyszek 97a5095db6 [Hexagon] Lower concat of more than 2 vectors into build_vector
llvm-svn: 324391
2018-02-06 20:18:58 +00:00
Stanislav Mekhanoshin ce2d428a98 [AMDGPU] removed dead code handling rmw in memory legalizer
It was always using cmpxchg path and in rmw and cmpxchg instructions
are not distinguishable in the BE.

Differential Revision: https://reviews.llvm.org/D42976

llvm-svn: 324383
2018-02-06 19:11:56 +00:00
Krzysztof Parzyszek be253e797b [Hexagon] Don't form new-value jumps from floating-point instructions
Additionally, verify that the register defined by the producer is a
32-bit register.

llvm-svn: 324381
2018-02-06 19:08:41 +00:00
Sjoerd Meijer d2718ba95e [ARM] f16 conversions
This is a follow up of r324321, adding f16 <-> f32 and f16 <-> f64 conversion
match patterns.

Differential Revision: https://reviews.llvm.org/D42954

llvm-svn: 324360
2018-02-06 16:28:43 +00:00
Nirav Dave 27721e8617 [DAG, X86] Improve Dependency analysis when doing multi-node
Instruction Selection

Cleanup cycle/validity checks in ISel (IsLegalToFold,
HandleMergeInputChains) and X86 (isFusableLoadOpStore). Now do a full
search for cycles / dependencies pruning the search when topological
property of NodeId allows.

As part of this propogate the NodeId-based cutoffs to narrow
hasPreprocessorHelper searches.

Reviewers: craig.topper, bogner

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D41293

llvm-svn: 324359
2018-02-06 16:14:29 +00:00
Marek Olsak 7d92b7e23a AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU
Author: Bas Nieuwenhuizen

https://reviews.llvm.org/D42881

llvm-svn: 324353
2018-02-06 15:17:55 +00:00
Krzysztof Parzyszek 1d52a850b3 [Hexagon] Remove leftover assert
llvm-svn: 324352
2018-02-06 15:15:13 +00:00
Krzysztof Parzyszek 88f11003a0 [Hexagon] Split HVX operations on vector pairs
Vector pairs are legal types, but not every operation can work on pairs.
For those operations that are legal for single vectors, generate a concat
of their results on pair halves.

llvm-svn: 324350
2018-02-06 14:24:57 +00:00
Krzysztof Parzyszek 7b52cf1d7f [Hexagon] Add helper functions to identify single/pair vector types, NFC
llvm-svn: 324349
2018-02-06 14:21:31 +00:00
Krzysztof Parzyszek 69f1d7e370 [Hexagon] Handle lowering of SETCC via setCondCodeAction
It was expanded directly into instructions earlier. That was to avoid
loads from a constant pool for a vector negation: "xor x, splat(i1 -1)".
Implement ISD opcodes QTRUE and QFALSE to denote logical vectors of
all true and all false values, and handle setcc with negations through
selection patterns.

llvm-svn: 324348
2018-02-06 14:16:52 +00:00
Simon Pilgrim ae00a71f55 [X86][SSE] Add PACKUS support for truncation of clamped values
Followup to D42544 that matches PACKUSWB cases for non-AVX512, SSE and PACKUSDW cases will have to wait until we can add support for general SMIN/SMAX matching.

llvm-svn: 324347
2018-02-06 14:07:46 +00:00
Tim Renouf 807ecc3d66 [AMDGPU] do not generate .AMDGPU.config for amdpal os type
Summary:
Now we generate PAL metadata for the amdpal os type, there is no need to
generate the .AMDGPU.config section.

Reviewers: arsenm, nhaehnle, dstuttard

Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D37760

Change-Id: I303c5fad66656ce97293da60621afac6595b4c18
llvm-svn: 324346
2018-02-06 13:39:38 +00:00
Sander de Smalen 81fcf865be [AArch64][SVE] Asm: Add AND_ZI instructions and aliases
Summary: Adds support for the SVE AND instruction with vector and logical-immediate operands, and their corresponding aliases.

Reviewers: fhahn, rengolin, samparker, echristo, aadg, kristof.beyls

Reviewed By: fhahn

Subscribers: aemerson, javed.absar, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D42295

llvm-svn: 324343
2018-02-06 13:13:21 +00:00
Simon Pilgrim 90a237bf83 [X86][SSE] Add PACKSS support for truncation of clamped values
Followup to D42544 that matches PACKSSWB cases for non-AVX512, SSE and PACKSSDW cases will have to wait until we can add support for general SMIN/SMAX matching.

llvm-svn: 324339
2018-02-06 12:16:10 +00:00
Hiroshi Inoue ad48d2fe61 [PowerPC] fix up in rL324229, NFC
This patch fixes up my previous commit (add initialization of local variables).

llvm-svn: 324336
2018-02-06 11:34:16 +00:00
Oliver Stannard 6df8f43c4d [AArch64] Fix spelling of ICH_ELRSR_EL2 system register
This register was mis-spelled as ICH_ELSR_EL2, but has the correct encoding for
ICH_ELRSR_EL2.

llvm-svn: 324325
2018-02-06 09:39:04 +00:00
Oliver Stannard ee0ac39305 [ARM][AArch64] Add CSDB speculation barrier instruction
This adds the CSDB instruction, which is a new barrier instruction
described by the whitepaper at [1].

This is in encoding space which was previously executed as a NOP, so it is
available for all targets that have the relevant NOP encoding space. This
matches the binutils behaviour for these instructions [2][3].

[1] https://developer.arm.com/support/security-update
[2] https://sourceware.org/ml/binutils/2018-01/msg00116.html
[3] https://sourceware.org/ml/binutils/2018-01/msg00120.html

llvm-svn: 324324
2018-02-06 09:24:47 +00:00
Sjoerd Meijer 89ea2648bb [ARM] Armv8.2-A FP16 code generation (part 3/3)
This adds most of the FP16 codegen support, but these areas need further work:

- FP16 literals and immediates are not properly supported yet (e.g. literal
  pool needs work),
- Instructions that are generated from intrinsics (e.g. vabs) haven't been
  added.

This will be addressed in follow-up patches.

Differential Revision: https://reviews.llvm.org/D42849

llvm-svn: 324321
2018-02-06 08:43:56 +00:00
Konstantin Zhuravlyov 8818d13ed2 AMDGPU/MemoryModel: Fix monotonic atomic loads
Those should have glc bit set for system and agent synchronization scopes

llvm-svn: 324314
2018-02-06 04:06:04 +00:00
Ahmed Charles 646ab87bb4 [RISCV] Add support for %pcrel_lo.
llvm-svn: 324303
2018-02-06 00:55:23 +00:00
Reid Kleckner 697d1bc236 Revert "Don't assume a null GV is local for ELF and MachO."
This reverts r323297.

It breaks building grub.

llvm-svn: 324301
2018-02-06 00:47:14 +00:00
Craig Topper 9c6c7c5e9b [X86] Relax restrictions on what setcc condition codes can be folded with a sext when AVX512 is enabled.
We now allow all signed comparisons and not equal. The complement that needs to be added for this is no worse than the extend. And the vector output forms of pcmpeq/pcmpgt have better latency than the k-register version on SKX.

llvm-svn: 324294
2018-02-05 23:57:01 +00:00
Sanjay Patel d7c702b451 [LoopStrengthReduce, x86] don't add cost for a cmp that will be macro-fused (PR35681)
In the motivating case from PR35681 and represented by the macro-fuse-cmp test:
https://bugs.llvm.org/show_bug.cgi?id=35681
...there's a 37 -> 31 byte size win for the loop because we eliminate the big base 
address offsets.

SPEC2017 on Ryzen shows no significant perf difference.

Differential Revision: https://reviews.llvm.org/D42607

llvm-svn: 324289
2018-02-05 23:43:05 +00:00
Nirav Dave eedb663221 [X86] Teach DAG unfoldMemoryOperand to reconvert CMPs to tests
Summary:
Copy MI-level cmp->test conversion to SelectionDAG-level memory unfold.
This fixes a regression from upcoming D41293 change.

Reviewers: craig.topper, RKSimon

Reviewed By: craig.topper

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D42808

llvm-svn: 324261
2018-02-05 18:58:58 +00:00
Craig Topper 9a06f24704 [X86] Artificially lower the complexity of the scalar ANDN patterns so that AND with immediate will match first.
This allows the immediate to folded into the and instead of being forced to move into a register. This can sometimes result in shorter encodings since the and can sign extend an immediate.

This also allows us to match an and to a movzx after a not.

This can cause an extra move if the input to the separate NOT has an additional user which requires a copy before the NOT.

llvm-svn: 324260
2018-02-05 18:31:04 +00:00
Krzysztof Parzyszek e3ef6e0706 [Hexagon] Memoize instruction positions in BitTracker
llvm-svn: 324250
2018-02-05 17:12:07 +00:00
Craig Topper 57e0643160 [X86] Teach X86DAGToDAGISel::shrinkAndImmediate to preserve upper 32 zeroes of a 64 bit mask.
If the upper 32 bits of a 64 bit mask are all zeros, we have special isel patterns to use a 32-bit and instead of a 64-bit and by relying on the impliciting zeroing of 32 bit ops.

This patch teachs shrinkAndImmediate not to break that optimization.

Differential Revision: https://reviews.llvm.org/D42899

llvm-svn: 324249
2018-02-05 16:54:07 +00:00
Benjamin Kramer 45aa89eb7f BitTracker.h needs a full definition of MachineInstr, so include the defining file.
Patch by Dean Sturtevant!

Differential Revision: https://reviews.llvm.org/D42907

llvm-svn: 324245
2018-02-05 15:56:24 +00:00
Krzysztof Parzyszek ef20447fa0 [Hexagon] Forgot about HexagonISD::VZERO in selecting const vectors
llvm-svn: 324244
2018-02-05 15:52:54 +00:00
Krzysztof Parzyszek 67079be139 [Hexagon] Don't use garbage mask in HvxSelector::shuffp2
The function shuffp2 was breaking up a wide shuffle into a pair of
narrower ones, except that the narrower shuffle masks were actually
uninitialized.

llvm-svn: 324243
2018-02-05 15:46:41 +00:00
Krzysztof Parzyszek 02947b7112 [Hexagon] Use V6_vmpyih for halfword multiplication
Unlike V6_vmpyhv, it produces the result in the exact form that is
expected without the need for a shuffle.

llvm-svn: 324241
2018-02-05 15:40:06 +00:00
Dmitry Preobrazhensky 0a1ff464e1 [AMDGPU][MC] Corrected dst/data size for MIMG opcodes with d16 modifier
See bug 36154: https://bugs.llvm.org/show_bug.cgi?id=36154

Differential Revision: https://reviews.llvm.org/D42847

Reviewers: cfang, artem.tamazov, arsenm
llvm-svn: 324237
2018-02-05 14:18:53 +00:00
Dmitry Preobrazhensky e3271aee44 [AMDGPU][MC] Added validation of d16 and r128 modifiers of MIMG opcodes
See bugs 36094, 36095:
  https://bugs.llvm.org/show_bug.cgi?id=36094
  https://bugs.llvm.org/show_bug.cgi?id=36095

Differential Revision: https://reviews.llvm.org/D42692

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 324231
2018-02-05 12:45:43 +00:00
Hiroshi Inoue c5ab1ab797 [PowerPC] Check hot loop exit edge in PPCCTRLoops
PPCCTRLoops transform loops using mtctr/bdnz instructions if loop trip count is known and big enough to compensate for the cost of mtctr.
But if there is a loop exit edge which is known to be frequently taken (by builtin_expect or by PGO), we should not transform the loop to avoid the cost of mtctr instruction. Here is an example of a loop with hot exit edge:

for (unsigned i = 0; i < TripCount; i++) {
  // do something
  if (__builtin_expect(check(), 1))
    break;
  // do something
}

Differential Revision: https://reviews.llvm.org/D42637

llvm-svn: 324229
2018-02-05 12:25:29 +00:00
Craig Topper 5a2bd99a9e [X86] Add isel patterns for selecting masked SUBV_BROADCAST with bitcasts. Remove combineBitcastForMaskedOp.
Add test cases for the merge masked versions to make sure we have all those covered.

llvm-svn: 324210
2018-02-05 08:37:37 +00:00
Craig Topper 6ff5eb5dd5 [X86] Remove unused lambda. NFC
llvm-svn: 324206
2018-02-05 06:56:33 +00:00
Craig Topper 25ceba7f30 [X86] Remove X86ISD::SHUF128 from combineBitcastForMaskedOp. Use isel patterns instead.
We always created X86ISD::SHUF128 with a 64-bit element type so we can use isel patterns to detect a bitconvert to 32-bit to handle masking.

The test changes are because we also match the bitconvert even if there is no masking. This leads to unnecessary isel pattern, but it requires more multiclass hackery in tablegen to get rid of it.

llvm-svn: 324205
2018-02-05 06:00:23 +00:00
Craig Topper 8d511a65af [X86] Add DAG combine to turn (bitcast (and/or/xor (bitcast X), Y)) -> (and/or/xor X, (bitcast Y)) when casting between GPRs and mask operations.
This reduces the number of transitions between k-registers and GPRs, reducing the number of instructions.

There's still some room for improvement to remove more transitions, but this is a good start.

llvm-svn: 324184
2018-02-04 01:43:48 +00:00
Craig Topper 17d99f1df4 [X86] Remove unused function argument. NFC
llvm-svn: 324183
2018-02-04 01:43:44 +00:00
Craig Topper 071ad9c6e0 [X86] Remove and autoupgrade kand/kandn/kor/kxor/kxnor/knot intrinsics.
Clang already stopped using these a couple months ago.

The test cases aren't great as there is nothing forcing the operations to stay in k-registers so some of them moved back to scalar ops due to the bitcasts being moved around.

llvm-svn: 324177
2018-02-03 20:18:25 +00:00
Craig Topper fae8788cfa [X86] Prefer to create a ISD::SETCC over X86ISD::PCMPEQ in combineVectorSizedSetCCEquality.
This is running pre-legalize, we should try to use target independent nodes. This will give the best opportunity for target independent optimizations.

llvm-svn: 324147
2018-02-02 21:59:46 +00:00
Craig Topper 10aa254ecd [X86] Pass SDLoc by const reference in a few more places in X86ISelLowering.cpp. NFC
llvm-svn: 324135
2018-02-02 20:32:00 +00:00
Amara Emerson 3838ed0370 [AArch64][GlobalISel] Use getRegClassForTypeOnBank() in selectCopy.
Differential Revision: https://reviews.llvm.org/D42832

llvm-svn: 324110
2018-02-02 18:03:30 +00:00
Craig Topper e538fc74d4 [X86] Remove checks for FeatureAVX512 from the X86 assembly parser. Remove mcpu/mattr from assembly test command lines.
Summary:
We should always be able to accept AVX512 registers and instructions in llvm-mc. The only subtarget mode that should be checked is 16-bit vs 32-bit vs 64-bit mode.

I've also removed all the mattr/mcpu lines from test RUN lines to be consistent with this. Most were due to AVX512, but a few were for other features.

Fixes PR36202

Reviewers: RKSimon, echristo, bkramer

Reviewed By: echristo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42824

llvm-svn: 324106
2018-02-02 17:02:58 +00:00
Yaxun Liu 2a22c5deff [AMDGPU] Switch to the new addr space mapping by default
This requires corresponding clang change.

Differential Revision: https://reviews.llvm.org/D40955

llvm-svn: 324101
2018-02-02 16:07:16 +00:00
Craig Topper 76c5ce5184 [X86] Legalize (v64i1 (bitcast (i64 X))) on 32-bit targets by extracting 32-bit halves from i32, bitcasting each to v32i1, and concatenating.
This prevents the scalarization that would otherwise occur.

llvm-svn: 324057
2018-02-02 05:59:33 +00:00
Craig Topper 5570e03b21 [X86] Legalize (i64 (bitcast (v64i1 X))) on 32-bit targets by extracting to v32i1 and bitcasting to i32.
This saves a trip through memory and seems to open up other combining opportunities.

llvm-svn: 324056
2018-02-02 05:59:31 +00:00
Shiva Chen b22c1d29bc [RISCV] Fix c.addi and c.addi16sp immediate constraints which should be non-zero
Differential Revision: https://reviews.llvm.org/D42782

llvm-svn: 324055
2018-02-02 02:43:23 +00:00
Shiva Chen bbf4c5c25e [RISCV] Define getSetCCResultType for setting vector setCC type
To avoid trigger "No default SetCC type for vectors!" Assertion

Differential Revision: https://reviews.llvm.org/D42675

llvm-svn: 324054
2018-02-02 02:43:18 +00:00
Amara Emerson 58aea52bc4 [GlobalISel] Constrain the dest reg of IMPLICT_DEF.
This fixes a crash where the user is a COPY, which deliberately does not
constrain its source operands, resulting in a vreg without a reg class escaping
selection.

Differential Revision: https://reviews.llvm.org/D42697

llvm-svn: 324047
2018-02-02 01:44:43 +00:00
David Blaikie d8a6f93aae Remove non-modular header containing static utility functions
The one place that uses these functions isn't particularly
long/complicated, so it's easier to just have these inline at that
location than trying to split it out into a true header. (in part also
because of the use of the DEBUG macros, which make this not really a
standalone header even if the static functions were made inline instead)

llvm-svn: 324044
2018-02-02 00:33:50 +00:00
Craig Topper 2d67d1e2a8 [X86] Separate the call to LowerVectorAllZeroTest from EmitTest. NFCI
Every instruction that has the word TEST in its name seems to have been buried into EmitTest. But that code is largely concerned with trying to reuse the flags from instructions that update flags in a pretty normal way.

PTEST/TESTP/KTEST do not update flags in a normal way. They only update Z and C and the C flag update is non-standard. Rather than try to bend EmitTest's already complex logic to accomodate this, just move the call up to LowerSETCC and replicate the few pre-checks that are needed.

While there add a FIXME for using the C flag for checking for all 1s which we definitely couldn't do from EmitTEST.

llvm-svn: 324029
2018-02-01 23:21:20 +00:00
Nemanja Ivanovic 77e34f15c9 [PowerPC] Tell VSX swap removal that scalar conversions are lane-sensitive
This is a rather non-controversial change. We were missing these instructions
from the list of instructions that are lane-sensitive. These two put the result
into lane 0 (BE) or 3 (LE) regardless of the input. This patch fixes PR36068.

llvm-svn: 324005
2018-02-01 21:09:04 +00:00
Sanjay Patel d7bed12192 [AArch64] remove bogus comment; NFC
I added this comment with D42323, but as discussed in D42806, the architecture
does the right thing for denorms. We don't even need the select on 0.0 here?

llvm-svn: 323996
2018-02-01 19:59:33 +00:00
Changpeng Fang 29fcf883fb AMDGPU/SI: Adjust the encoding family for D16 buffer instructions when the target has UnpackedD16VMem feature.
Reviewers:
  Matt and Brian

Differential Revision:
  https://reviews.llvm.org/D42548

llvm-svn: 323988
2018-02-01 18:41:33 +00:00
Simon Pilgrim 1a8cefc328 [X86][SSE] LowerBUILD_VECTORAsVariablePermute - add support for scaling index vectors
This allows us to use PSHUFB for v8i16/v4i32 and VPERMD/PERMPS for v4i64/v4f64 variable shuffles.

Differential Revision: https://reviews.llvm.org/D42487

llvm-svn: 323987
2018-02-01 18:10:30 +00:00
Craig Topper a8a24232ee [X86] Remove custom lowering vXi1 extending loads and truncating stores.
Summary: Now that v2i1/v4i1 are legal without VLX. And v32i1 is legalized by splitting rather than widening. And isVectorLoadExtDesirable returns false for vXi1. It appears this handling is dead because the operations simply don't exist.

Reviewers: RKSimon, zvi, guyblank, delena, spatel

Reviewed By: delena

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D42781

llvm-svn: 323983
2018-02-01 17:08:41 +00:00
Craig Topper 7e910a9e85 [X86] Turn X86ISD::AND nodes that have no flag users back into ISD::AND just before isel to enable test instruction matching
Summary:
EmitTest sometimes creates X86ISD::AND specifically to hide the AND from DAG combine. But this prevents isel patterns that look for (cmp (and X, Y), 0) from being able to see it. So we end up with an AND and a TEST. The TEST gets removed by compare instruction optimization during the peephole pass.

This patch attempts to fix this by converting X86ISD::AND with no flag users back into ISD::AND during the DAG preprocessing just before isel.

In order to do this correctly I had to make the X86ISD::AND node created by EmitTest in this case really have a flag output. Which arguably it should have had anyway so that the number of operands would be consistent for the opcode in all cases. Then I had to modify the ReplaceAllUsesWith to understand that we might be looking at an instruction with 2 outputs. Though in this case there are no uses to replace since we just created the node, but that's what the code did before so I just made it keep working.

Reviewers: spatel, RKSimon, niravd, deadalnix

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42764

llvm-svn: 323982
2018-02-01 17:08:39 +00:00
Sanjay Patel 657e5d8d41 [DAGCombiner] filter out denorm inputs when calculating sqrt estimate (PR34994)
As shown in the example in PR34994:
https://bugs.llvm.org/show_bug.cgi?id=34994
...we can return a very wrong answer (inf instead of 0.0) for square root when 
using a reciprocal square root estimate instruction.

Here, I've conditionalized the filtering out of denorms based on the function 
having "denormal-fp-math"="ieee" in its attributes. The other options for this 
attribute are 'preserve-sign' and 'positive-zero'.

So we don't generate this extra code by default with just '-ffast-math' (because 
then there's no denormal attribute string at all), but it works if you specify 
'-ffast-math -fdenormal-fp-math=ieee' from clang. 

As noted in the review, there may be other problems in clang that affect the 
results depending on platform (Linux x86 at least), but this should allow 
creating the desired codegen.

Differential Revision: https://reviews.llvm.org/D42323

llvm-svn: 323981
2018-02-01 16:57:18 +00:00
Sjoerd Meijer 9d9a86535e [ARM] FullFP16 LowerReturn Fix
Commit r323512 introduced an optimisation in LowerReturn for half-precision
return values. A missing check caused a crash when the return value is "undef"
(i.e. a node that has no operands).

Differential Revision: https://reviews.llvm.org/D42743

llvm-svn: 323968
2018-02-01 13:48:40 +00:00
Aleksandar Beserminji a330c208f2 [mips] Include EVA instructions in Std2MicroMips mapping tables
This patch includes EVA instructions in the Std2MicroMips mapping
tables, which is required for direct object emission.

Differential Revision: https://reviews.llvm.org/D41771

llvm-svn: 323958
2018-02-01 12:53:26 +00:00
Clement Courbet ea8d07eb76 [AArch64][NFC] Make all ProcResource definitions include their SchedModel.
This makes targets ExynosM1,ExynosM3,ThunderX2T99 consistent with all
other targets.

llvm-svn: 323955
2018-02-01 12:12:01 +00:00
Yvan Roux 490e9e6761 [ARM] Add support for unpredictable MVN instructions.
This fixes bugzilla 33011
https://bugs.llvm.org/show_bug.cgi?id=33011

Defines bits {19-16} as zero or unpredictable as specified by the ARM ARM in
sections A8.8.116 and A8.8.117.

It fixes also the usage of PC register as destination register for MVN
register-shifted register version as specified in A8.8.117.

Differential Revision: https://reviews.llvm.org/D41905

llvm-svn: 323954
2018-02-01 12:06:57 +00:00
Yvan Roux 705e26a243 Test commit: Fix a comment.
llvm-svn: 323947
2018-02-01 08:39:58 +00:00
Dean Michael Berris cdca0730be [XRay][compiler-rt+llvm] Update XRay register stashing semantics
Summary:
This change expands the amount of registers stashed by the entry and
`__xray_CustomEvent` trampolines.

We've found that since the `__xray_CustomEvent` trampoline calls can show up in
situations where the scratch registers are being used, and since we don't
typically want to affect the code-gen around the disabled
`__xray_customevent(...)` intrinsic calls, that we need to save and restore the
state of even the scratch registers in the handling of these custom events.

Reviewers: pcc, pelikan, dblaikie, eizan, kpw, echristo, chandlerc

Reviewed By: echristo

Subscribers: chandlerc, echristo, hiraditya, davide, dblaikie, llvm-commits

Differential Revision: https://reviews.llvm.org/D40894

llvm-svn: 323940
2018-02-01 02:21:54 +00:00
Evgeniy Stepanov 7746899f48 Revert "[ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations"
Miscompiles code. Testcase pending.

This reverts commit r323869.

llvm-svn: 323929
2018-01-31 22:55:19 +00:00
Matt Arsenault af88f0eb44 AMDGPU: Fix missing SCC def from s_xor_b64_term
llvm-svn: 323927
2018-01-31 22:54:27 +00:00
Craig Topper e44faf53c7 [X86] Make the type checks in detectAVX512USatPattern more robust
This code currently uses isSimple and getSizeInBits in an attempt to prune types. But isSimple will return true for any type that any target supports natively. I don't think that's a good way to prune types. I also don't think the dest element type checks are very robust since we didn't do an isSimple check on the dest type.

This patch adds a check for the input type being legal to the one caller that didn't already check that. Then we explicitly check the element types for the destination are i8, i16, or i32

Differential Revision: https://reviews.llvm.org/D42706

llvm-svn: 323924
2018-01-31 22:26:31 +00:00
Krzysztof Parzyszek 15efa98f63 [Hexagon] Rename HexagonISelLowering::getNode to getInstr, NFC
llvm-svn: 323916
2018-01-31 21:17:03 +00:00
Chandler Carruth 0dcee4fe7a [x86] Make the retpoline thunk insertion a machine function pass.
Summary:
This removes the need for a machine module pass using some deeply
questionable hacks. This should address PR36123 which is a case where in
full LTO the memory usage of a machine module pass actually ended up
being significant.

We should revert this on trunk as soon as we understand and fix the
memory usage issue, but we should include this in any backports of
retpolines themselves.

Reviewers: echristo, MatzeB

Subscribers: sanjoy, mcrosier, mehdi_amini, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D42726

llvm-svn: 323915
2018-01-31 20:56:37 +00:00
Krzysztof Parzyszek 1108ee2496 [Hexagon] Implement HVX codegen for vector shifts
llvm-svn: 323914
2018-01-31 20:49:24 +00:00
Krzysztof Parzyszek 9eb085e6cf [Hexagon] Handle ANY_EXTEND_VECTOR_INREG in lowering
llvm-svn: 323912
2018-01-31 20:48:11 +00:00
Krzysztof Parzyszek b843f75179 [Hexagon] Handle SETCC on vector pairs in lowering
llvm-svn: 323911
2018-01-31 20:46:55 +00:00
Marek Olsak d4bb329d0e AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9
Summary:
This enables load merging into x2, x4, which is driven by inline offsets.

6500 shaders are affected:
Code Size in affected shaders: -15.14 %

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D42078

llvm-svn: 323909
2018-01-31 20:18:11 +00:00
Marek Olsak 13e4741275 AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D41663

llvm-svn: 323908
2018-01-31 20:18:04 +00:00
Sam Clegg 6e7f1826c5 [WebAssembly] MC: Remove unused code for handling of wasm globals
For now, we are not using wasm globals, except for modeling of
the stack points.

Alos, factor out common struct WasmGlobalType, which matches the
name for that tuple in the Wasm spec and rename methods
to "isBindingGlobal", "isTypeGlobal" to avoid ambiguity.

Patch by Nicholas Wilson!

Differential Revision: https://reviews.llvm.org/D42750

llvm-svn: 323901
2018-01-31 19:50:14 +00:00
Amaury Sechet f9a9e9a251 [X86] Generate testl instruction through truncates.
Summary:
This was introduced in D42646 but ended up being reverted because the original implementation was buggy.

Depends on D42646

Reviewers: craig.topper, niravd, spatel, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42741

llvm-svn: 323899
2018-01-31 19:20:06 +00:00
Krzysztof Parzyszek 82a83391d3 [Hexagon] Handle BUILD_VECTOR from undef values in buildHvxVectorReg
llvm-svn: 323889
2018-01-31 16:52:15 +00:00
Amaury Sechet f89f188ddb [X86] Avoid using high register trick for test instruction
Summary:
It seems it's main effect is to create addition copies when values are inr register that do not support this trick, which increase register pressure and makes the code bigger.

Reviewers: craig.topper, niravd, spatel, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42646

llvm-svn: 323888
2018-01-31 16:48:54 +00:00
Krzysztof Parzyszek 8cc636c592 [Hexagon] Only process bitcasts of vsplats when selecting const vectors
Selecting of constant HVX vectors involves some "manual processing",
which mishandled an unrelated BITCAST operation causing a selection
error.

llvm-svn: 323887
2018-01-31 16:48:20 +00:00
Diana Picus 12ed95e3e7 Fix formatting for r323876. NFC
llvm-svn: 323878
2018-01-31 15:16:17 +00:00
Diana Picus 1d4421f6a6 [ARM GlobalISel] Modernize LegalizerInfo. NFCI
Start using the new LegalizerInfo API introduced in r323681.

Keep the old API for opcodes that need Lowering in some circumstances
(G_FNEG and G_UREM/G_SREM).

llvm-svn: 323876
2018-01-31 14:55:07 +00:00
Pablo Barrio 2e442a7831 [ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations
Summary:
Expressions of the form x < 0 ? 0 :  x; and x < -1 ? -1 : x can be lowered using bit-operations instead of branching or conditional moves

In thumb-mode this results in a two-instruction sequence, a shift followed by a bic or or while in ARM/thumb2 mode that has flexible second operand the shift can be folded into a single bic/or instructions. In most cases this results in smaller code and possibly less branches, and in no case larger than before.

Patch by Marten Svanfeldt.

Reviewers: fhahn, pbarrio

Reviewed By: pbarrio

Subscribers: efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42574

llvm-svn: 323869
2018-01-31 13:20:10 +00:00
Jonas Paulsson cc5fe73669 [SystemZ] Check the bitwidth before calling isInt/isUInt.
Since these methods will assert if the integer does not fit into 64 bits,
it is necessary to do this check before calling them in
supportedAddressingMode().

Review: Ulrich Weigand.
llvm-svn: 323866
2018-01-31 12:41:25 +00:00
Sjoerd Meijer 98d5359ea2 [ARM] Armv8.2-A FP16 code generation (part 2/3)
Half-precision arguments and return values are passed as if it were an int or
float for ARM. This results in truncates and bitcasts to/from i16 and f16
values, which are legalized very early to stack stores/loads. When FullFP16 is
enabled, we want to avoid codegen for these bitcasts as it is unnecessary and
inefficient.

Differential Revision: https://reviews.llvm.org/D42580

llvm-svn: 323861
2018-01-31 10:18:29 +00:00
Jonas Paulsson e6a8329e9f [PowerPC] Return true in enableMultipleCopyHints().
Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: Nemanja Ivanovic
llvm-svn: 323858
2018-01-31 09:26:51 +00:00
Roger Ferrer Ibanez aea4208720 [ARM] Allow the scheduler to clone a node with glue to avoid a copy CPSR ↔ GPR.
In Thumb 1, with the new ADDCARRY / SUBCARRY the scheduler may need to do
copies CPSR ↔ GPR but not all Thumb1 targets implement them.

The schedule can attempt, before attempting a copy, to clone the instructions
but it does not currently do that for nodes with input glue. In this patch we
introduce a target-hook to let the hook decide if a glued machinenode is still
eligible for copying. In this case these are ARM::tADCS and ARM::tSBCS .

As a follow-up of this change we should actually implement the copies for the
Thumb1 targets that do implement them and restrict the hook to the targets that
can't really do such copy as these clones are not ideal.

This change fixes PR35836.

Differential Revision: https://reviews.llvm.org/D42051

llvm-svn: 323857
2018-01-31 09:23:43 +00:00
Krzysztof Parzyszek 119856430e [RDF] Clear the renamable flag when copy propagating reserved registers
llvm-svn: 323831
2018-01-30 23:19:44 +00:00
Krzysztof Parzyszek 5d9844f48a [Hexagon] Handle truncates in polynomial multiply idiom recognition
This is in anticipation of https://reviews.llvm.org/D42424, which would
otherwise break one of the pmpy testcases.

llvm-svn: 323824
2018-01-30 22:03:59 +00:00
Craig Topper d759f476e8 [X86] Remove redundant check for hasAVX512 before calling hasBWI. NFC
hasBWI implies hasAVX512.

llvm-svn: 323823
2018-01-30 21:53:35 +00:00
Martin Storsjo 708498a164 [AArch64] Properly handle dllimport of variables when using fast-isel
Differential Revision: https://reviews.llvm.org/D42567

llvm-svn: 323810
2018-01-30 19:50:51 +00:00
Krzysztof Parzyszek 39a9842f3c [Hexagon] Handle non-aligned offsets in globals in extender optimization
Instructions like memd(r0+##global+1) are legal as long as the entire
address is properly aligned. Assuming that "global" is aligned at an
8-byte boundary, the expression "global+1" appears to be misaligned.
Handle such cases in HexagonConstExtenders, and make sure that any non-
extended offsets generated are still aligned accordingly.

llvm-svn: 323799
2018-01-30 18:12:37 +00:00
Krzysztof Parzyszek 96a284114e Revert: [Hexagon] Make sure that offset on globals matches alignment requirements
This reverts r323562, since it wasn't actually necessary. Constant-
extended offsets do not need to be aligned, as long as the effective
address is aligned.

Keep the testcase, with a modification which checks that such offsets
are not unnecessarily avoided.

llvm-svn: 323798
2018-01-30 18:10:27 +00:00
Simon Pilgrim 073f089c6e [X86][XOP] Update isVectorShiftByScalarCheap with cases covered by XOP
Similar to D42437, XOP supports variable shift for v16i8/v8i16/v4i32/v2i64 types.

Differential Revision: https://reviews.llvm.org/D42526

llvm-svn: 323797
2018-01-30 18:10:21 +00:00
Geoff Berry 1d53101387 [AMDGPU] isRenamable fixes to support copy forwarding
Mark more opcodes as hasExtraSrcRegAllocReq so that their operands will
be marked as not renamable, to avoid copy forwarding violating the
constraint that only one operand may use the constant bus.

These changes fix a few mis-compiles when copy forwarding is enabled in
MachineCopyPropagation by D41835 (and were reviewed as part of that change).

llvm-svn: 323794
2018-01-30 17:37:39 +00:00
Mark Searles 94ae3b2f9b [AMDGPU] Revert "[AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output."
Patch caused a buildbot failure; arg; http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/17373/s\
teps/build_Lld/logs/stdio :
        /Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1563:18: error: unused variable 'InstCnt' [-Werror,-Wunused-variable]
          static int32_t InstCnt = 0;
                                              "
This reverts commit 4f4a7d61e306b67044d9f16bc2016fee806bc2cc.

llvm-svn: 323791
2018-01-30 17:17:06 +00:00
Mark Searles d6d5a2571f [AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output.
-amdgpu-waitcnt-forcezero={1|0}  Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-amdgpu-waitcnt-forceexp=<n>  Force emit a s_waitcnt expcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcelgkm=<n> Force emit a s_waitcnt lgkmcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcevm=<n>   Force emit a s_waitcnt vmcnt(0) before the first <n> instrs

This patch was pushed ( abb190fd51cd2f9a9eef08c024e109f7f7e909fc ), which caused a buildbot failure, reverted ( 6227480d74da507cf8e1b4bcaffbdb9fb875b4b8 ), and then updated to fix buildbot failures (this patch).

Differential Revision: https://reviews.llvm.org/D40091

llvm-svn: 323788
2018-01-30 16:49:38 +00:00
Changpeng Fang 0905870f93 AMDGPU/SI: Add decoding in the GFX80_UNPACKED decoding namespace.
Reviewer:
  Dmitry (dp).

Differential Revision:
  https://reviews.llvm.org/D42596

llvm-svn: 323785
2018-01-30 16:42:40 +00:00
Evandro Menezes f1d01645a7 [AArch64] Add new target feature to fuse address generation with load or store
This feature enables the fusion of the address generation and a
corresponding load or store together.

Differential revision: https://reviews.llvm.org/D42393

llvm-svn: 323782
2018-01-30 16:28:01 +00:00
Simon Dardis daaeaba665 [mips] Fix incorrect sign extension for fpowi libcall
PR36061 showed that during the expansion of ISD::FPOWI, that there
was an incorrect zero extension of the integer argument which for
MIPS64 would then give incorrect results. Address this with the
existing mechanism for correcting sign extensions.

This resolves PR36061.

Thanks to James Cowgill for reporting the issue!

Reviewers: atanasyan, hfinkel

Differential Revision: https://reviews.llvm.org/D42537

llvm-svn: 323781
2018-01-30 16:24:10 +00:00
Zaara Syeda 1f59ae311b Re-commit : [PowerPC] Add handling for ColdCC calling convention and a pass to mark
candidates with coldcc attribute.

This recommits r322721 reverted due to sanitizer memory leak build bot failures.

Original commit message:
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.

Differential Revision: https://reviews.llvm.org/D38413

llvm-svn: 323778
2018-01-30 16:17:22 +00:00
Evandro Menezes 07c78eeeef [AArch64] Add new target feature to handle cheap as move for Exynos
This feature enables special handling of cheap as move in the existing
custom handling specifically for Exynos processors.

Differential revision: https://reviews.llvm.org/D42387

llvm-svn: 323774
2018-01-30 15:40:22 +00:00
Evandro Menezes 9f9daa1f14 [AArch64] Add pipeline model for Exynos M3
Add the scheduling and cost model for Exynos M3.

Differential revision: https://reviews.llvm.org/D42387

llvm-svn: 323773
2018-01-30 15:40:16 +00:00
Eric Liu 0b69b5ed85 Revert "[X86] Avoid using high register trick for test instruction"
This reverts commit r323690. This causes crash in llc. See the original commit thread for details.

llvm-svn: 323761
2018-01-30 14:18:33 +00:00
Simon Pilgrim eb07016156 Spelling mistake in comment. NFCI.
llvm-svn: 323752
2018-01-30 12:18:51 +00:00
Diana Picus 2a5b962030 [ARM GlobalISel] Map G_SITOFP and G_UITOFP
Straightforward mapping (integer operand to GPR, floating point operand
to FPR).

llvm-svn: 323731
2018-01-30 09:15:23 +00:00
Diana Picus 517531e5a5 [ARM GlobalISel] Legalize G_SITOFP and G_UITOFP
Legal if we have hardware support, libcall otherwise.

Also add supporting code to the legalizer helper for libcalls.

llvm-svn: 323730
2018-01-30 09:15:17 +00:00
Diana Picus a2da03022c [ARM GlobalISel] Map G_FPTOSI and G_FPTOUI
Straightforward mapping (integer operand goes to GPR, floating point
operand goes to FPR).

llvm-svn: 323727
2018-01-30 07:54:58 +00:00
Diana Picus 4ed0ee7b5f [ARM GlobalISel] Legalize G_FPTOSI and G_FPTOUI
Legal if we have hardware support for floating point, libcalls
otherwise.

Also add the necessary support for libcalls in the legalizer helper.

llvm-svn: 323726
2018-01-30 07:54:52 +00:00
Tom Stellard 3ae38d271e AMDGPU: Move ADDRIndirect complex pattern into R600Instructions.td
Summary: This is only used by R600.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D37114

llvm-svn: 323709
2018-01-29 23:29:26 +00:00
Craig Topper 571231a7fe [X86] Use VMOVDQA64 for aligned vXi32 stores.
I meant to do this with the unaligned stores in r322820, but looks like I missed it.

llvm-svn: 323708
2018-01-29 23:27:23 +00:00
Marek Olsak 48057b554c AMDGPU: Allow a SGPR for the conditional KILL operand
Patch by: Bas Nieuwenhuizen

Just use the _e64 variant if needed. This should be possible as per

def : Pat <
  (int_amdgcn_kill (i1 (setcc f32:$src, InlineFPImm<f32>:$imm, cond:$cond))),
  (SI_KILL_F32_COND_IMM_PSEUDO $src, (bitcast_fpimm_to_i32 $imm), (cond_as_i32imm $cond))
> ;

I don't think we can get an immediate for the other operand for which we
need the second 32-bit word.

https://reviews.llvm.org/D42302

llvm-svn: 323706
2018-01-29 23:19:10 +00:00
Craig Topper a8f87a36f1 [X86] Add FeaturePOPCNTFalseDeps to skylake server CPU to match skylake client.
llvm-svn: 323700
2018-01-29 21:56:48 +00:00
Simon Pilgrim 02bdac53e7 [X86] Emit 11-byte or 15-byte NOPs on recent AMD targets, else default to 10-byte NOPs (PR22965)
We currently emit up to 15-byte NOPs on all targets (apart from Silvermont), which stalls performance on some targets with decoders that struggle with 2 or 3 more '66' prefixes.

This patch flags recent AMD targets (btver1/znver1) to still emit 15-byte NOPs and bdver* targets to emit 11-byte NOPs. All other targets now emit 10-byte NOPs apart from SilverMont CPUs which still emit 7-byte NOPS.

Differential Revision: https://reviews.llvm.org/D42616

llvm-svn: 323693
2018-01-29 21:24:31 +00:00
Daniel Sanders 08464524c3 [ARM][GISel] PR35965 Constrain RegClasses of nested instructions built from Dst Pattern
Summary:
Apparently, we missed on constraining register classes of VReg-operands of all the instructions
built from a destination pattern but the root (top-level) one. The issue exposed itself
while selecting G_FPTOSI for armv7: the corresponding pattern generates VTOSIZS wrapped
into COPY_TO_REGCLASS, so top-level COPY_TO_REGCLASS gets properly constrained,
while nested VTOSIZS (or rather its destination virtual register to be exact) does not.

Fixing this by issuing GIR_ConstrainSelectedInstOperands for every nested GIR_BuildMI.

https://bugs.llvm.org/show_bug.cgi?id=35965
rdar://problem/36886530

Patch by Roman Tereshin

Reviewers: dsanders, qcolombet, rovka, bogner, aditya_nandakumar, volkan

Reviewed By: dsanders, qcolombet, rovka

Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42565

llvm-svn: 323692
2018-01-29 21:09:12 +00:00
Amaury Sechet 015184b79e [X86] Avoid using high register trick for test instruction
Summary:
It seems it's main effect is to create addition copies when values are inr register that do not support this trick, which increase register pressure and makes the code bigger.

The main noteworthy regression I was able to observe was pattern of the type (setcc (trunc (and X, C)), 0) where C is such as it would benefit from the hi register trick. To prevent this, a new pattern is added to materialize such pattern using a 32 bits test. This has the added benefit of working with any constant that is materializable as a 32bits immediate, not just the ones that can leverage the high register trick, as demonstrated by the test case in test-shrink.ll using the constant 2049 .

Reviewers: craig.topper, niravd, spatel, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42646

llvm-svn: 323690
2018-01-29 20:54:33 +00:00
Evandro Menezes 1589d6e6a3 [AArch64] Change the filename of the Exynos M1 scheduling defs
After request by Matthias Braun in https://reviews.llvm.org/D42387.

llvm-svn: 323686
2018-01-29 20:22:24 +00:00
Jun Bum Lim fc7d56d949 Revert "AArch64: Omit callframe setup/destroy when not necessary"
This reverts commit r322917 due to multiple performance regressions in spec2006
and spec2017. XFAILed llvm/test/CodeGen/AArch64/big-callframe.ll which initially
motivated this change.

llvm-svn: 323683
2018-01-29 19:56:42 +00:00
Daniel Sanders 79cb839fcd [globalisel][legalizer] Adapt LegalizerInfo to support inter-type dependencies and other things.
Summary:
As discussed in D42244, we have difficulty describing the legality of some
operations. We're not able to specify relationships between types.
For example, declaring the following
  setAction({..., 0, s32}, Legal)
  setAction({..., 0, s64}, Legal)
  setAction({..., 1, s32}, Legal)
  setAction({..., 1, s64}, Legal)
currently declares these type combinations as legal:
  {s32, s32}
  {s64, s32}
  {s32, s64}
  {s64, s64}
but we currently have no means to say that, for example, {s64, s32} is
not legal. Some operations such as G_INSERT/G_EXTRACT/G_MERGE_VALUES/
G_UNMERGE_VALUES have relationships between the types that are currently
described incorrectly.
    
Additionally, G_LOAD/G_STORE currently have no means to legalize non-atomics
differently to atomics. The necessary information is in the MMO but we have no
way to use this in the legalizer. Similarly, there is currently no way for the
register type and the memory type to differ so there is no way to cleanly
represent extending-load/truncating-store in a way that can't be broken by
optimizers (resulting in illegal MIR).

It's also difficult to control the legalization strategy. We've added support
for legalizing non-power of 2 types but there's still some hardcoded assumptions
about the strategy. The main one I've noticed is that type0 is always legalized
before type1 which is not a good strategy for `type0 = G_EXTRACT type1, ...` if
you need to widen the container. It will converge on the same result eventually
but it will take a much longer route when legalizing type0 than if you legalize
type1 first.

Lastly, the definition of legality and the legalization strategy is kept
separate which is not ideal. It's helpful to be able to look at a one piece of
code and see both what is legal and the method the legalizer will use to make
illegal MIR more legal.

This patch adds a layer onto the LegalizerInfo (to be removed when all targets
have been migrated) which resolves all these issues.

Here are the rules for shift and division:
  for (unsigned BinOp : {G_LSHR, G_ASHR, G_SDIV, G_UDIV})
    getActionDefinitions(BinOp)
        .legalFor({s32, s64})     // If type0 is s32/s64 then it's Legal
        .clampScalar(0, s32, s64) // If type0 is <s32 then WidenScalar to s32
                                  // If type0 is >s64 then NarrowScalar to s64
        .widenScalarToPow2(0)     // Round type0 scalars up to powers of 2
        .unsupported();           // Otherwise, it's unsupported
This describes everything needed to both define legality and describe how to
make illegal things legal.

Here's an example of a complex rule:
  getActionDefinitions(G_INSERT)
      .unsupportedIf([=](const LegalityQuery &Query) {
        // If type0 is smaller than type1 then it's unsupported
        return Query.Types[0].getSizeInBits() <= Query.Types[1].getSizeInBits();
      })
      .legalIf([=](const LegalityQuery &Query) {
        // If type0 is s32/s64/p0 and type1 is a power of 2 other than 2 or 4 then it's legal
        // We don't need to worry about large type1's because unsupportedIf caught that.
        const LLT &Ty0 = Query.Types[0];
        const LLT &Ty1 = Query.Types[1];
        if (Ty0 != s32 && Ty0 != s64 && Ty0 != p0)
          return false;
        return isPowerOf2_32(Ty1.getSizeInBits()) &&
               (Ty1.getSizeInBits() == 1 || Ty1.getSizeInBits() >= 8);
      })
      .clampScalar(0, s32, s64)
      .widenScalarToPow2(0)
      .maxScalarIf(typeInSet(0, {s32}), 1, s16) // If type0 is s32 and type1 is bigger than s16 then NarrowScalar type1 to s16
      .maxScalarIf(typeInSet(0, {s64}), 1, s32) // If type0 is s64 and type1 is bigger than s32 then NarrowScalar type1 to s32
      .widenScalarToPow2(1)                     // Round type1 scalars up to powers of 2
      .unsupported();
This uses a lambda to say that G_INSERT is unsupported when type0 is bigger than
type1 (in practice, this would be a default rule for G_INSERT). It also uses one
to describe the legal cases. This particular predicate is equivalent to:
  .legalFor({{s32, s1}, {s32, s8}, {s32, s16}, {s64, s1}, {s64, s8}, {s64, s16}, {s64, s32}})

In terms of performance, I saw a slight (~6%) performance improvement when
AArch64 was around 30% ported but it's pretty much break even right now.
I'm going to take a look at constexpr as a means to reduce the initialization
cost.

Future work:
* Make it possible for opcodes to share rulesets. There's no need for
  G_LSHR/G_ASHR/G_SDIV/G_UDIV to have separate rule and ruleset objects. There's
  no technical barrier to this, it just hasn't been done yet.
* Replace the type-index numbers with an enum to get .clampScalar(Type0, s32, s64)
* Better names for things like .maxScalarIf() (clampMaxScalar?) and the vector rules.
* Improve initialization cost using constexpr

Possible future work:
* It's possible to make these rulesets change the MIR directly instead of
  returning a description of how to change the MIR. This should remove a little
  overhead caused by parsing the description and routing to the right code, but
  the real motivation is that it removes the need for LegalizeAction::Custom.
  With Custom removed, there's no longer a requirement that Custom legalization
  change the opcode to something that's considered legal.

Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, reames, bogner

Reviewed By: bogner

Subscribers: hintonda, bogner, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42251

llvm-svn: 323681
2018-01-29 19:54:49 +00:00
Geoff Berry d37dc77b6e [AMDGPU][X86][Mips] Make sure renamable bit not set for reserved regs
Summary:
Fix a few places that were modifying code after register
allocation to set the renamable bit correctly to avoid failing the
validation added in D42449.

llvm-svn: 323675
2018-01-29 18:47:48 +00:00
Craig Topper eb13ebdb99 [X86] Don't create SHRUNKBLEND when the condition is used by the true or false operand of the vselect.
Fixes PR34592.

Differential Revision: https://reviews.llvm.org/D42628

llvm-svn: 323672
2018-01-29 17:56:57 +00:00
Daniel Sanders 9ade5592d9 [globalisel] Make LegalizerInfo::LegalizeAction available outside of LegalizerInfo. NFC
Summary:
The improvements to the LegalizerInfo discussed in D42244 require that
LegalizerInfo::LegalizeAction be available for use in other classes. As such,
it needs to be moved out of LegalizerInfo. This has been done separately to the
next patch to minimize the noise in that patch.

llvm-svn: 323669
2018-01-29 17:37:29 +00:00
Dmitry Preobrazhensky 4f321aef74 [AMDGPU][MC] Corrected parsing of image opcode modifiers r128 and d16
See bugs 36092, 36093:
    https://bugs.llvm.org/show_bug.cgi?id=36092
    https://bugs.llvm.org/show_bug.cgi?id=36093

Differential Revision: https://reviews.llvm.org/D42583

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 323651
2018-01-29 14:20:42 +00:00
Sander de Smalen a1c259c22c [AArch64][AsmParser] NFC: Generalize LogicalImm[Not](32|64) code
Summary:
All variants of isLogicalImm[Not](32|64) can be combined into a single templated function, same for printLogicalImm(32|64).
By making it use a template instead, further SVE patches can use it for other data types as well (e.g. 8, 16 bits).

Reviewers: fhahn, rengolin, aadg, echristo, kristof.beyls, samparker

Reviewed By: samparker

Subscribers: aemerson, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D42294

llvm-svn: 323646
2018-01-29 13:05:38 +00:00
Jonas Devlieghere 865de57bde [Sparc] Account for bias in stack readjustment
Summary: This was broken long ago in D12208, which failed to account for
the fact that 64-bit SPARC uses a stack bias of 2047, and it is the
*unbiased* value which should be aligned, not the biased one. This was
seen to be an issue with Rust.

Patch by: jrtc27 (James Clarke)

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: jacob_hansen, JDevlieghere, fhahn, fedor.sergeev, llvm-commits

Differential Revision: https://reviews.llvm.org/D39425

llvm-svn: 323643
2018-01-29 12:10:32 +00:00
Sjoerd Meijer 3ddb7fb663 [ARM] FP16Pat and FullFP16Pat patterns. NFC.
Create and use FP16Pat FullFP16Pat helper patterns to make the difference
explicit.

Differential Revision: https://reviews.llvm.org/D42634

llvm-svn: 323640
2018-01-29 11:28:06 +00:00
Andrei Elovikov c560a18c7f [X86FixupBWInsts] Fix miscompilation if sibling sub-register is live.
Summary: The issues was found during D40524.

Reviewers: andrew.w.kaylor, craig.topper, MatzeB

Reviewed By: andrew.w.kaylor

Subscribers: aivchenk, llvm-commits

Differential Revision: https://reviews.llvm.org/D42533

llvm-svn: 323635
2018-01-29 09:26:04 +00:00
Oliver Stannard a9d2e004d2 [AArch64] Generate the CASP instruction for 128-bit cmpxchg
The Large System Extension added an atomic compare-and-swap instruction
that operates on a pair of 64-bit registers, which we can use to
implement a 128-bit cmpxchg.

Because i128 is not a legal type for AArch64 we have to do all of the
instruction selection in C++, and the instruction requires even/odd
register pairs, so we have to wrap it in REG_SEQUENCE and EXTRACT_SUBREG
nodes. This is very similar to what we do for 64-bit cmpxchg in the ARM
backend.

Differential revision: https://reviews.llvm.org/D42104

llvm-svn: 323634
2018-01-29 09:18:37 +00:00
Hiroshi Inoue c8e9245816 [NFC] fix trivial typos in comments and documents
"to to" -> "to"

llvm-svn: 323628
2018-01-29 05:17:03 +00:00
Craig Topper 3913a4dd56 [X86] Fix a crash that can occur in combineExtractVectorElt due to not checking the width of a ConstantSDNode before calling getConstantOperandVal.
llvm-svn: 323614
2018-01-28 07:29:35 +00:00
Craig Topper 15d69739e2 [X86] Remove VPTESTM/VPTESTNM ISD opcodes. Use isel patterns matching cmpm eq/ne with immallzeros.
llvm-svn: 323612
2018-01-28 00:56:30 +00:00
Craig Topper 5e4b45361f [X86] Add patterns for using masked vptestnmd for 256-bit vectors without VLX.
We can widen the mask and extract it back down.

llvm-svn: 323610
2018-01-27 23:49:14 +00:00
Craig Topper 247016a735 [X86] Use vptestm/vptestnm for comparisons with zero to avoid creating a zero vector.
We can use the same input for both operands to get a free compare with zero.

We already use this trick in a couple places where we explicitly create PTESTM with the same input twice. This generalizes it.

I'm hoping to remove the ISD opcodes and move this to isel patterns like we do for scalar cmp/test.

llvm-svn: 323605
2018-01-27 20:19:09 +00:00
Craig Topper 513d3fa674 [X86] Remove X86ISD::PCMPGTM/PCMPEQM and instead just use X86ISD::PCMPM and pattern match the immediate value during isel.
Legalization is still biased to turn LT compares in to GT by swapping operands to avoid needing extra isel patterns to commute.

I'm hoping to remove TESTM/TESTNM next and this should simplify that by making EQ/NE more similar.

llvm-svn: 323604
2018-01-27 20:19:02 +00:00
Simon Pilgrim fe3fac805a [X86][SSE] Simplify demanded elements from BROADCAST shuffle source.
If broadcasting from another shuffle, attempt to simplify it.

We can probably generalize this a lot more (embedding in combineX86ShufflesRecursively), but BROADCAST is one of the more troublesome as it accepts inputs of different sizes to the result.

llvm-svn: 323602
2018-01-27 19:48:13 +00:00
Craig Topper 8a444ee67c [X86] Use vpternlog to implement vector not under AVX512.
Previously we had to materialize all 1s in a register using vpternlog or pcmpeq and then xor with that. By using vpternlog directly we can do it in one operation.

This is implemented using isel patterns, but we should maybe consider creating a generalized vpternlog combiner.

llvm-svn: 323572
2018-01-26 22:17:40 +00:00
Richard Trieu 8610c9f43a Inline variable only used within assert.
llvm-svn: 323569
2018-01-26 21:55:13 +00:00
Krzysztof Parzyszek 90ca4e8b0c [Hexagon] Generate constant splats instead of loads from constant pool
llvm-svn: 323568
2018-01-26 21:54:56 +00:00
Krzysztof Parzyszek d4273abb69 [Hexagon] Make sure that offset on globals matches alignment requirements
A correctly aligned address may happen to be separated into a variable
part and a constant part, where the constant part does not match the
alignment needed in a load/store that uses this address. Such a constant
cannot be used as an immediate offset in an indexed instruction.

When lowering a global address, make sure that if there is an offset
folded into the global, the offset is valid for all uses in load/store
instructions.

llvm-svn: 323562
2018-01-26 21:20:04 +00:00
Krzysztof Parzyszek 95614acc24 [Hexagon] Replace multiple vector extracts with store-load combinations
llvm-svn: 323561
2018-01-26 21:17:14 +00:00
Benjamin Kramer a03d3198ee [X86] Unbreak the build.
X86ISelLowering.cpp:34130:5: error: return type 'llvm::SDValue' must
match previous return type 'const llvm::SDValue' when lambda expression
has unspecified explicit return type

llvm-svn: 323557
2018-01-26 20:16:43 +00:00
Craig Topper d4795b700d [X86] Allow any_extend to be combined with setcc on VLX targets.
For VLX target getSetccResultType returns vXi1 which prevents the target independent DAG combine from doing this tranform itself.

llvm-svn: 323555
2018-01-26 20:02:52 +00:00
Simon Pilgrim 8e9becbd81 [X86][AVX512] Add combining support for X86ISD::VTRUNCS
Similar to the existing support for X86ISD::VTRUNCUS.

Differential Revision: https://reviews.llvm.org/D42544

llvm-svn: 323553
2018-01-26 20:01:12 +00:00
Craig Topper 8f324bb1a4 [SelectionDAGISel] Add a debug print before call to Select. Adjust where blank lines are printed during isel process to make things more sensibly grouped.
Previously some targets printed their own message at the start of Select to indicate what they were selecting. For the targets that didn't, it means there was no print of the root node before any custom handling in the target executed. So if the target did something custom and never called SelectNodeCommon, no print would be made. For the targets that did print a message in Select, if they didn't custom handle a node SelectNodeCommon would reprint the root node before walking the isel table.

It seems better to just print the message before the call to Select so all targets behave the same. And then remove the root node printing from SelectNodeCommon and just leave a message that says we're starting the table search.

There were also some oddities in blank line behavior. Usually due to a \n after a call to SelectionDAGNode::dump which already inserted a new line.

llvm-svn: 323551
2018-01-26 19:34:20 +00:00
Craig Topper b207dd6870 [X86] Add 'rdrnd' feature to silvermont to match recent gcc bug fix.
gcc recently fixed this bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83546

llvm-svn: 323550
2018-01-26 19:34:14 +00:00
Krzysztof Parzyszek 1a1edbfb04 [Hexagon] Fix an incorrect assertion in HexagonConstExtenders
llvm-svn: 323548
2018-01-26 19:20:50 +00:00
Sanjay Patel b8ae262bd3 [x86] fix typo in comment; NFC
llvm-svn: 323545
2018-01-26 18:44:32 +00:00
Simon Pilgrim 1b14bdc0b8 [X86][AVX] LowerBUILD_VECTORAsVariablePermute - add support for VPERMILPV to v4i32/v4f32
Extension to D42431, adding support for v4i32/v4f32 as well as v2i64/v2f64 now that D42308 has landed

llvm-svn: 323542
2018-01-26 17:19:59 +00:00
Simon Pilgrim 76ede609f6 [X86][SSE] Don't colaesce v4i32 extracts
We currently coalesce v4i32 extracts from all 4 elements to 2 v2i64 extracts + shifts/sign-extends.

This seems to have been added back in the days when we tended to spill vectors and reload scalars, or ended up with repeated shuffles moving everything down to 0'th index. I don't think either of these are likely these days as we have better EXTRACT_VECTOR_ELT and VECTOR_SHUFFLE handling, and the existing code tends to make it very difficult for various vector and load combines.

Differential Revision: https://reviews.llvm.org/D42308

llvm-svn: 323541
2018-01-26 17:11:34 +00:00
Simon Pilgrim d567c27c84 [X86][SSE] Drop PMADDWD in lowerMul
As mentioned in D42258, we don't need this any more

llvm-svn: 323540
2018-01-26 16:57:36 +00:00
Dmitry Preobrazhensky 706828157f [AMDGPU][MC] Added validation of image dst/data size (must match dmask and tfe)
See bug 36000: https://bugs.llvm.org/show_bug.cgi?id=36000

Differential Revision: https://reviews.llvm.org/D42483

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 323538
2018-01-26 16:42:51 +00:00
Alexander Richardson 1f9636f3ef [MIPS] Don't crash on unsized extern types with -mgpopt
Summary: This fixes an assertion when building the FreeBSD MIPS64 kernel.

Reviewers: atanasyan, sdardis, emaste

Reviewed By: sdardis

Subscribers: krytarowski, llvm-commits

Differential Revision: https://reviews.llvm.org/D42571

llvm-svn: 323536
2018-01-26 15:56:14 +00:00
Dmitry Preobrazhensky 0b4eb1ead1 [AMDGPU][MC] Added support of 64-bit image atomics
See bug 35998: https://bugs.llvm.org/show_bug.cgi?id=35998

Differential Revision: https://reviews.llvm.org/D42469

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 323534
2018-01-26 15:43:29 +00:00
Dmitry Preobrazhensky 6cb42e7622 [AMDGPU][MC] Enabled disassembler for image atomic operations
See bug 35988: https://bugs.llvm.org/show_bug.cgi?id=35988

Differential Revision: https://reviews.llvm.org/D42186

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 323527
2018-01-26 14:07:38 +00:00
Simon Pilgrim 445d7c0e5c [X86] Cleanup SDLoc arguments as mentioned on D42544
llvm-svn: 323526
2018-01-26 14:00:01 +00:00
Daniil Fukalov 6e1dc68117 [AMDGPU] fix LDS f32 intrinsics
- using qualified pointer addrspace in intrinsics class to avoid .f32 mangling
- changed too common atomic mangling to ds
- added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic

Reviewed by: b-sumner

Differential Revision: https://reviews.llvm.org/D42383

llvm-svn: 323516
2018-01-26 11:09:38 +00:00
Momchil Velikov d2cc6fd90b [ARM] Accept a subset of Thumb GPR register class when emitting an SP-relative
load instruction

The function `Thumb1InstrInfo::loadRegFromStackSlot` accepts only the `tGPR`
register class. The function serves to emit a `tLDRspi` instruction and
certainly any subset of the `tGPR` register class is a valid destination of the
load.

Differential revision: https://reviews.llvm.org/D42535

llvm-svn: 323514
2018-01-26 10:20:58 +00:00
Sjoerd Meijer 011de9c0ca [ARM] Armv8.2-A FP16 code generation (part 1/3)
This is the groundwork for Armv8.2-A FP16 code generation .

Clang passes and returns _Float16 values as floats, together with the required
bitconverts and truncs etc. to implement correct AAPCS behaviour, see D42318.
We will implement half-precision argument passing/returning lowering in the ARM
backend soon, but for now this means that this:

_Float16 sub(_Float16 a, _Float16 b) {
  return a + b;
}

gets lowered to this:

define float @sub(float %a.coerce, float %b.coerce) {
entry:
  %0 = bitcast float %a.coerce to i32
  %tmp.0.extract.trunc = trunc i32 %0 to i16
  %1 = bitcast i16 %tmp.0.extract.trunc to half
  <SNIP>
  %add = fadd half %1, %3
  <SNIP>
}

When FullFP16 is *not* supported, we don't make f16 a legal type, and we get
legalization for "free", i.e. nothing changes and everything works as before.
And also f16 argument passing/returning is handled.

When FullFP16 is supported, we do make f16 a legal type, and have 2 places that
we need to patch up: f16 argument passing and returning, which involves minor
tweaks to avoid unnecessary code generation for some bitcasts.

As a "demonstrator" that this works for the different FP16, FullFP16, softfp
modes, etc., I've added match rules to the VSUB instruction description showing
that we can codegen this instruction from IR, but more importantly, also to
some conversion instructions. These conversions were causing issue before in
the FP16 and FullFP16 cases.

I've also added match rules to the VLDRH and VSTRH desriptions, so that we can
actually compile the entire half-precision sub code example above. This showed
that these loads and stores had the wrong addressing mode specified: AddrMode5
instead of AddrMode5FP16, which turned out not be implemented at all, so that
has also been added.

This is the minimal patch that shows all the different moving parts. In patch
2/3 I will add some efficient lowering of bitcasts, and in 2/3 I will add the
remaining Armv8.2-A FP16 instruction descriptions.


Thanks to Sam Parker and Oliver Stannard for their help and reviews!


Differential Revision: https://reviews.llvm.org/D38315

llvm-svn: 323512
2018-01-26 09:26:40 +00:00
Hiroshi Inoue 0909ca132f [NFC] fix trivial typos in comments and documents
"in in" -> "in", "on on" -> "on" etc.

llvm-svn: 323508
2018-01-26 08:15:29 +00:00
Shiva Chen 056d835fa4 [RISCV] Encode RISCV specific ELF e_flags to RISCV Binary by RISCVTargetStreamer
llvm-svn: 323507
2018-01-26 07:53:07 +00:00
Craig Topper 882f0d7955 [X86] Remove dead code from LowerBUILD_VECTOR that tried to handle i64 element type in 32-bit mode.
Type legalization would prevent any i64 operands to the build_vector from existing before we get here. The coverage bots show this code as uncovered.

llvm-svn: 323506
2018-01-26 07:30:44 +00:00
Craig Topper 77c5077585 [X86] Remove code from combineBitcastvxi1 that was needed to support the previous native IR for kunpck intrinsics.
The original autoupgrade for kunpck intrinsics used a bitcasted scalar shift, or, and. This combine would turn this into a concat_vectors. Now the kunpck intrinsics are autoupgraded to a vector shuffle that will become a concat_vectors.

llvm-svn: 323504
2018-01-26 07:15:21 +00:00
Craig Topper 95e8c9143e [X86] Remove unused intrinsic type handling. NFC
llvm-svn: 323503
2018-01-26 07:15:20 +00:00
Craig Topper ccb35dfda6 [X86] Simplify condition in VSETCC. NFC
This listed all legal 128-bit integer types individually, but since we already know we have a legal type and its integer, we can just check is128BitVector.

llvm-svn: 323502
2018-01-26 07:15:18 +00:00
Craig Topper faa56f7b08 [X86] Remove LowerVSETCC code for handling vXi1 setcc with vXi8/vXi16 input type. NFC
These kinds of setccs are promoted by a DAG combine before they ever get to legalization.

llvm-svn: 323501
2018-01-26 07:15:17 +00:00
Craig Topper ad8ce0b800 [X86] Remove some dead code from LowerVSETCC. NFC
This code was added in r321967, but ultimately I fixed the issue in the legalizer and this code was no longer required.

llvm-svn: 323500
2018-01-26 07:15:16 +00:00
Serguei Katkov 1ce7137c99 [X86] Fix killed flag handling in X86FixupLea pass
When pass creates a MOV instruction for 
lea (%base,%index,1), %dst => mov %base,%dst; add %index,%dst
modification it should clean the killed flag for base
if base is equal to index.

Otherwise verifier complains about usage of killed register in add instruction.

Reviewers: lsaba, zvi, zansari, aaboud
Reviewed By: lsaba
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42522

llvm-svn: 323497
2018-01-26 04:49:26 +00:00
Joel Jones 0715092c65 [AArch64] Enable aggressive FMA on T99 and provide AArch64 options for others.
This patch enables aggressive FMA by default on T99, and provides a -mllvm
option to enable the same on other AArch64 micro-arch's (-mllvm
-aarch64-enable-aggressive-fma).

Test case demonstrating the effects on T99 is included.

Patch by: steleman (Stefan Teleman)

Differential Revision: https://reviews.llvm.org/D40696

llvm-svn: 323474
2018-01-25 21:55:39 +00:00
Craig Topper 6fd634b11b [X86] Teach Intel syntax InstPrinter to print lock prefixes that have been parsed from the asm parser.
The asm parser puts the lock prefix in the MCInst flags so we need to check that in addition to TSFlags. This matches what the ATT printer does.

llvm-svn: 323469
2018-01-25 21:23:57 +00:00
Craig Topper 4abd60ab64 [X86] Combine two unnecessarily complicated ifs that had the same body. NFC
llvm-svn: 323468
2018-01-25 21:23:51 +00:00
Krzysztof Parzyszek b2c458e648 [Hexagon] SETEQ and SETNE are valid integer condition codes
llvm-svn: 323452
2018-01-25 18:07:27 +00:00
Simon Pilgrim 09c56b799f [X86] Apply clang-format to detectUSatPattern. NFCI.
Cleanup from D42544

llvm-svn: 323439
2018-01-25 16:38:56 +00:00
Krzysztof Parzyszek 16610b0a57 Revert "[Hexagon] Replace EmitFunctionEntryCode with a DAG preprocessing code"
This reverts r323374. The fix needs a different approach.

llvm-svn: 323438
2018-01-25 16:36:53 +00:00
Craig Topper b369cdbaad [X86] Expand IMUL/MUL instregexs in Intel scheduler models. Add load latency to some of them in SkylakeClient model.
The regular expressions and the imul names caused some instructions to be matched by multiple regexs creating unpredictable results.

This changes them all to use explicit instrs instead.

While doing this I also found that some instructions in Skylake were missing load latency so I fixed that too.

llvm-svn: 323406
2018-01-25 06:57:42 +00:00
Craig Topper 795b17f4fb [X86] Expand IMUL/MUL instregexs in Znver1 scheduler to show what's actually implemented.
The IMUL instruction names mixed with the prefix matching of the instregex lead to some strange matches. The worst being that several memory instructions are using the register form latency.

I don't know what the right answer is, so I've left TODOs and will try to work with the AMD folks to get this cleaned up.

llvm-svn: 323405
2018-01-25 06:57:39 +00:00
Craig Topper 066e73762d [X86] Name the MMX phaddd instruction with 3 Ds instead of just 2. NFC
llvm-svn: 323403
2018-01-25 04:45:32 +00:00
Craig Topper dbddac0915 [X86] Remove 64/128/256 from MMX/SSE/AVX instruction names for overall consistency. NFC
MMX instrutions all start with MMX_ so the 64 isn't needed for disambigutation.
SSE/AVX1 instructions are assumed 128-bit so we don't need to say 128.
AVX2 instructions should use a Y to indicate 256-bits.

llvm-svn: 323402
2018-01-25 04:45:30 +00:00
Craig Topper 81c87092d1 [X86] Remove unnecessary '_alt' and '_Int' from scheduler model regular expressions.
These were treated as optional suffixes, but the regular expressions are already prefix matches so this is unnecessary. It breaks the binary search optimization in tablegen due to the top level question mark.

llvm-svn: 323401
2018-01-25 04:45:28 +00:00
Krzysztof Parzyszek 14f3ef1f0e [Hexagon] Replace EmitFunctionEntryCode with a DAG preprocessing code
The code in EmitFunctionEntryCode needs to know the maximum stack
alignment, but it runs very early in the selection process (before
lowering). The final stack alignment may change during lowering, so
the code needs to be moved to where the alignment is known.

llvm-svn: 323374
2018-01-24 21:19:51 +00:00
Amara Emerson 4f84f8862b [AArch64][GlobalISel] Fall back during AArch64 isel if we have a volatile load.
The tablegen imported patterns for sext(load(a)) don't check for single uses
of the load or delete the original after matching. As a result two loads are
left in the generated code. This particular issue will be fixed by adding
support for a G_SEXTLOAD opcode in future.

There are however other potential issues around this that wouldn't be fixed by
a G_SEXTLOAD, so until we have a proper solution we don't try to handle volatile
loads at all in the AArch64 selector.

Fixes/works around PR36018.

llvm-svn: 323371
2018-01-24 20:35:37 +00:00
Simon Pilgrim 9f551ad604 [X86][SSE] Aggressively use PMADDWD for v4i32 multiplies with 17 or more leading zeros
As discussed in D41484, PMADDWD for 'zero extended' vXi32 is nearly always a better option than PMULLD:
On SNB it will result in code that isn't any faster, but not any slower so we may as well keep it.
On KNL it only has half the throughput, so I've disabled it on there - ideally there'd be a better way than this.

Differential Revision: https://reviews.llvm.org/D42258

llvm-svn: 323367
2018-01-24 19:20:02 +00:00
Geoff Berry c4796d4745 [AMDGPU] Make sure all super regs of reserved regs are marked reserved.
Summary:
Move reserveRegisterTuples into AMDGPURegisterInfo and use it in
R600RegisterInfo::getReservedRegs and
R600InstrInfo::reserveIndirectRegisters to ensure that all super
registers of reserved registers are also marked as reserved.

Before this change, under certain circumstances, the registers %t1_x and
%t1_xyzw would be marked as reserved, but %t1_xy and %t1_xyz would not
be, leading to the register allocator sometimes assigning a register to
%t1_xy, which is invalid since %t1_x is reserved.

Reviewers: arsenm, tstellar, MatzeB, qcolombet

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D42448

llvm-svn: 323356
2018-01-24 18:09:53 +00:00
Weiming Zhao 665784f170 [ARM] Expand long shifts for Thumb1 to __aeabi_ calls
Summary: For long shifts, the inlined version takes about 20 instructions on Thumb1. To avoid the code bloat, expand to __aeabi_ calls if target is Thumb1.

Reviewers: samparker

Reviewed By: samparker

Subscribers: samparker, aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42401

llvm-svn: 323354
2018-01-24 18:00:57 +00:00
Craig Topper 05af43fbad [X86] Fix some inconsistencies in the itineraries and Sched for (V)PEXTRW/(V)PINSRW
The weirdest being that PEXTRWrr was tagged as a memory operation.

llvm-svn: 323353
2018-01-24 17:58:57 +00:00
Craig Topper b85b484fee [X86] Adjust names of PINSRW/PEXTRW intructions between MMX/SSE/AVX/AVX512 for consistency and to maybe enable more regular expression compaction in the scheduler models. NFCI
llvm-svn: 323352
2018-01-24 17:58:51 +00:00
Craig Topper 23cc866c97 [X86] Remove '(_REV)?' from a bunch of scheduler regular expressions. NFC
The regexs are treated as a prefix match already so the checking for optional text at the end provides no value. Instead it prevents the binary search optimization in tablegen from kicking in due to the top level question mark.

llvm-svn: 323351
2018-01-24 17:58:42 +00:00
Krzysztof Parzyszek cf3ad5841b [Hexagon] Run late copy propagation and dead code elimination passes
llvm-svn: 323346
2018-01-24 17:48:11 +00:00
Pablo Barrio 9b3d4c01a0 [AArch64] Avoid unnecessary vector byte-swapping in big-endian
Summary:
Loads/stores of some NEON vector types are promoted to other vector
types with different lane sizes but same vector size. This is not a
problem in little-endian but, when in big-endian, it requires
additional byte reversals required to preserve the lane ordering
while keeping the right endianness of the data inside each lane.
For example:

%1 = load <4 x half>, <4 x half>* %p

results in the following assembly:

ld1 { v0.2s }, [x1]
rev32 v0.4h, v0.4h

This patch changes the promotion of these loads/stores so that the
actual vector load/store (LD1/ST1) takes care of the endianness
correctly and there is no need for further byte reversals. The
previous code now results in the following assembly:

ld1 { v0.4h }, [x1]

Reviewers: olista01, SjoerdMeijer, efriedma

Reviewed By: efriedma

Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D42235

llvm-svn: 323325
2018-01-24 14:13:47 +00:00
Krzysztof Parzyszek 5aef4b5997 [Hexagon] Remove unused HexagonISD opcodes, NFC
llvm-svn: 323324
2018-01-24 14:07:37 +00:00
Simon Pilgrim f26df47831 [X86][SSE] Avoid calls to combineX86ShufflesRecursively that can't combine to target shuffles (PR32037)
Don't bother making recursive calls to combineX86ShufflesRecursively if we have more shuffle source operands than will be combined together with the remaining recursive depth.

See https://bugs.llvm.org/show_bug.cgi?id=32037#c26 and https://bugs.llvm.org/show_bug.cgi?id=32037#c27 for the reduction in compile times from this patch.

Differential Revision: https://reviews.llvm.org/D42378

llvm-svn: 323320
2018-01-24 11:41:09 +00:00
Martin Storsjo 4ed94a06ac [ARM] Call __chkstk for dynamic stack allocation in all windows environments
This matches what MSVC does for alloca() function calls on ARM.
Even if MSVC doesn't support VLAs at the language level, it does
support the alloca function.

On the clang level, both the _alloca() (when emulating MSVC, which is
what the alloca() function expands to) and __builtin_alloca() builtin
functions, and VLAs, map to the same LLVM IR "alloca" function - so
within LLVM they're not distinguishable from each other.

Differential Revision: https://reviews.llvm.org/D42292

llvm-svn: 323308
2018-01-24 06:40:11 +00:00
Craig Topper 069e1dd861 [X86] Move 'Y' to correct place in FMA4 regular expression in Znver1 scheduler model.
I think these instructions used to be named differently and the regular expression reflected that. I guess we must have correct itinerary information that made this not matter for the scheduler test?

llvm-svn: 323305
2018-01-24 05:32:51 +00:00
Craig Topper a55ac7b790 [X86] Rename 256-bit VFRCZ instructions to have the Y before the rr/rm to match other instructions. NFC
llvm-svn: 323304
2018-01-24 05:14:39 +00:00
Craig Topper fd68c2d0ae [X86] Remove redundant regular expression from the Znver1 scheduler model. NFC
llvm-svn: 323303
2018-01-24 05:14:33 +00:00
Hiroshi Inoue 501931b117 [NFC] fix trivial typos in comments
"the the" -> "the"

llvm-svn: 323302
2018-01-24 05:04:35 +00:00
Craig Topper 0321ebc054 [X86] Use ISD::SIGN_EXTEND instead of X86ISD::VSEXT for mask to xmm/ymm/zmm conversion
There are a couple tricky things with this patch.

I had to add an override of isVectorLoadExtDesirable to stop DAG combine from combining sign_extend with loads after legalization since we legalize sextload using a load+sign_extend. Overriding this hook actually prevents a lot sextloads from being created in the first place.

I also had to add isel patterns because DAG combine blindly combines sign_extend+truncate to a smaller sign_extend which defeats what legalization was trying to do.

Differential Revision: https://reviews.llvm.org/D42407

llvm-svn: 323301
2018-01-24 04:51:17 +00:00
Rafael Espindola 432a587cf0 Don't assume a null GV is local for ELF and MachO.
This is already a simplification, and should help with avoiding a plt
reference when calling an intrinsic with -fno-plt.

With this change we return false for null GVs, so the caller only
needs to check the new metadata to decide if it should use foo@plt or
*foo@got.

llvm-svn: 323297
2018-01-24 02:11:18 +00:00
Eric Christopher a8bdf5328d Remove set but unused variable IsUndef.
llvm-svn: 323295
2018-01-24 01:51:57 +00:00
Zvi Rackover b5447b1e7c X86: Update isVectorShiftByScalarCheap with cases covered by AVX512BW
Summary:
AVX512BW adds support for variable shift amount for 16-bit element
vectors.

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: rengolin, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D42437

llvm-svn: 323292
2018-01-24 01:36:40 +00:00
Matthias Braun 70fd374d1e AArch64: Cyclone: Remove SlowMisaligned128Store tuning flag
Remove FeatureSlowMisaligned128Store from cyclone flags.
This flag causes splitting of 16 byte wide stores into 2 stored of 8
bytes. This was useful on older apple CPUs which were slow for 16byte
stores that were not aligned on 16byte. As the compiler often cannot
predict the actual alignment, the splitting was choosen.

This has been a topic for a lot of debate as the splitting also
decreases performance for some benchmarks. Measuring the effects on
newer apple chips (rdar://35525421) shows that it harms more cases than
it helps. So it is time to retire this workaround.

llvm-svn: 323289
2018-01-24 00:39:53 +00:00
Tim Shen 7abe9887b0 [PPC] Avoid incorrect fp-i128-fp lowering.
Summary:
Fix an issue that's similar to what D41411 fixed:
  float(__int128(float_var)) shouldn't be optimized to xscvdpsxds +
  xscvsxdsp, as they mean (float)(int64_t)float_var.

Reviewers: jtony, hfinkel, echristo

Subscribers: sanjoy, nemanjai, hiraditya, llvm-commits, kbarton

Differential Revision: https://reviews.llvm.org/D42400

llvm-svn: 323270
2018-01-23 22:06:57 +00:00
Craig Topper 1e42a4a735 [X86] Merge some regular expressions in Zen scheduler model and remove 2 unused classes.
I don't know if the unused classes were intended to be used and that the VEX version is really different than the legacy SSE version. Agner's tables don't show any differences. I'm just cleaning up assuming the current behavior is correct.

llvm-svn: 323263
2018-01-23 21:37:56 +00:00
Craig Topper 3067f4ddca [X86] Remove 'Int_' from instregexs in Zen scheduler model.
No instructions have Int_ at the beginning. It's always at the end now. So it should be picked up as a prefix match

llvm-svn: 323262
2018-01-23 21:37:54 +00:00
Craig Topper 002657731b [X86] Move 'Int_' to the end of the name of the VCOMISS/VUCOMISS and instructions to get them picked up by the scheduler model regexs.
All other intrinsic instructions put the _Int on the end. This make these instructions consistent and gets the prefix instregexs in the scheduler models to pick them up.

llvm-svn: 323261
2018-01-23 21:37:51 +00:00
Simon Pilgrim 2cc74ed2be [X86][AVX] LowerBUILD_VECTORAsVariablePermute - add support for VPERMILPV to v2i64/v2f64
Minor refactor to make it possible for LowerBUILD_VECTORAsVariablePermute to be used with a wider variety of shuffles op and types.

I'd have liked to add v4i32/v4f32 support as well but we don't see v4i32 index extractions at the moment (which is why I created D42308)

After this I intend to begin adding scaling support for PSHUFB (v8i16, v4i32, v2i64)) and VPERMPS (v4f64, v4i64).

Differential Revision: https://reviews.llvm.org/D42431

llvm-svn: 323260
2018-01-23 21:33:24 +00:00
Simon Pilgrim c1e2290d37 Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI.
llvm-svn: 323258
2018-01-23 21:22:16 +00:00
Krzysztof Parzyszek d5e8a260bb [Hexagon] Add patterns for sext_inreg of HVX vector types
llvm-svn: 323250
2018-01-23 19:56:16 +00:00
Krzysztof Parzyszek 275ffa4679 [Hexagon] Implement hasLoadFromStackSlot and hasStoreToStackSlot
If the instruction is a bundle, check the instructions inside of it.

Patch by Suyog Sarda.

llvm-svn: 323240
2018-01-23 19:08:40 +00:00
Krzysztof Parzyszek ae3e934bd6 [Hexagon] Fix unused variable warning in release build
llvm-svn: 323233
2018-01-23 18:16:52 +00:00
Krzysztof Parzyszek 3780a0e1fa [Hexagon] Implement basic vector operations on vectors vNi1
In addition to that, make sure that there are no boolean vector types that
are associated with multiple register classes. Specifically, remove v32i1
and v64i1 from integer register classes. These types will correspond to
results of vector comparisons, and as such should belong to the vector
predicate class. Having them in scalar registers as well makes legalization
ambiguous.

llvm-svn: 323229
2018-01-23 17:53:59 +00:00
Simon Pilgrim 6ff241fc99 [X86][SSE] LowerBUILD_VECTORAsVariablePermute - extract subvector from oversized index vectors
llvm-svn: 323223
2018-01-23 17:02:15 +00:00
Dan Gohman 5464941a6a [WebAssembly] Add mem.* intrinsics.
The grow_memory and current_memory instructions are expected to be
officially renamed to mem.grow and mem.size. Introduce new intrinsics
with the new names. These new names aren't yet official, so for now,
use them at your own risk.

Also, take this opportunity to add arguments for the currently unused
immediate field in those instructions.

llvm-svn: 323222
2018-01-23 17:02:02 +00:00
Craig Topper c58c2b5c9b [X86] Rewrite vXi1 element insertion by using a vXi1 scalar_to_vector and inserting into a vXi1 vector.
The existing code was already doing something very similar to subvector insertion so this allows us to remove the nearly duplicate code.

This patch is a little larger than it should be due to differences between the DQI handling between the two today.

llvm-svn: 323212
2018-01-23 15:56:36 +00:00
Simon Pilgrim 0c9f77a9f9 [X86][SSE] LowerBUILD_VECTORAsVariablePermute - ensure that the source vector is not larger than the destination
We might be able to support this in the future with VPERMV3, OR(PSHUFB, PSHUFB) etc.

llvm-svn: 323210
2018-01-23 15:51:03 +00:00
Simon Pilgrim 9b4a097f94 Use EVT::changeVectorElementTypeToInteger() to convert index type to integer
llvm-svn: 323207
2018-01-23 15:30:07 +00:00
Simon Pilgrim e2905c8a0c [X86][SSE] LowerBUILD_VECTORAsVariablePermute - ensure that the index vector has the correct number of elements
llvm-svn: 323206
2018-01-23 15:13:37 +00:00
Tim Northover f9b560aa8e AArch64: get type from correct result when forming BFX
Some nodes produce multiple values so when obtaining the type of an ISD::OR we
need to make sure we ask for the correct one. Hopefully that's all of them.

llvm-svn: 323205
2018-01-23 15:11:27 +00:00
Tim Northover 9f3003d08f AArch64: get type from correct result when forming BFI/BFM
Some nodes produce multiple values so when obtaining the type of an ISD::OR we
need to make sure we ask for the correct one.

llvm-svn: 323202
2018-01-23 14:37:03 +00:00
Craig Topper 76adcc86cd [X86] Legalize v32i1 without BWI via splitting to v16i1 rather than the default of promoting to v32i8.
Summary:
For the most part its better to keep v32i1 as a mask type of a narrower width than trying to promote it to a ymm register.

I had to add some overrides to the methods that get the types for the calling convention so that we still use v32i8 for argument/return purposes.

There are still some regressions in here. I definitely saw some around shuffles. I think we probably should move vXi1 shuffle from lowering to a DAG combine where I think the extend and truncate we have to emit would be better combined.

I think we also need a DAG combine to remove trunc from (extract_vector_elt (trunc))

Overall this removes something like 13000 CHECK lines from lit tests.

Reviewers: zvi, RKSimon, delena, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42031

llvm-svn: 323201
2018-01-23 14:25:39 +00:00
Craig Topper c2df6409c7 [X86] Add missing MOVSX/MOVZX instructions to load folding tables.
I'm not sure there's any way to generate these folding cases especially the movzx ones since even the register form is never emitted by codegen.

I'm just adding them to remove the difference with the autogenerated version of the folding table.

llvm-svn: 323200
2018-01-23 14:09:22 +00:00
Simon Pilgrim 8ea1a0c690 [X86][SSE] LowerBUILD_VECTORAsVariablePermute - fix PSHUFB source/index operand ordering
As detailed in rL317463, PSHUFB (like most variable shuffle instructions) uses Op[0] for the source vector and Op[1] for the shuffle index vector, VPERMV works in reverse which is probably where the confusion comes from.

Differential Revision: https://reviews.llvm.org/D42380

llvm-svn: 323190
2018-01-23 11:39:06 +00:00
Stefan Maksimovic 98749e0249 [mips] Properly select abs and sqrt instructions
- Alter abs for micromips to have both AFGR64 and FGR64
  variants, same as sqrt
- Remove sqrt and abs from MicroMips32r6InstrInfo.td,
  use micromips FGR64 variants
- Restrict non-micromips abs/sqrt with NotInMicroMips
  predicate

Differential revision: https://reviews.llvm.org/D41439

llvm-svn: 323184
2018-01-23 10:09:39 +00:00
Craig Topper c92edd994e [X86] Don't reorder (srl (and X, C1), C2) if (and X, C1) can be matched as a movzx
Summary:
If we can match as a zero extend there's no need to flip the order to get an encoding benefit. As movzx is 3 bytes with independent source/dest registers. The shortest 'and' we could make is also 3 bytes unless we get lucky in the register allocator and its on AL/AX/EAX which have a 2 byte encoding.

This patch was more impressive before r322957 went in. It removed some of the same Ands that got deleted by that patch.

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42313

llvm-svn: 323175
2018-01-23 05:45:52 +00:00
Craig Topper e5aea25980 [X86] Remove 'NOREX' comment from the printing of _NOREX instructions.
Some of the NOREX instructions are used in 32-bit mode making this printing confusing. It also doesn't provide a lot of value since you can see the h-register being used by the instruction.

llvm-svn: 323174
2018-01-23 05:37:00 +00:00
Craig Topper 26a701f24f [X86] Various vXi1 insertion improvements.
Add missing patterns for inserting v1i1 into a zero vector. Use insert_subvector to zero upper bits before inserting an element into a vXi1 vector. Replace kshift based isel pattern with insert_subvector based pattern now that code that caused the pattern has been fixed to emit insert_subvector.

llvm-svn: 323173
2018-01-23 05:36:53 +00:00
Chandler Carruth c58f2166ab Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre..
Summary:
First, we need to explain the core of the vulnerability. Note that this
is a very incomplete description, please see the Project Zero blog post
for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

The basis for branch target injection is to direct speculative execution
of the processor to some "gadget" of executable code by poisoning the
prediction of indirect branches with the address of that gadget. The
gadget in turn contains an operation that provides a side channel for
reading data. Most commonly, this will look like a load of secret data
followed by a branch on the loaded value and then a load of some
predictable cache line. The attacker then uses timing of the processors
cache to determine which direction the branch took *in the speculative
execution*, and in turn what one bit of the loaded value was. Due to the
nature of these timing side channels and the branch predictor on Intel
processors, this allows an attacker to leak data only accessible to
a privileged domain (like the kernel) back into an unprivileged domain.

The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In many
cases, the compiler can simply use directed conditional branches and
a small search tree. LLVM already has support for lowering switches in
this way and the first step of this patch is to disable jump-table
lowering of switches and introduce a pass to rewrite explicit indirectbr
sequences into a switch over integers.

However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as
a trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures the
processor predicts the return to go to a controlled, known location. The
retpoline then "smashes" the return address pushed onto the stack by the
call with the desired target of the original indirect call. The result
is a predicted return to the next instruction after a call (which can be
used to trap speculative execution within an infinite loop) and an
actual indirect branch to an arbitrary address.

On 64-bit x86 ABIs, this is especially easily done in the compiler by
using a guaranteed scratch register to pass the target into this device.
For 32-bit ABIs there isn't a guaranteed scratch register and so several
different retpoline variants are introduced to use a scratch register if
one is available in the calling convention and to otherwise use direct
stack push/pop sequences to pass the target address.

This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886

We also support a target feature that disables emission of the retpoline
thunk by the compiler to allow for custom thunks if users want them.
These are particularly useful in environments like kernels that
routinely do hot-patching on boot and want to hot-patch their thunk to
different code sequences. They can write this custom thunk and use
`-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
case, on x86-64 thu thunk names must be:
```
  __llvm_external_retpoline_r11
```
or on 32-bit:
```
  __llvm_external_retpoline_eax
  __llvm_external_retpoline_ecx
  __llvm_external_retpoline_edx
  __llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in
the case of the `push` suffix on the top of the stack via a `pushl`
instruction.

There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.

The only other indirect branches remaining that we are aware of are from
precompiled runtimes (such as crt0.o and similar). The ones we have
found are not really attackable, and so we have not focused on them
here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.

For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z retpolineplt`
(or use similar functionality from some other linker). We strongly
recommend also using `-z now` as non-lazy binding allows the
retpoline-mitigated PLT to be substantially smaller.

When manually apply similar transformations to `-mretpoline` to the
Linux kernel we observed very small performance hits to applications
running typical workloads, and relatively minor hits (approximately 2%)
even for extremely syscall-heavy applications. This is largely due to
the small number of indirect branches that occur in performance
sensitive paths of the kernel.

When using these patches on statically linked applications, especially
C++ applications, you should expect to see a much more dramatic
performance hit. For microbenchmarks that are switch, indirect-, or
virtual-call heavy we have seen overheads ranging from 10% to 50%.

However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
the impact of hot indirect calls (by speculatively promoting them to
direct calls) and allow optimized search trees to be used to lower
switches. If you need to deploy these techniques in C++ applications, we
*strongly* recommend that you ensure all hot call targets are statically
linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well
tuned servers using all of these techniques saw 5% - 10% overhead from
the use of retpoline.

We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality available
as soon as possible. Happy for more code review, but we'd really like to
get these patches landed and backported ASAP for obvious reasons. We're
planning to backport this to both 6.0 and 5.0 release streams and get
a 5.0 release with just this cherry picked ASAP for distros and vendors.

This patch is the work of a number of people over the past month: Eric, Reid,
Rui, and myself. I'm mailing it out as a single commit due to the time
sensitive nature of landing this and the need to backport it. Huge thanks to
everyone who helped out here, and everyone at Intel who helped out in
discussions about how to craft this. Also, credit goes to Paul Turner (at
Google, but not an LLVM contributor) for much of the underlying retpoline
design.

Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer

Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D41723

llvm-svn: 323155
2018-01-22 22:05:25 +00:00
Mark Searles 7687d42052 [AMDGPU] SI Load Store Optimizer: When merging with offset, use V_ADD_{I|U}32_e64
- Change inserted add ( V_ADD_{I|U}32_e32 ) to _e64 version ( V_ADD_{I|U}32_e64 ) so that the add uses a vreg for the carry; this prevents inserted v_add from killing VCC; the _e64 version doesn't accept a literal in its encoding, so we need to introduce a mov instr as well to get the imm into a register.
- Change pass name to "SI Load Store Optimizer"; this removes the '/', which complicates scripts.

Differential Revision: https://reviews.llvm.org/D42124

llvm-svn: 323153
2018-01-22 21:46:43 +00:00
Evandro Menezes 312443fd83 [AArch64] Create a separate feature set for Exynos M3
Distinguish the features from Exynos M2.

llvm-svn: 323139
2018-01-22 19:03:26 +00:00
Joel Galenson 1d89cd2bb4 [ARM] Cleanup part of ARMBaseInstrInfo::optimizeCompareInstr (NFCI).
As noted in another review, this loop is confusing.  This commit cleans it up
somewhat.

Differential Revision: https://reviews.llvm.org/D42312

llvm-svn: 323136
2018-01-22 17:53:47 +00:00
Petar Jovanovic 29aced1bae [mips] add warnings for using dsp and msa flags with inappropriate revisions
Dsp and dspr2 require MIPS revision 2, while msa requires revision 5. Adding
warnings for cases when these flags are used with earlier revision.

Patch by Milos Stojanovic.

Differential Revision: https://reviews.llvm.org/D40490

llvm-svn: 323131
2018-01-22 16:43:30 +00:00
Ulrich Weigand 145d63f1ad [SystemZ] Fix bootstrap failure due to invalid DAG loop
The change in r322988 caused a failure in the bootstrap build bot.
The problem was that directly gluing a BR_CCMASK node to a
compare-and-swap could lead to issues if other nodes were
chained in between.  There is then no way to create a topological
sort that respects both the chain sequence and the glue property.

Fixed for now by rejecting the optimization in this case.  As a
future enhancement, we may be able to handle additional cases
by swapping chain links around.

llvm-svn: 323129
2018-01-22 15:41:49 +00:00
Marina Yatsina 811523cc08 Fix bug in commit 323096 exposed by test in test-suite-verify-machineinstrs-x86_64h-O3
Change-Id: I0a4b10d0d6c8de606d989c567ec07944ae283a87
llvm-svn: 323126
2018-01-22 15:31:05 +00:00
Sander de Smalen 7ab96f534c [AArch64][SVE] Asm: PTRUE and PTRUES instructions
Summary: These instructions initialize a predicate vector from a pattern/immediate.

Reviewers: fhahn, rengolin, evandro, mcrosier, t.p.northover, samparker, olista01

Reviewed By: samparker

Subscribers: aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41819

llvm-svn: 323124
2018-01-22 15:29:19 +00:00
Carey Williams da15b5b116 [AArch64] optimise v4f16 fcmps to utilise vector instructions
Improves the code generation for v4f16 FCMP instructions when FullFP16 is not supported.
Generating FCTVL(s) rather than a longer series of FCVTs.

Differential Revision: https://reviews.llvm.org/D41772

llvm-svn: 323118
2018-01-22 14:16:11 +00:00
Simon Pilgrim 17682a86da [X86][SSE] Add ISD::VECTOR_SHUFFLE to faux shuffle decoding (Reapplied)
Primarily, this allows us to use the aggressive extraction mechanisms in combineExtractWithShuffle earlier and make use of UNDEF elements that may be lost during lowering.

Reapplied after rL322279 was reverted at rL322335 due to PR35918, underlying issue was fixed at rL322644.

llvm-svn: 323104
2018-01-22 12:05:17 +00:00
Sander de Smalen 245e0e67f3 [AArch64][SVE] Asm: Predicate patterns
Summary:
This patch adds support for parsing/printing of named or unnamed
patterns that are used in SVE's PTRUE instruction, amongst others.

The pattern can be specified as a named pattern to initialize the predicate
vector or it can be specified as an immediate in the range 0-31.

Reviewers: fhahn, rengolin, evandro, mcrosier, t.p.northover

Reviewed By: fhahn

Subscribers: aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41818

llvm-svn: 323098
2018-01-22 10:46:00 +00:00
Marina Yatsina 77a21dbad4 Break false dependencies for POPCNT, LZCNT, TZCNT
Add POPCNT, LZCNT, TZCNT to the list of instructions that have false dependency.
Add a test to make sure BreakFalseDeps breaks the dependencies for these instructions.
Update affected tests.

This fixes bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869

This is the final of multiple patches that fix this bugzilla.
Most of the patches are intended at refactoring the existent code.

Reviews of the refactoring done to enable this change:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333

Differential Revision: https://reviews.llvm.org/D40334

Change-Id: If95cbf1a3f5c7dccff8f1b22ecb397542147303d
llvm-svn: 323096
2018-01-22 10:07:01 +00:00
Marina Yatsina 0bf841ac2a Separate LoopTraversal, ReachingDefAnalysis and BreakFalseDeps into their own files.
This is the one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40333

Change-Id: Ie5f8eb34d98cfdfae23a3072eb69b5794f0e2d56
llvm-svn: 323095
2018-01-22 10:06:50 +00:00
Marina Yatsina 3d8efa4f0c Rename ExecutionDepsFix files to ExecutionDomainFix
This is the one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40332

Change-Id: I6a048cca7fdafbfc42fb1bac94343e483befded8
llvm-svn: 323094
2018-01-22 10:06:33 +00:00
Marina Yatsina 6fc2aaae8d Separate ExecutionDepsFix into 4 parts:
1. ReachingDefsAnalysis - Allows to identify for each instruction what is the “closest” reaching def of a certain register. Used by BreakFalseDeps (for clearance calculation) and ExecutionDomainFix (for arbitrating conflicting domains).
2. ExecutionDomainFix - Changes the variant of the instructions in order to minimize domain crossings.
3. BreakFalseDeps - Breaks false dependencies.
4. LoopTraversal - Creatws a traversal order of the basic blocks that is optimal for loops (introduced in revision L293571). Both ExecutionDomainFix and ReachingDefsAnalysis use this to determine the order they will traverse the basic blocks.

This also included the following changes to ExcecutionDepsFix original logic:
1. BreakFalseDeps and ReachingDefsAnalysis logic no longer restricted by a register class.
2. ReachingDefsAnalysis tracks liveness of reg units instead of reg indices into a given reg class.

Additional changes in affected files:
1. X86 and ARM targets now inherit from ExecutionDomainFix instead of ExecutionDepsFix. BreakFalseDeps also was added to the passes they activate.
2. Comments and references to ExecutionDepsFix replaced with ExecutionDomainFix and BreakFalseDeps, as appropriate.

Additional refactoring changes will follow.

This commit is (almost) NFC.
The only functional change is that now BreakFalseDeps will break dependency for all register classes.
Since no additional instructions were added to the list of instructions that have false dependencies, there is no actual change yet.
In a future commit several instructions (and tests) will be added.

This is the first of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40330

Change-Id: Icaeb75e014eff96a8f721377783f9a3e6c679275
llvm-svn: 323087
2018-01-22 10:05:23 +00:00
Hiroshi Inoue 290adb3184 [NFC] fix trivial typos in comments
"the the" -> "the"

llvm-svn: 323074
2018-01-22 05:54:46 +00:00
Craig Topper 7fddf2bfef [X86] Add an override of targetShrinkDemandedConstant to limit the damage that shrinkdemandedbits can do to zext_in_reg operations
Summary:
This patch adds an implementation of targetShrinkDemandedConstant that tries to keep shrinkdemandedbits from removing bits that would otherwise have been recognized as a movzx.

We still need a follow patch to stop moving ands across srl if the and could be represented as a movzx before the shift but not after. I think this should help with some of the cases that D42088 ended up removing during isel.

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42265

llvm-svn: 323048
2018-01-20 18:50:09 +00:00
Simon Pilgrim 89540d9665 [X86][SSE] Check for out of bounds PEXTR/PINSR indices during faux shuffle combining.
llvm-svn: 323045
2018-01-20 17:16:01 +00:00
Craig Topper 08bd14803c [X86] Teach X86 codegen to use vector width preference to avoid promoting to 512-bit types when VLX is enabled and the preference is for a smaller size.
This change applies to places where we would turn 128/256-bit code into 512-bit in order to get a wider element type through sext/zext. Any 512-bit types that already existed in the IR/DAG will be left that way.

The width preference has no effect on codegen behavior when the target does not have AVX512 enabled. So AVX/AVX2 codegen cannot be limited via this mechanism yet.

If the preference is lower than 256 we may still use a 256 bit type to do the operation. Constraining to 128 bits makes it much more difficult to support some operations. For many of these cases we need to change element width while keeping element count constant which is easiest done by switching between 256 and 128 bit.

The preference is only obeyed when AVX512 and VLX are available. This means the preference is not obeyed for KNL, but is obeyed for SKX, Cannonlake, and Icelake. For KNL, the only way to do masked operation is on 512-bit registers so we would have to completely disable masking to obey the preference. We would also lose support for gather, scatter, ctlz, vXi64 multiplies, etc. This may change in the future, but this simplifies the initial implementation.

Differential Revision: https://reviews.llvm.org/D41895

llvm-svn: 323016
2018-01-20 00:26:12 +00:00
Craig Topper 0d797a34d8 [X86] Add support for passing 'prefer-vector-width' function attribute into X86Subtarget and exposing via X86's getRegisterWidth TTI interface.
This will cause the vectorizers to do some limiting of the vector widths they create. This is not a strict limit. There are reasons I know of that the loop vectorizer will generate larger vectors for.

I've written this in such a way that the interface will only return a properly supported width(0/128/256/512) even if the attribute says something funny like 384 or 10.

This has been split from D41895 with the remainder in a follow up commit.

llvm-svn: 323015
2018-01-20 00:26:08 +00:00
Derek Schuff a83a665cd4 [WebAssembly] Fix MSVC build
nullptr_t can't be used left of boolean &&

llvm-svn: 323012
2018-01-20 00:01:18 +00:00
Ulrich Weigand 426f6bef44 [SystemZ] Prefer LOCHI over generating IPM sequences
On current machines we have load-on-condition instructions that can be
used to directly implement the SETCC semantics.  If we have those, it is
always preferable to use them instead of generating the IPM sequence.

llvm-svn: 322989
2018-01-19 20:56:04 +00:00
Ulrich Weigand 31112895d9 [SystemZ] Directly use CC result of compare-and-swap
In order to implement a test whether a compare-and-swap succeeded, the
SystemZ back-end currently emits a rather inefficient sequence of first
converting the CC result into an integer, and then testing that integer
against zero.  This commit changes the back-end to simply directly test
the CC value set by the compare-and-swap instruction.

llvm-svn: 322988
2018-01-19 20:54:18 +00:00
Ulrich Weigand 849a59fd4b [SystemZ] Rework IPM sequence generation
The SystemZ back-end uses a sequence of IPM followed by arithmetic
operations to implement the SETCC primitive.  This is currently done
early during SelectionDAG.  This patch moves generating those sequences
to much later in SelectionDAG (during PreprocessISelDAG).

This doesn't change much in generated code by itself, but it allows
further enhancements that will be checked-in as follow-on commits.

llvm-svn: 322987
2018-01-19 20:52:04 +00:00
Ulrich Weigand 9eb858c92f [SystemZ] Implement computeKnownBitsForTargetNode
This provides a computeKnownBits implementation for SystemZ target
nodes.  Currently only SystemZISD::SELECT_CCMASK is supported.

llvm-svn: 322986
2018-01-19 20:49:05 +00:00
Joel Galenson dbc724f764 [ARM] Fix perf regression in compare optimization.
Fix a performance regression caused by r322737.

While trying to make it easier to replace compares with existing adds and
subtracts, I accidentally stopped it from doing so in some cases.  This should
fix that.  I'm also fixing another potential bug in that commit.

Differential Revision: https://reviews.llvm.org/D42263

llvm-svn: 322972
2018-01-19 17:46:27 +00:00
Derek Schuff bfb02aec5a [WebAssembly] Fix libcall signature lookup
RuntimeLibcallSignatures previously manually initialized all the libcall
names into an array and searched it linearly for the first match to lookup
the corresponding index.
r322802 switched that to initializing a map keyed by the libcall name.
Neither of these approaches works correctly because some libcall numbers use
the same name on different platforms (e.g. the "l" suffixed functions
use f80 or f128 or ppcf128).

This change fixes that by ensuring that each name only goes into the map
once. It also adds tests.

Differential Revision: https://reviews.llvm.org/D42271

llvm-svn: 322971
2018-01-19 17:45:54 +00:00
Dan Gohman 5d2b9354b1 [WebAssembly] Make sign-extension opcodes a distinct feature.
Sign-extension opcodes have been split into a separate proposal from
the main threads proposal, so switch them to their own target
feature. See:

https://github.com/WebAssembly/sign-extension-ops

llvm-svn: 322966
2018-01-19 17:16:24 +00:00
Daniel Neilson 1e68724d24 Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
Summary:
 This is a resurrection of work first proposed and discussed in Aug 2015:
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
and initially landed (but then backed out) in Nov 2015:
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

 The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.

 This change is the first in a series that allows source and dest to each
have their own alignments by using the alignment attribute on their arguments.

 In this change we:
1) Remove the alignment argument.
2) Add alignment attributes to the source & dest arguments. We, temporarily,
   require that the alignments for source & dest be equal.

 For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)

 Downstream users may have to update their lit tests that check for
@llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script
may help with updating the majority of your tests, but it does not catch all possible
patterns so some manual checking and updating will be required.

s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g

 The remaining changes in the series will:
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
   source and dest alignments.
Step 3) Update Clang to use the new IRBuilder API.
Step 4) Update Polly to use the new IRBuilder API.
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
        and those that use use MemIntrinsicInst::[get|set]Alignment() to use
        getDestAlignment() and getSourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
        MemIntrinsicInst::[get|set]Alignment() methods.

Reviewers: pete, hfinkel, lhames, reames, bollu

Reviewed By: reames

Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits

Differential Revision: https://reviews.llvm.org/D41675

llvm-svn: 322965
2018-01-19 17:13:12 +00:00
Carey Williams 22c49c6470 Test commit
llvm-svn: 322958
2018-01-19 16:55:23 +00:00
Sanjay Patel 74a1eef7c4 [x86] shrink 'and' immediate values by setting the high bits (PR35907)
Try to reverse the constant-shrinking that happens in SimplifyDemandedBits()
for 'and' masks when it results in a smaller sign-extended immediate.

We are also able to detect dead 'and' ops here (the mask is all ones). In
that case, we replace and return without selecting the 'and'.

Other targets might want to share some of this logic by enabling this under a
target hook, but I didn't see diffs for simple cases with PowerPC or AArch64,
so they may already have some specialized logic for this kind of thing or have
different needs.

This should solve PR35907:
https://bugs.llvm.org/show_bug.cgi?id=35907

Differential Revision: https://reviews.llvm.org/D42088

llvm-svn: 322957
2018-01-19 16:37:25 +00:00
Nirav Dave 72d32f24f5 [X86] Extend load-op-store fusion merge to ADC/SBB.
Summary: Add handling of EFLAG input to X86 Load-op-store fusion checking.

Reviewers: craig.topper, RKSimon

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D42128

llvm-svn: 322952
2018-01-19 15:37:57 +00:00
Sander de Smalen 909cf956a1 [AArch64][SVE] Asm: Add support for RDVL/ADDVL/ADDPL instructions
Reviewers: fhahn, rengolin, t.p.northover, echristo, olista01, SjoerdMeijer

Reviewed By: SjoerdMeijer

Subscribers: SjoerdMeijer, aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41900

llvm-svn: 322951
2018-01-19 15:22:00 +00:00
Dmitry Preobrazhensky 0e074e349d [AMDGPU][MC] Corrected parsing of image modifiers and encoding of image atomics
See bugs
    35962: https://bugs.llvm.org/show_bug.cgi?id=35962
    35963: https://bugs.llvm.org/show_bug.cgi?id=35963

Differential Revision: https://reviews.llvm.org/D42184

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 322942
2018-01-19 13:49:53 +00:00
Matthias Braun 4a7c8e7aa2 Split MachineLICM into EarlyMachineLICM and MachineLICM; NFC
This avoids playing games with pseudo pass IDs and avoids using an
unreliable MRI::isSSA() check to determine whether register allocation
has happened.

Note that this renames:
- MachineLICMID -> EarlyMachineLICM
- PostRAMachineLICMID -> MachineLICMID
to be consistent with the EarlyTailDuplicate/TailDuplicate naming.

llvm-svn: 322927
2018-01-19 06:46:10 +00:00
Craig Topper f4cd9083ac [X86] Make better use of instregex for cmovcc/setcc/jcc instructions in the Intel scheduler models.
Combine all the separate condition codes into a singular expression when possible.

llvm-svn: 322924
2018-01-19 05:47:32 +00:00
Matthias Braun 5c290dc206 AArch64: Fix emergency spillslot being out of reach for large callframes
Re-commit of r322200: The testcase shouldn't hit machineverifiers
anymore with r322917 in place.

Large callframes (calls with several hundreds or thousands or
parameters) could lead to situations in which the emergency spillslot is
out of range to be addressed relative to the stack pointer.
This commit forces the use of a frame pointer in the presence of large
callframes.

This commit does several things:
- Compute max callframe size at the end of instruction selection.
- Add mirFileLoaded target callback. Use it to compute the max callframe size
  after loading a .mir file when the size wasn't specified in the file.
- Let TargetFrameLowering::hasFP() return true if there exists a
  callframe > 255 bytes.
- Always place the emergency spillslot close to FP if we have a frame
  pointer.
- Note that `useFPForScavengingIndex()` would previously return false
  when a base pointer was available leading to the emergency spillslot
  getting allocated late (that's the whole effect of this callback).
  Which made no sense to me so I took this case out: Even though the
  emergency spillslot is technically not referenced by FP in this case
  we still want it allocated early.

Differential Revision: https://reviews.llvm.org/D40876

llvm-svn: 322919
2018-01-19 03:16:36 +00:00
Matthias Braun dc4b3e87f4 AArch64: Omit callframe setup/destroy when not necessary
Do not create CALLSEQ_START/CALLSEQ_END when there is no callframe to
setup and the callframe size is 0.

- Fixes an invalid callframe nesting for byval arguments, which would
  look like this before this patch (as in `big-byval.ll`):
    ...
    ADJCALLSTACKDOWN 32768, 0, ...   # Setup for extfunc
    ...
    ADJCALLSTACKDOWN 0, 0, ...  # setup for memcpy
    ...
    BL &memcpy ...
    ADJCALLSTACKUP 0, 0, ...    # destroy for memcpy
    ...
    BL &extfunc
    ADJCALLSTACKUP 32768, 0, ...   # destroy for extfunc

- Saves us two instructions in the common case of zero-sized stackframes.
- Remove an unnecessary scheduling barrier (hence the small unittest
  changes).

Differential Revision: https://reviews.llvm.org/D42006

llvm-svn: 322917
2018-01-19 02:45:38 +00:00
Sam Clegg b6c5bc27c4 [WebAssembly] Add test expectations for gcc C++ tests (gcc/testsuite/g++.dg)
Differential Revision: https://reviews.llvm.org/D42226

llvm-svn: 322915
2018-01-19 01:40:52 +00:00
Craig Topper 84b26b90d1 [X86] Add intrinsic support for the RDPID instruction
This adds a new instrinsic to support the rdpid instruction. The implementation is a bit weird because the intrinsic is defined as always returning 32-bits, but the assembler support thinks the instruction produces a 64-bit register in 64-bit mode. But really it zeros the upper 32 bits. So I had to add separate patterns where 64-bit mode uses an extract_subreg.

Differential Revision: https://reviews.llvm.org/D42205

llvm-svn: 322910
2018-01-18 23:52:31 +00:00
Changpeng Fang ba6240cc71 AMDGPU/SI: Fix typos in d16 support patch the buffer intrinsics.
llvm-svn: 322906
2018-01-18 22:57:57 +00:00
Changpeng Fang 4737e892de AMDGPU/SI: Add d16 support for image intrinsics.
Summary:
  This patch implements d16 support for image load, image store and image sample intrinsics.

Reviewers:
  Matt, Brian.

Differential Revision:
  https://reviews.llvm.org/D3991

llvm-svn: 322903
2018-01-18 22:08:53 +00:00
Amara Emerson d5785775f8 [AArch64][GlobalISel] Add isel support for global values in the large code model.
Fixes PR35958.

Differential Revision: https://reviews.llvm.org/D42175

llvm-svn: 322878
2018-01-18 19:21:27 +00:00
Ana Pazos 1b57c7a0f4 [RISCV] Fixed setting predicates for compressed instructions.
Summary:
Fixed setting predicates for compressed instructions.
Some instructions were being generated with C extension
enabled only, without proper checks for the other
required extensions like F, D and 32 and 64-bit target checks.
Affected instructions:
C_FLD, C_FLW, C_LD, C_FSD, C_FSW, C_SD,
C_JAL, C_ADDIW, C_SUBW, C_ADDW,
C_FLDSP, C_FLWSP, C_LDSP, C_FSDSP, C_FSWSP, C_SDSP

Reviewers: asb, shiva0217

Reviewed By: asb

Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, sabuasal, niosHD, llvm-commits

Differential Revision: https://reviews.llvm.org/D42132

llvm-svn: 322876
2018-01-18 18:54:05 +00:00
Alex Bradbury 921383828e [RISCV] Codegen support for the standard RV32M instruction set extension
llvm-svn: 322843
2018-01-18 12:36:38 +00:00
Alex Bradbury 7d6aa1f7ae [RISCV] Implement frame pointer elimination
llvm-svn: 322839
2018-01-18 11:34:02 +00:00
Craig Topper 83b0a98902 [X86] Use vmovdqu64/vmovdqa64 for unmasked integer vector stores for consistency with loads.
Previously we used 64 for vXi64 stores and 32 for everything else. This change uses 64 for everything just like do for loads.

llvm-svn: 322820
2018-01-18 07:44:09 +00:00
Craig Topper 21c8a8fa49 [X86] Remove isel patterns for using unmasked vmovdqa32/vmovdqu32 for integer vector loads.
These patterns were just looking for a vXi64 bitcasted to vXi32, but there is no advantage to using vmovdqa32 over vmovdqa64.

llvm-svn: 322819
2018-01-18 07:44:06 +00:00
Derek Schuff 53b3855b2b [WebAssembly] Remove duplicated RTLIB names
Remove the tight coupling between llvm/CodeGenRuntimeLibcalls.def and
the table of supported singatures for wasm. This will allow adding new libcalls
without changing wasm's signature table.

Also, some cleanup:
Use ManagedStatics instead of const tables to avoid memory/binary bloat.
Use a StringMap instead of a linear search for name lookup.

Differential Revision: https://reviews.llvm.org/D35592

llvm-svn: 322802
2018-01-18 01:15:45 +00:00
Reid Kleckner 1aa9061c5f [CodeGen] Hoist common AsmPrinter code out of X86, ARM, and AArch64
Every known PE COFF target emits /EXPORT: linker flags into a .drective
section. The AsmPrinter should handle this.

While we're at it, use global_values() and emit each export flag with
its own .ascii directive. This should make the .s file output more
readable.

llvm-svn: 322788
2018-01-17 23:55:23 +00:00
Volkan Keles a79b0620a0 Add a TargetOption to enable/disable GlobalISel
Summary:
This patch adds a new target option in order to control GlobalISel.
This will allow the users to enable/disable GlobalISel prior to the
backend by calling `TargetMachine::setGlobalISel(bool Enable)`.

No test case as there is already a test to check GlobalISel
command line options.
See: CodeGen/AArch64/GlobalISel/gisel-commandline-option.ll.

Reviewers: qcolombet, aemerson, ab, dsanders

Reviewed By: qcolombet

Subscribers: rovka, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42137

llvm-svn: 322773
2018-01-17 22:34:21 +00:00
Benjamin Kramer 8b1986b5cb Add support for emitting libcalls for x86_fp80 -> fp128 and vice-versa
compiler_rt doesn't provide them (yet), but libgcc does. PR34076.

llvm-svn: 322772
2018-01-17 22:29:16 +00:00
Zaara Syeda c9dc7b451b Revert [PowerPC] This reverts commit rL322721
Failing build bots. Revert the commit now.

llvm-svn: 322748
2018-01-17 20:00:15 +00:00
Aditya Nandakumar 18b3f9d384 [GISel] Make constrainSelectedInstRegOperands() available to the legalizer. NFC
https://reviews.llvm.org/D42149

llvm-svn: 322743
2018-01-17 19:31:33 +00:00
Rafael Espindola d700869235 Use a got to access a hidden weak undefined on MachO.
Trying to link

__attribute__((weak, visibility("hidden"))) extern int foo;
int *main(void) {
  return &foo;
}

on OS X fails with

ld: 32-bit RIP relative reference out of range (-4294971318 max is +/-2GB): from _main (0x100000FAB) to _foo@0x00001000 (0x00000000) in '_main' from test.o for architecture x86_64

The problem being that 0 cannot be computed as a fixed difference from
%rip. Exactly the same issue exists on ELF and we can use the same
solution.

llvm-svn: 322739
2018-01-17 19:19:55 +00:00
Joel Galenson bbcaf4ac5c [ARM] Optimize {s,u}mul.with.overflow.
This extends my previous patches to also optimize overflow-checked multiplies during SelectionDAG.

Differential revision: https://reviews.llvm.org/D40922

llvm-svn: 322738
2018-01-17 19:19:05 +00:00
Joel Galenson fe7fa40869 [ARM] Optimize {s,u}{add,sub}.with.overflow.
The ARM backend contains code that tries to optimize compares by replacing them with an existing instruction that sets the flags the same way. This allows it to replace a "cmp" with a "adds", generalizing the code that replaces "cmp" with "sub". It also heuristically disables sinking of instructions that could potentially be used to replace compares (currently only if they're next to each other).

Differential revision: https://reviews.llvm.org/D38378

llvm-svn: 322737
2018-01-17 19:19:05 +00:00
Simon Pilgrim 8c87a2e7bd [X86][BTVER2] Reduce instregex usage (PR35955)
Most are just replaced with instrs lists, but a few regexps have been further generalized to match more instructions with a single pattern.

llvm-svn: 322734
2018-01-17 19:12:48 +00:00
Craig Topper b70ca5060f [X86] Teach LowerBUILD_VECTOR to recognize pair-wise splats of 32-bit elements and use a 64-bit broadcast
If we are splatting pairs of 32-bit elements, we can use a 64-bit broadcast to get the job done.

We could probably could probably do this with other sizes too, for example four 16-bit elements. Or we could broadcast pairs of 16-bit elements using a 32-bit element broadcast. But I've left that as a future improvement.

I've also restricted this to AVX2 only because we can only broadcast loads under AVX.

Differential Revision: https://reviews.llvm.org/D42086

llvm-svn: 322730
2018-01-17 18:58:22 +00:00
Craig Topper 279ace187a [X86] When legalizing (v64i1 select i8, v64i1, v64i1) make sure not to introduce bitcasts to i64 in 32-bit mode
We legalize selects of masks with scalar conditions using a bitcast to an integer type. But if we are in 32-bit mode we can't convert v64i1 to i64. So instead split the v64i1 to v32i1 and concat it back together. Each half will then be legalized by bitcasting to i32 which is fine.

The test case is a little indirect. If we have the v64i1 select in IR it will get legalized by legalize vector ops which has a run of type legalization after it. That type legalization run is able to fix this i64 bitcast. So in order to avoid that we need a build_vector of a splat which legalize vector ops will ignore. Legalize DAG will then turn that into a select via LowerBUILD_VECTORvXi1. And the select will get legalized. In this case there is no type legalizer run to cleanup the bitcast.

This fixes pr35972.

llvm-svn: 322724
2018-01-17 18:46:01 +00:00
Zaara Syeda 8e951fd2f6 [PowerPC] Add handling for ColdCC calling convention and a pass to mark
candidates with coldcc attribute.

This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.

Differential Revision: https://reviews.llvm.org/D38413

llvm-svn: 322721
2018-01-17 18:22:55 +00:00
Tatyana Krasnukha 8979eea04e [ARC] Add missing condition codes.
Summary: Added VS and VC, required for disassembling.

Reviewers: petecoup

Reviewed By: petecoup

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42172

llvm-svn: 322718
2018-01-17 17:58:28 +00:00
Jonas Paulsson ef785694f2 [SystemZ] Handle BRCTH branches correctly in SystemZLongBranch.cpp.
BRCTH is capable of a long branch which needs to be recognized during branch
relaxation. This is done by checking for ExtraRelaxSize == 0.

Review: Ulrich Weigand
llvm-svn: 322688
2018-01-17 17:16:07 +00:00
Matt Arsenault 1491ca8911 AMDGPU: Error in SIAnnotateControlFlow instead of assert
This assert typically happens if an unstructured CFG is passed
to the pass. This can happen if the pass is run independently
without the structurizer.

llvm-svn: 322685
2018-01-17 16:30:01 +00:00
Diana Picus 01bcfd2112 [ARM GlobalISel] Rename local variable. NFC
llvm-svn: 322667
2018-01-17 15:25:37 +00:00
Pablo Barrio f2c29571da [AArch64] Fix incorrect LD1 of 16-bit FP vectors in big endian
Summary:
Loading a vector of 4 half-precision FP sometimes results in an LD1
of 2 single-precision FP + a reversal. This results in an incorrect
byte swap due to the conversion from little endian to big endian.

In order to generate the correct byte swap, it is easier to
generate the correct LD1 of 4 half-precision FP, thus avoiding the
subsequent reversal.

Reviewers: craig.topper, jmolloy, olista01

Reviewed By: olista01

Subscribers: efriedma, samparker, SjoerdMeijer, rogfer01, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41863

llvm-svn: 322663
2018-01-17 14:39:29 +00:00
Alex Bradbury d93f889d89 [RISCV] Allow RISCVAsmBackend::writeNopData to generate c.nop when supported
When the compressed instruction set is enabled, the 16-bit c.nop can be
generated if necessary.

Differential Revision: https://reviews.llvm.org/D41221
Patch by Shiva Chen.

llvm-svn: 322658
2018-01-17 14:17:12 +00:00
Diana Picus c62a16234b [ARM GlobalISel] Map G_FPEXT and G_FPTRUNC to FPR
llvm-svn: 322657
2018-01-17 14:14:14 +00:00
Daniil Fukalov d5fca554e2 [AMDGPU] add LDS f32 intrinsics
added llvm.amdgcn.atomic.{add|min|max}.f32 intrinsics
to allow generate ds_{add|min|max}[_rtn]_f32 instructions
needed for OpenCL float atomics in LDS

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D37985

llvm-svn: 322656
2018-01-17 14:05:05 +00:00
Dmitry Preobrazhensky 6b65f7c380 [AMDGPU][MC][GFX9] Enable inline constants for SDWA operands
See bug 35771: https://bugs.llvm.org/show_bug.cgi?id=35771

Differential Revision: https://reviews.llvm.org/D42058

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 322655
2018-01-17 14:00:48 +00:00
Diana Picus 65ed364fac [ARM GlobalISel] Legalize G_FPEXT and G_FPTRUNC
Mark G_FPEXT and G_FPTRUNC as legal or libcall, depending on hardware
support, but only for conversions between float and double.

Also add the necessary boilerplate so that the LegalizerHelper can
introduce the required libcalls. This also works only for float and
double, but isn't too difficult to extend when the need arises.

llvm-svn: 322651
2018-01-17 13:34:10 +00:00
Benjamin Kramer 8d073a2c2d [X86] Don't mutate shuffle arguments after early-out for AVX512
The match* functions have the annoying behavior of modifying its inputs.
Save and restore the inputs, just in case the early out for AVX512 is
hit. This is still not great and its only a matter of time this kind of
bug happens again, but I couldn't come up with a better pattern without
rewriting significant chunks of this code. Fixes PR35977.

llvm-svn: 322644
2018-01-17 13:01:06 +00:00
Benjamin Kramer 05dc3527de [X86] Constify DebugLoc parameters. No functionality change.
llvm-svn: 322643
2018-01-17 13:00:58 +00:00
Andrew V. Tischenko f7706994a6 Allow usage of X86-prefixes as separate instrs.
Differential Revision: https://reviews.llvm.org/D42102

llvm-svn: 322623
2018-01-17 10:12:06 +00:00
Craig Topper 77ba1e7c08 [X86] In LowerBUILD_VECTOR, rename ExtVT to EltVT so it makes sense.
llvm-svn: 322616
2018-01-17 03:58:21 +00:00
Craig Topper de1d28e053 [X86] Remove duplicate lines from scheduler models. NFC
llvm-svn: 322615
2018-01-17 03:50:21 +00:00
Simon Pilgrim a8e6b885bd [X86][BTVER2] Fix scheduling of VCMPSD/VCMPSS instructions
For some reason they don't have a trailing i like the packed equivalents.

llvm-svn: 322600
2018-01-16 22:15:41 +00:00
Simon Pilgrim 3c66e2c541 [X86][BTVER2] Use instrs instead of instregex for low match counts (PR35955)
llvm-svn: 322598
2018-01-16 22:08:43 +00:00
Simon Pilgrim e9a2832f32 [X86][BTVER2] Use instrs instead of instregex for single use matches (PR35955)
llvm-svn: 322597
2018-01-16 21:44:48 +00:00
Guozhi Wei e6fb4e1f8a [PPC] Add a new register XER aliased to CARRY
When "xer" is specified as clobbered register in inline assembler, clang can accept it, but llvm simply ignore it when lowered to machine instructions. It may cause problems later in scheduler.

This patch adds a new register XER aliased to CARRY, and adds it to register class CARRYRC. Now PPCTargetLowering::getRegForInlineAsmConstraint can return correct register number for inline asm constraint "{xer}", and scheduler behave correctly.

Differential Revision: https://reviews.llvm.org/D41967

llvm-svn: 322591
2018-01-16 19:28:50 +00:00
Volkan Keles f7f2568613 [GlobalISel][TableGen] Add support for SDNodeXForm
Summary:
This patch adds CustomRenderer which renders the matched
operands to the specified instruction.

Targets can enable the matching of SDNodeXForm by adding
a definition that inherits from GICustomOperandRenderer and
GISDNodeXFormEquiv as follows.

def gi_imm8 : GICustomOperandRenderer<"renderImm8”>,
                       GISDNodeXFormEquiv<imm8_xform>;

Custom renderer functions should be of the form:
void render(MachineInstrBuilder &MIB, const MachineInstr &I);

Reviewers: dsanders, ab, rovka

Reviewed By: dsanders

Subscribers: kristof.beyls, javed.absar, llvm-commits, mgrang, qcolombet

Differential Revision: https://reviews.llvm.org/D42012

llvm-svn: 322582
2018-01-16 18:44:05 +00:00
Simon Pilgrim 3e0aafbfcc [X86][MMX] Accept UNDEF upper bits for MOVD GR32->MMX
llvm-svn: 322574
2018-01-16 17:01:31 +00:00
Simon Pilgrim 85e6139633 [X86][MMX] Improve MMX constant generation
Extend the MMX zero code to take any constant with zero'd upper 32-bits

llvm-svn: 322553
2018-01-16 14:21:28 +00:00
Yonghong Song 035dd256d5 [BPF] Mark pseudo insn patterns as isCodeGenOnly
These pseudos are not supposed to be visible to user.

This patch reduced the auto-generated instruction matcher. For example,
the following words are removed from keyword list of LLVM BPF assembler.

-  MCK__35_, // '#'
-  MCK__COLON_, // ':'
-  MCK__63_, // '?'
-  MCK_ADJCALLSTACKDOWN, // 'ADJCALLSTACKDOWN'
-  MCK_ADJCALLSTACKUP, // 'ADJCALLSTACKUP'
-  MCK_PSEUDO, // 'PSEUDO'
-  MCK_Select, // 'Select'

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 322535
2018-01-16 07:27:20 +00:00
Yonghong Song b42c7c7863 [BPF] Teach DAG2DAG AND elimination about load intrinsics
As commented on the existing code:

  // The Reg operand should be a virtual register, which is defined
  // outside the current basic block. DAG combiner has done a pretty
  // good job in removing truncating inside a single basic block.

However, when the Reg operand comes from bpf_load_[byte | half | word]
intrinsics, the generic optimizer doesn't understand their results are
zero extended, so these single basic block elimination opportunities were
missed.

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 322534
2018-01-16 07:27:19 +00:00
Craig Topper 7a0c601f95 [X86] Revisit the fix I made years ago to make 'xchgl %eax, %eax' not encode using the 0x90 encoding in 64-bit mode.
Prior to this we had a separate instruction and register class that excluded eax to prevent matching the instruction that would encode with 0x90.

This patch changes this to just use an InstAlias to force xchgl %eax, %eax to use XCHG32rr instruction in 64-bit mode. This gets rid of the separate instruction and register class.

llvm-svn: 322532
2018-01-16 06:07:16 +00:00
Craig Topper daa385f480 [X86] Make 'xchgq %rax, %rax' an alias for the 0x90 nop encoding to match gas.
Previously we encoded it as 0x48 0x90.

llvm-svn: 322531
2018-01-16 06:07:14 +00:00
Simon Pilgrim e5dad1365c Avoid Wparentheses warning.
llvm-svn: 322526
2018-01-15 22:40:06 +00:00
Simon Pilgrim 85bd9141ca [X86][MMX] Add support for MMX zero vector creation
As mentioned on PR35869, (and came up recently on D41517) we don't create a MMX zero register via the PXOR but instead perform a spill to stack from a XMM zero register.

This patch adds support for direct MMX zero vector creation and should make it easier to add better constant vector creation in the future as well.

Differential Revision: https://reviews.llvm.org/D41908

llvm-svn: 322525
2018-01-15 22:32:40 +00:00
Simon Pilgrim 940eae3cc1 [X86][SSE] Add custom execution domain fixing for BLENDPD/BLENDPS/PBLENDD/PBLENDW (PR34873)
Add support for custom execution domain fixing and implement support for BLENDPD/BLENDPS/PBLENDD/PBLENDW.

Differential Revision: https://reviews.llvm.org/D42042

llvm-svn: 322524
2018-01-15 22:18:45 +00:00
Craig Topper 1393ccf949 [X86] Use MVT::getVectorVT instead of EVT::getVectorVT when splitting 256/512 bit build_vectors. NFC
We must be creating a legal type here which means it can be an MVT.

llvm-svn: 322512
2018-01-15 20:33:53 +00:00
Craig Topper aacc622564 [X86] Generalize some code in LowerBUILD_VECTOR. NFC
llvm-svn: 322511
2018-01-15 20:33:52 +00:00
Craig Topper 4f7fadd029 [X86] Remove unnecessary if statement from LowerBUILD_VECTOR. NFCI
We were checking for 128, 256, or 512 bit vectors, but those are the only types that can get here.

llvm-svn: 322510
2018-01-15 20:33:50 +00:00
Dan Gohman 7aa1fcdf3e [WebAssembly] Update README.txt.
Describe more of the current status, mention Rust as another easy
way to use this backend, and add more documentation links.

llvm-svn: 322508
2018-01-15 20:08:14 +00:00
Stanislav Mekhanoshin 62875fcd6c [AMDGPU] Add HW_REG_SH_MEM_BASES symbolic name for s_getreg_b32
Differential Revision: https://reviews.llvm.org/D41617

llvm-svn: 322500
2018-01-15 18:49:15 +00:00
Krzysztof Parzyszek 7fb738ab71 [Hexagon] Implement signed and unsigned multiply-high for vectors
llvm-svn: 322499
2018-01-15 18:43:55 +00:00
Krzysztof Parzyszek b8f2a1e7b7 [Hexagon] Rewrite LowerVECTOR_SHUFFLE for 32-/64-bit vectors
The old implementation was not always correct. The new one recognizes
more shuffles that match specific instructions.

llvm-svn: 322498
2018-01-15 18:33:33 +00:00
Stanislav Mekhanoshin f630047ef6 [AMDGPU] Copy impdefs from pseudo to real instructions
In some cases we do not copy implicit defs from pseudo to real
VOP instructions. It has no visible impact at the moment thus no
tests are affected or added.

Differential Revision: https://reviews.llvm.org/D41783

llvm-svn: 322496
2018-01-15 17:55:35 +00:00
Simon Pilgrim 79add5f155 [X86] Fix typos in WriteVMOVNTDQSt and WriteVMOVNTPYSt pattern names. NFCI.
llvm-svn: 322495
2018-01-15 17:55:21 +00:00
Jonas Paulsson 776a81a483 [SystemZ] Check for legality before doing LOAD AND TEST transformations.
Since a load and test instruction treat its operands as signed, it can only
replace a logical compare for EQ/NE uses.

Review: Ulrich Weigand
https://bugs.llvm.org/show_bug.cgi?id=35662

llvm-svn: 322488
2018-01-15 15:41:26 +00:00
Clement Courbet da1fad3ec6 [X86] Add missing predicates for VRNDSCALES{D,S}{m,r}
Summary: This is similar to https://reviews.llvm.org/D41983.

Reviewers: gchatelet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42069

llvm-svn: 322486
2018-01-15 14:24:07 +00:00
Andrew V. Tischenko e58c0c96b2 Update BTVER2 sched numbers for some AVX instructions (xmm version).
Differential Revision: https://reviews.llvm.org/D40067

llvm-svn: 322485
2018-01-15 14:21:11 +00:00
Clement Courbet 36c7be664f [X86]Add missing predicates for VMOVDQUYrm,VMOVDQUYmr.
Summary:
Due to missing parentheses.

This is similar to https://reviews.llvm.org/D41983.

Reviewers: gchatelet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42062

llvm-svn: 322483
2018-01-15 13:37:05 +00:00
Sander de Smalen 5aa809db79 [AArch64][AsmParser] Cleanup isSImm7s4, isSImm7s8, (etc) functions.
Reviewers: fhahn, rengolin, t.p.northover, echristo, olista01, samparker

Reviewed By: fhahn, samparker

Subscribers: samparker, aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41899

llvm-svn: 322481
2018-01-15 12:47:17 +00:00
Clement Courbet 41a13740c5 [X86] Fix missing predicates HasAVX512 Predicates in avx512_sqrt_scalar.
Summary:
For example, VSQRTSDZr and VSQRTSSZr were missing the predicate.
Also fix braces indentation and braces for consistency.

Reviewers: craig.topper, RKSimon

Suscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41983

llvm-svn: 322478
2018-01-15 12:05:33 +00:00
Simon Pilgrim 9904fe77a0 [X86][SSE] Support combining MOVLHPS undef inputs
llvm-svn: 322459
2018-01-14 18:50:34 +00:00
Craig Topper b2868233b7 [X86] Use ISD::TRUNCATE instead of X86ISD::VTRUNC when input and output types have the same number of elements.
llvm-svn: 322455
2018-01-14 08:11:36 +00:00
Craig Topper 57d58051bb [X86] Add X86ISD::VTRUNC to computeKnownBitsForTargetNode.
We have to take special care to avoid the cases where the result of the truncate would be padded with zero elements.

Ideally we'd just use ISD::TRUNCATE for these cases instead.

llvm-svn: 322454
2018-01-14 08:11:33 +00:00
Craig Topper e9fc0cd920 [X86] Improve legalization of vXi16/vXi8 selects.
Extend vXi1 conditions of vXi8/vXi16 selects even before type legalization gets a chance to split wide vectors. Previously we would only extend 128 and 256 bit vectors. But if we start with a 512 bit vector or wider that needs to be split we wouldn't extend until after the split had taken place. By extending early we improve the results of type legalization.

Don't widen condition of 128/256 bit vXi16/vXi8 selects when we have BWI but not VLX. We can still use a mask register by widening the select to 512-bits instead. This is similar to what we do for compares already.

llvm-svn: 322450
2018-01-14 02:05:51 +00:00
Zvi Rackover 652f9a1896 X86: Add pattern matching for PMADDWD
In addition to the existing match as part of a loop-reduction, add a
straightforward pattern match for DAG-contained patterns.

Reviewers: RKSimon, craig.topper

Subscribers: llvm-commits

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D41811

llvm-svn: 322446
2018-01-13 17:42:19 +00:00
Craig Topper 6f109f8c6c [X86] Add DAG combine to promote vXi1 result of a vXi8/vXi16 setcc when we have AVX512 but not BWI.
This avoids having the result type stick around until lowering where we have to extend the setcc and insert a truncate. If we get the types converted early we can do more to optimize it.

llvm-svn: 322432
2018-01-13 06:24:46 +00:00
Jessica Paquette 757e120379 [MachineOutliner] Move hasAddressTaken check to MachineOutliner.cpp
*Mostly* NFC. Still updating the test though just for completeness.

This moves the hasAddressTaken check to MachineOutliner.cpp and replaces it
with a per-basic block test rather than a per-function test. The old test was
too conservative and was preventing functions in C programs from being
outlined even though they were safe to outline.

This was mostly a problem in C sources.

llvm-svn: 322425
2018-01-13 00:42:28 +00:00
Tim Renouf 75ced9d5b8 [AMDGPU] stop image_store being moved illegally
Summary:
A recent change
321556: AMDGPU: Remove mayLoad/hasSideEffects from MIMG stores
can allow the machine instruction scheduler to move an image store past
an image load using the same descriptor.

V2: Fixed by marking image ops as mayAlias and isAliased. This may be
overly conservative, and we may need to revisit.
V3: Reverted test change done on 321556.

Reviewers: arsenm, nhaehnle, dstuttard

Subscribers: llvm-commits, t-tye, yaxunl, wdng, kzhuravl

Differential Revision: https://reviews.llvm.org/D41969

llvm-svn: 322419
2018-01-12 22:57:24 +00:00
Changpeng Fang 44dfa1de3b AMDGPU/SI: Add d16 support for buffer intrinsics.
Differential Revision:
  https://reviews.llvm.org/D38906

Reviewers:
  Matt and Brian.

llvm-svn: 322402
2018-01-12 21:12:19 +00:00
Florian Hahn 6a684b2593 Silence GCC 7 warning by using an enum class.
This silences the following GCC7 warning:

    lib/Target/Hexagon/HexagonISelDAGToDAGHVX.cpp:142:30: warning:
    enumeral and non-enumeral type in conditional expression [-Wextra]
         return F != Colors.end() ? F->second : None;
                    ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~

Reviewers: amharc, RKSimon, davide

Reviewed By: RKSimon, davide

Differential Revision: https://reviews.llvm.org/D41003

llvm-svn: 322398
2018-01-12 20:35:45 +00:00
Evandro Menezes 2e05279399 [AArch64] Fix scheduling resources for post indexed loads and stores
Fix typos in the default scheduling resources when using the post indexed
addressing modes.

Differential revision: https://reviews.llvm.org/D40511

llvm-svn: 322392
2018-01-12 19:20:11 +00:00
Sam Clegg 5e102eeee6 MC: Remove redundant `SetUsed` arguments in MCSymbol methods
We can probably take this a step further since the only
user of the isUsed flag is AsmParser it should probably
be doing this explicitly. For now this is a step in the
right direction though.

Differential Revision: https://reviews.llvm.org/D41971

llvm-svn: 322386
2018-01-12 18:05:40 +00:00
Craig Topper cb09bd1227 [X86] Remove unused isel pattern for zero extend from v16i1/v8i1 to v16i32/v8i64.
We have custom lowering on vzext that produces a vselect and a build vector. So zext never gets to isel.

llvm-svn: 322381
2018-01-12 17:34:09 +00:00
Benjamin Kramer 309124e0b1 [PowerPC] Don't miscompile rotate+mask into an ANDIo if it can't recreate the immediate
I'm not even sure if this transform is ever worth it, but this at least
stops the bleeding.

llvm-svn: 322373
2018-01-12 15:03:24 +00:00
Nemanja Ivanovic ebb23078e9 [PowerPC] Zero-extend the compare operand for ATOMIC_CMP_SWAP
Part of the fix for https://bugs.llvm.org/show_bug.cgi?id=35812.
This patch ensures that the compare operand for the atomic compare and swap
is properly zero-extended to 32 bits if applicable.
A follow-up commit will fix the extension for the SETCC node generated when
expanding an ATOMIC_CMP_SWAP_WITH_SUCCESS. That will complete the bug fix.

Differential Revision: https://reviews.llvm.org/D41856

llvm-svn: 322372
2018-01-12 14:58:41 +00:00
Stefan Pintilie 70bfe66111 Revert "[PowerPC] Manually schedule the prologue and epilogue"
This reverts commit r322124 since some tests were broken by that patch.
Will recommmit once the patch is fixed.

llvm-svn: 322369
2018-01-12 13:12:49 +00:00
Diana Picus 2dc5405693 [ARM GlobalISel] Map G_FMA to FPR
llvm-svn: 322367
2018-01-12 12:06:01 +00:00
Diana Picus e74243d473 [ARM GlobalISel] Legalize G_FMA
For hard float with VFP4, it is legal. Otherwise, we use libcalls.

This needs a bit of support in the LegalizerHelper for soft float
because we didn't handle G_FMA libcalls yet. The support is trivial, as
the only difference between G_FMA and other libcalls that we already
handle is that it has 3 input operands rather than just 2.

llvm-svn: 322366
2018-01-12 11:30:45 +00:00
Andre Vieira 5627c218e1 [ARM] Add codegen for SMMULR, SMMLAR and SMMLSR
This patch teaches the Arm back-end to generate the SMMULR, SMMLAR and SMMLSR
instructions from equivalent IR patterns.

Differential Revision: https://reviews.llvm.org/D41775

llvm-svn: 322361
2018-01-12 09:24:41 +00:00
Andre Vieira 26b9de9ebb [ARM] Fix erroneous availability of SMMLS for Armv7-M
Differential Revision: https://reviews.llvm.org/D41855

llvm-svn: 322360
2018-01-12 09:21:09 +00:00
Craig Topper 72001f4647 [X86] Don't allow lods/stos/scas/cmps/movs to be parsed without a suffix and only memory operand in at&t syntax.
Without a register with a size being mentioned the instruction is ambiguous in at&t syntax. With Intel syntax the memory operation caries a size that can be used to disambiguate.

llvm-svn: 322356
2018-01-12 06:48:26 +00:00
Craig Topper 29ccb5c87d [X86] Don't require suffix on 'clr' mnemonic in intel syntax
llvm-svn: 322355
2018-01-12 06:48:24 +00:00
Craig Topper b1623321af [X86] Add 'l' and 'q' suffixes to the tbm instruction mnemonics.
While the suffix isn't required to disambiguate the instructions, it is required in order to parse the instructions when the suffix is specified in order to match the GNU assembler.

llvm-svn: 322354
2018-01-12 06:21:36 +00:00
Craig Topper 0ccbbf3f3b [X86] Disable sldtq parsing in 64-bit mode.
llvm-svn: 322353
2018-01-12 05:38:15 +00:00
Craig Topper 3554a71cf1 [X86] Disable movsq/stosq/scasqcmpsq/lodsq parsing in 64-bit mode.
llvm-svn: 322352
2018-01-12 05:38:14 +00:00
Ana Pazos e3d248361e [RISCV] Pass MCSubtargetInfo to print methods.
Summary:

This change allows checking for ISA extensions in print methods.

Reviewers: asb, niosHD

Reviewed By: asb, niosHD

Subscribers: llvm-commits, niosHD, asb, rbar, johnrusso, simoncook, jordy.potman.lists, sabuasal

Differential Revision: https://reviews.llvm.org/D41503

llvm-svn: 322345
2018-01-12 02:27:00 +00:00
David L. Jones 8c87213c26 Revert r322279 due to Skylake miscompile.
Summary:
This revision causes Skylake (and apparently, only Skylake) codegen to fail in
certain cases. Details: https://bugs.llvm.org/show_bug.cgi?id=35918

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D41972

llvm-svn: 322335
2018-01-12 00:17:38 +00:00
Evgeniy Stepanov 99fa3e774d [hwasan] Stack instrumentation.
Summary:
Very basic stack instrumentation using tagged pointers.
Tag for N'th alloca in a function is built as XOR of:
 * base tag for the function, which is just some bits of SP (poor
   man's random)
 * small constant which is a function of N.

Allocas are aligned to 16 bytes. On every ReturnInst allocas are
re-tagged to catch use-after-return.

This implementation has a bunch of issues that will be taken care of
later:
1. lifetime intrinsics referring to tagged pointers are not
   recognized in SDAG. This effectively disables stack coloring.
2. Generated code is quite inefficient. There is one extra
   instruction at each memory access that adds the base tag to the
   untagged alloca address. It would be better to keep tagged SP in a
   callee-saved register and address allocas as an offset of that XOR
   retag, but that needs better coordination between hwasan
   instrumentation pass and prologue/epilogue insertion.
3. Lifetime instrinsics are ignored and use-after-scope is not
   implemented. This would be harder to do than in ASan, because we
   need to use a differently tagged pointer depending on which
   lifetime.start / lifetime.end the current instruction is dominated
   / post-dominated.

Reviewers: kcc, alekseyshl

Subscribers: srhines, kubamracek, javed.absar, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D41602

llvm-svn: 322324
2018-01-11 22:53:30 +00:00
Matthias Braun ea4359e922 PeepholeOptimizer: Fix for vregs without defs
The PeepholeOptimizer would fail for vregs without a definition. If this
was caused by an undef operand abort to keep the code simple (so we
don't need to add logic everywhere to replicate the undef flag).

Differential Revision: https://reviews.llvm.org/D40763

llvm-svn: 322319
2018-01-11 22:30:43 +00:00
Rafael Espindola e4b0231c63 Make internal/private GVs implicitly dso_local.
While updating clang tests for having clang set dso_local I noticed
that:

- There are *a lot* of tests to update.
- Many of the updates are redundant.

They are redundant because a GV is "obviously dso_local". This patch
starts formalizing that a bit by requiring that internal and private
GVs be dso_local too. Since they all are, we don't have to print
dso_local to the textual representation, making it a bit more compact
and easier to read.

llvm-svn: 322317
2018-01-11 22:15:05 +00:00
Evgeniy Stepanov 5223b5d9d6 [arm] Implement Target Operand Flag MIR serialization.
Reviewers: efriedma, pcc

Subscribers: aemerson, javed.absar, kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D39975

llvm-svn: 322312
2018-01-11 21:37:58 +00:00
Craig Topper 2aac3ee5bc [X86] Legalize 128/256 gathers/scatters on KNL by using widening rather than sign extending the index.
We can just widen the vectors with undef and zero extend the mask.

llvm-svn: 322308
2018-01-11 19:38:30 +00:00
Krzysztof Parzyszek 240df6faa4 [Hexagon] Fix building 64-bit vector from constant values
The constants were aggregated in a reverse order.

llvm-svn: 322303
2018-01-11 18:30:41 +00:00
Krzysztof Parzyszek 4ef6cfff6a [Hexagon] Cast elements to correct type when creating constant vector
llvm-svn: 322301
2018-01-11 18:03:23 +00:00
Krzysztof Parzyszek be6fa82ee5 [Hexagon] Impose limits on container sizes in HexagonGenInsert
With over 300k virtual registers, the size of the data exceeded 12GB.
Impose limits on how much information is collected.

llvm-svn: 322299
2018-01-11 18:02:13 +00:00
Krzysztof Parzyszek e156e9ba0f [Hexagon] Use SetVector when queuing nodes to scan in selectVectorConstants
llvm-svn: 322298
2018-01-11 17:59:34 +00:00
Zvi Rackover 61beca9368 X86: Refactor type-splitting to target-legal size vector to a helper function
Summary: This is a preparatory step for D41811: refactoring code for breaking vector operands of binary operation to legal-types.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41925

llvm-svn: 322296
2018-01-11 17:29:47 +00:00
Joel Jones 90a60501c3 [AArch64] Remove Unsupported = 1 flag for the WriteAtomic WriteRes.
In practice, this patch has no effect on scheduling.

There is no test case as there already exists a comprehensive test case for
LSE Atomics.

Patch by Stefan Teleman

Differential Revision: https://reviews.llvm.org/D40694

llvm-svn: 322291
2018-01-11 16:50:56 +00:00
Simon Pilgrim 6e6da3f449 [X86][SSE] Add ISD::VECTOR_SHUFFLE to faux shuffle decoding
Primarily, this allows us to use the aggressive extraction mechanisms in combineExtractWithShuffle earlier and make use of UNDEF elements that may be lost during lowering.

llvm-svn: 322279
2018-01-11 14:25:18 +00:00
Zvi Rackover 3ee66d9cd1 X86: Fix LowerBUILD_VECTORAsVariablePermute for case Src is smaller than Indices
Summary:
As RKSimon suggested in pr35820, in the case that Src is smaller in
bit-size than Indices, need to widen Src to avoid type mismatch.

Fixes pr35820

Reviewers: RKSimon, craig.topper

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41865

llvm-svn: 322272
2018-01-11 12:26:52 +00:00
Alex Bradbury 0715d35ed5 [RISCV] Reserve an emergency spill slot for the register scavenger when necessary
Although the register scavenger can often find a spare register, an emergency 
spill slot is needed to guarantee success. Reserve this slot in cases where 
the function is known to have a large stack (meaning the scavenger may be 
needed when forming stack addresses).

llvm-svn: 322269
2018-01-11 11:17:19 +00:00
Andrew V. Tischenko d037b1446b Implementation of X86Operand::print.
Differential Revision: https://reviews.llvm.org/D41610

llvm-svn: 322267
2018-01-11 10:31:01 +00:00
Stefan Maksimovic 5481c2176e [Mips] Handle one byte unsupported relocations
Fail gracefully instead of crashing upon encountering
this type of relocation.

Differential revision: https://reviews.llvm.org/D41857

llvm-svn: 322266
2018-01-11 10:07:47 +00:00
Craig Topper d1696e8d6c [X86] Fix unused variable in release builds.
llvm-svn: 322262
2018-01-11 07:19:29 +00:00
Craig Topper 0b59034b15 [X86] Optimize v2i32/v2f32 scatters.
If the index is v2i64 we can use the scatter instruction that has v4i32/v4f32 data register, v2i64 index, and v2i1 mask. Similar was already done for gather.

Implement custom widening for v2i32 data to remove the code that reverses type legalization during lowering.

llvm-svn: 322254
2018-01-11 06:31:28 +00:00
Matthias Braun e3a8db7ba1 Revert "AArch64: Fix emergency spillslot being out of reach for large callframes"
Revert for now as the testcase is hitting a pre-existing verifier error
that manifest as a failure when expensive checks are enabled (or
-verify-machineinstrs) is used.

This reverts commit r322200.

llvm-svn: 322231
2018-01-10 22:36:28 +00:00
Craig Topper 505f38a059 [X86] Move HasNOPL to a subtarget feature bit. Plumb MCSubtargetInfo through the MCAsmBackend constructor
After D41349, we can no get a MCSubtargetInfo into the MCAsmBackend constructor. This allows us to get NOPL from a subtarget feature rather than a CPU name blacklist.

Differential Revision: https://reviews.llvm.org/D41721

llvm-svn: 322227
2018-01-10 22:07:16 +00:00
Alex Bradbury 315cd3ace4 [RISCV] Implement support for the BranchRelaxation pass
Branch relaxation is needed to support branch displacements that overflow the
instruction's immediate field.

Differential Revision: https://reviews.llvm.org/D40830

llvm-svn: 322224
2018-01-10 21:05:07 +00:00
Alex Bradbury e027c93ac2 [RISCV] Implement branch analysis
This is a prerequisite for the branch relaxation pass, and allows a number of
optimisation passes (e.g. BranchFolding and MachineBlockPlacement) to work.

Differential Revision: https://reviews.llvm.org/D40808

llvm-svn: 322222
2018-01-10 20:47:00 +00:00
Alex Bradbury 70f137b6bf [RISCV] Add support for llvm.{frameaddress,returnaddress} intrinsics
llvm-svn: 322218
2018-01-10 20:12:00 +00:00
Alex Bradbury 9330e64485 [RISCV] Add basic support for inline asm constraints
llvm-svn: 322217
2018-01-10 20:05:09 +00:00
Alex Bradbury 9fea4881d0 [RISCV] Support stack frames and offsets up to 32-bits
Differential Revision: https://reviews.llvm.org/D40807

llvm-svn: 322216
2018-01-10 19:53:46 +00:00
Alex Bradbury c85be0de56 [RISCV] Support for varargs
Includes support for expanding va_copy. Also adds support for using 'aligned'
registers when necessary for vararg calls, and ensure the frame pointer always
points to the bottom of the vararg spill region. This is necessary to ensure
that the saved return address and stack pointer are always available at fixed
known offsets of the frame pointer.

Differential Revision: https://reviews.llvm.org/D40805

llvm-svn: 322215
2018-01-10 19:41:03 +00:00
Craig Topper af4eb17223 [SelectionDAG][X86] Explicitly store the scale in the gather/scatter ISD nodes
Currently we infer the scale at isel time by analyzing whether the base is a constant 0 or not. If it is we assume scale is 1, else we take it from the element size of the pass thru or stored value. This seems a little weird and I think it makes more sense to make it explicit in the DAG rather than doing tricky things in the backend.

Most of this patch is just making sure we copy the scale around everywhere.

Differential Revision: https://reviews.llvm.org/D40055

llvm-svn: 322210
2018-01-10 19:16:05 +00:00
Jessica Paquette c191f1097c [MachineOutliner] Outline ADRPs
ADRP instructions weren't being outlined because they're PC-relative and thus
fail the LR checks. This patch adds a special case for ADRPs to
getOutliningType to make sure that ADRPs can be outlined and updates the MIR
test.

llvm-svn: 322207
2018-01-10 18:49:57 +00:00
Matthias Braun b42ffa1283 AArch64: Fix emergency spillslot being out of reach for large callframes
Large callframes (calls with several hundreds or thousands or
parameters) could lead to situations in which the emergency spillslot is
out of range to be addressed relative to the stack pointer.
This commit forces the use of a frame pointer in the presence of large
callframes.

This commit does several things:
- Compute max callframe size at the end of instruction selection.
- Add mirFileLoaded target callback. Use it to compute the max callframe size
  after loading a .mir file when the size wasn't specified in the file.
- Let TargetFrameLowering::hasFP() return true if there exists a
  callframe > 255 bytes.
- Always place the emergency spillslot close to FP if we have a frame
  pointer.
- Note that `useFPForScavengingIndex()` would previously return false
  when a base pointer was available leading to the emergency spillslot
  getting allocated late (that's the whole effect of this callback).
  Which made no sense to me so I took this case out: Even though the
  emergency spillslot is technically not referenced by FP in this case
  we still want it allocated early.

Differential Revision: https://reviews.llvm.org/D40876

llvm-svn: 322200
2018-01-10 18:16:24 +00:00
Simon Pilgrim 8b63227279 [X86][MMX] Pull out common MMX VT test. NFCI.
llvm-svn: 322195
2018-01-10 15:32:19 +00:00
Dmitry Preobrazhensky 3afbd825a3 [AMDGPU][MC][GFX8][GFX9] Added XNACK_MASK support
See bug 35764: https://bugs.llvm.org/show_bug.cgi?id=35764

Differential Revision: https://reviews.llvm.org/D41614

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 322189
2018-01-10 14:22:19 +00:00
Sander de Smalen a7ec090eaa [AArch64][SVE] Asm: Add support for (mov|dup) of scalar
Summary: This patch adds support for 'dup' (Scalar -> SVE) and its corresponding 'mov' alias.

Reviewers: fhahn, rengolin, evandro, echristo

Reviewed By: fhahn

Subscribers: aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41822

llvm-svn: 322172
2018-01-10 11:32:47 +00:00
Diana Picus 0ed7513c83 [ARM GlobalISel] Map G_FNEG to the FPR bank
llvm-svn: 322169
2018-01-10 11:13:31 +00:00
Diana Picus f949a0abac [ARM GlobalISel] Legalize G_FNEG for s32 and s64
For hard float, it is legal.

For soft float, we need to lower to 0 - x first, and then we can use the
libcall for G_FSUB. This is undoing some of the canonicalization
performed by the IRTranslator (which introduces G_FNEG when it sees a
0 - x). Ideally, that canonicalization would be performed by a
pre-legalizer pass that would allow targets to opt out of this behaviour
rather than dance around it in the legalizer.

llvm-svn: 322168
2018-01-10 10:45:34 +00:00
Sander de Smalen 886510f350 [TableGen][AsmMatcherEmitter] Generate assembler checks for tied operands
Summary:
This extends TableGen's AsmMatcherEmitter with code that generates
a table with tied-operand constraints. The constraints are checked
when parsing the instruction. If an operand is not equal to its tied operand,
the assembler will give an error.

Patch [2/3] in a series to add operand constraint checks for SVE's predicated ADD/SUB.

Reviewers: olista01, rengolin, mcrosier, fhahn, craig.topper, evandro, echristo

Reviewed By: fhahn

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D41446

llvm-svn: 322166
2018-01-10 10:10:56 +00:00
Jonas Paulsson 1a76f3a2c2 Temporarily revert
"[SystemZ]  Check for legality before doing LOAD AND TEST transformations."

, due to test failures.

llvm-svn: 322165
2018-01-10 10:05:55 +00:00
Diana Picus 8f14886630 [ARM GlobalISel] Legalize s32/s64 G_FCONSTANT
Legal for hard float.
Change to G_CONSTANT for soft float (but preserve the binary
representation).

llvm-svn: 322164
2018-01-10 10:01:49 +00:00
Diana Picus 734a5e8912 [ARM GlobalISel] Legalize G_CONSTANT for scalars > 32 bits
Make G_CONSTANT narrow for any scalars larger than 32 bits.

llvm-svn: 322162
2018-01-10 09:32:01 +00:00
Jonas Paulsson d9dde1ac56 [SystemZ] Check for legality before doing LOAD AND TEST transformations.
Since a load and test instruction treat its operands as signed, it can only
replace a logical compare for EQ/NE uses.

Review: Ulrich Weigand
https://bugs.llvm.org/show_bug.cgi?id=35662

llvm-svn: 322161
2018-01-10 09:18:17 +00:00
Stefan Pintilie 1712700842 [PowerPC] Manually schedule the prologue and epilogue
This patch makes the following changes to the schedule of instructions in the
prologue and epilogue.

The stack pointer update is moved down in the prologue so that the callee saves
do not have to wait for the update to happen.
Saving the lr is moved down in the prologue to hide the latency of the mflr.
The stack pointer is moved up in the epilogue so that restoring of the lr can
happen sooner.
The mtlr is moved up in the epilogue so that it is away form the blr at the end
of the epilogue. The latency of the mtlr can now be hidden by the loads of the
callee saved registers.

This commit is almost identical to this one: r322036 except that two warnings
that broke build bots have been fixed.

The revision number is D41737 as before.

llvm-svn: 322124
2018-01-09 21:57:49 +00:00
Tim Renouf 6eaad1e539 [AMDGPU] Fixed incorrect uniform branch condition
Summary:
I had a case where multiple nested uniform ifs resulted in code that did
v_cmp comparisons, combining the results with s_and_b64, s_or_b64 and
s_xor_b64 and using the resulting mask in s_cbranch_vccnz, without first
ensuring that bits for inactive lanes were clear.

There was already code for inserting an "s_and_b64 vcc, exec, vcc" to
clear bits for inactive lanes in the case that the branch is instruction
selected as s_cbranch_scc1 and is then changed to s_cbranch_vccnz in
SIFixSGPRCopies. I have added the same code into SILowerControlFlow for
the case that the branch is instruction selected as s_cbranch_vccnz.

This de-optimizes the code in some cases where the s_and is not needed,
because vcc is the result of a v_cmp, or multiple v_cmp instructions
combined by s_and/s_or. We should add a pass to re-optimize those cases.

Reviewers: arsenm, kzhuravl

Subscribers: wdng, yaxunl, t-tye, llvm-commits, dstuttard, timcorringham, nhaehnle

Differential Revision: https://reviews.llvm.org/D41292

llvm-svn: 322119
2018-01-09 21:34:43 +00:00
Alexey Bataev 771ec9f399 [COST]Fix PR35865: Fix cost model evaluation for shuffle on X86.
Summary:
If the vector type is transformed to non-vector single type, the compile
may crash trying to get vector information about non-vector type.

Reviewers: RKSimon, spatel, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41862

llvm-svn: 322106
2018-01-09 19:08:22 +00:00
Derek Schuff e9c278ccf1 [WebAssembly] Update libcall signature lists
New signatures added in r322087. A fix for this tight coupling is forthcoming.

llvm-svn: 322105
2018-01-09 19:05:34 +00:00
Craig Topper c4d2dd80b6 [X86] Add a DAG combine to combine (sext (setcc)) with VLX
Normally target independent DAG combine would do this combine based on getSetCCResultType, but with VLX getSetCCResultType returns a vXi1 type preventing the DAG combining from kicking in.

But doing this combine can allow us to remove the explicit sign extend that would otherwise be emitted.

This patch adds a target specific DAG combine to combine the sext+setcc when the result type is the same size as the input to the setcc. I've restricted this to FP compares and things that can be represented with PCMPEQ and PCMPGT since we don't have full integer compare support on the older ISAs.

Differential Revision: https://reviews.llvm.org/D41850

llvm-svn: 322101
2018-01-09 18:14:22 +00:00
Francis Visoiu Mistrih 7d9bef8f5c [CodeGen] Don't print "pred:" and "opt:" in -debug output
In -debug output we print "pred:" whenever a MachineOperand is a
predicate operand in the instruction descriptor, and "opt:" whenever a
MachineOperand is an optional def in the instruction descriptor.

Differential Revision: https://reviews.llvm.org/D41870

llvm-svn: 322096
2018-01-09 17:31:07 +00:00
Sander de Smalen 906a5deace Recommit r322073: [AArch64][SVE] Asm: Add predicated ADD/SUB instructions
Fixed issue that was found on sanitizer-x86_64-linux-fast.
I changed the result type of 'Parser.getTok().getString().lower()'
in AArch64AsmParser::tryParseSVEPredicateVector() from 'StringRef' to
'auto', since StringRef::lower() returns a std::string.

llvm-svn: 322092
2018-01-09 17:01:27 +00:00
Sander de Smalen 6595603187 Reverted r322073 because of AddressSanitizer failure on
sanitizer-x86_64-linux-fast builder.

llvm-svn: 322077
2018-01-09 13:51:09 +00:00
Sander de Smalen 1f97363e5f [AArch64][SVE] Asm: Add predicated ADD/SUB instructions
Summary:
Add the predicated ADD/SUB instructions and corresponding tests.

Patch [3/3] in a series to add predicated ADD/SUB instructions for SVE.

Reviewers: rengolin, mcrosier, evandro, fhahn, echristo

Reviewed By: fhahn

Subscribers: aemerson, javed.absar, tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D41443

llvm-svn: 322073
2018-01-09 12:43:46 +00:00
Sander de Smalen 7868e74033 [AArch64][SVE] Asm: Add parsing of merging/zeroing suffix for SVE predicate vector operands
Summary:
Parsing of the '/m' (merging) or '/z' (zeroing) suffix of a predicate operand.

Patch [2/3] in a series to add predicated ADD/SUB instructions for SVE.

Reviewers: rengolin, mcrosier, evandro, fhahn, echristo, MatzeB, t.p.northover

Reviewed By: fhahn

Subscribers: t.p.northover, MatzeB, aemerson, javed.absar, tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D41442

llvm-svn: 322070
2018-01-09 11:17:06 +00:00
Nikolai Bozhenov eededdade9 [Nios2] Arithmetic instructions for R1 and R2 ISA.
Summary:
This commit enables some of the arithmetic instructions for Nios2 ISA (for both
R1 and R2 revisions), implements facilities required to emit those instructions
and provides LIT tests for added instructions.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D41236

Author: belickim <mateusz.belicki@intel.com>
llvm-svn: 322069
2018-01-09 11:15:08 +00:00
Oren Ben Simhon 1c6308ecd5 Instrument Control Flow For Indirect Branch Tracking
CET (Control-Flow Enforcement Technology) introduces a new mechanism called IBT (Indirect Branch Tracking).
According to IBT, each Indirect branch should land on dedicated ENDBR instruction (End Branch).
The new pass adds ENDBR instructions for every indirect jmp/call (including jumps using jump tables / switches).
For more information, please see the following:
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

Differential Revision: https://reviews.llvm.org/D40482

Change-Id: Icb754489faf483a95248f96982a4e8b1009eb709
llvm-svn: 322062
2018-01-09 08:51:18 +00:00
Craig Topper def1c30c66 [X86] Allow more cmpps/pd immediate encodings to be commuted during isel.
The code that checks the immediate wasn't masking to the lower 3-bits like the code in X86InstrInfo.cpp that's used by the peephole pass does.

llvm-svn: 322060
2018-01-09 07:09:34 +00:00
Sean Fertile 33a17762bb [PowerPC] Can not assume an intrinsic argument is a simple type.
The CTRLoop pass performs checks on the argument of certain libcalls/intrinsics,
and assumes the arguments must be of a simple type. This isn't always the case
though. For example if we unroll and vectorize a loop we may end up with vectors
larger then the largest legal type, along with intrinsics that operate on those
wider types. This happened in the ffmpeg build, where we unrolled a loop and
ended up with a sqrt intrinsic that operated on V16f64, triggering an assertion.

Differential Revision: https://reviews.llvm.org/D41758

llvm-svn: 322055
2018-01-09 03:03:41 +00:00
Eric Christopher c44717774a Remove unused function HvxSelector::zerous.
llvm-svn: 322053
2018-01-09 02:38:17 +00:00
Stefan Pintilie 7e10987b12 Revert "[PowerPC] Manually schedule the prologue and epilogue"
[PowerPC] This reverts commit r322036.

Failing build bots. Revert the commit now.

llvm-svn: 322051
2018-01-09 01:06:21 +00:00
Craig Topper cc342d465e [X86] Remove llvm.x86.avx512.cvt*2mask.* intrinsics and autoupgrade to (icmp slt X, 0)
I had to drop fast-isel-abort from a test because we can't fast isel some of the mask stuff. When we used intrinsics we implicitly fell back to SelectionDAG for the intrinsic call without triggering the abort error. But with native IR that doesn't happen the same way.

llvm-svn: 322050
2018-01-09 00:50:47 +00:00
Craig Topper 7c2abdd249 [X86] Remove unnecessary isel pattern that is a combination of two other patterns.
The pattern was this

 def : Pat<(i32 (zext (i8 (bitconvert (v8i1 VK8:$src))))),
           (MOVZX32rr8 (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS VK8:$src, GR32)), sub_8bit))>, Requires<[NoDQI]>;

but if you just let (i32 (zext X)) match byte itself you'll get MOVZX32rr8. And if you let (i8 (bitconvert (v8i1 VK8:$src))) match by itself you'll get (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS VK8:$src, GR32)), sub_8bit).

So we can just let isel do the two patterns naturally.

llvm-svn: 322049
2018-01-09 00:50:42 +00:00
Jessica Paquette 3291e7353e [MachineOutliner] AArch64: Handle instrs that use SP and will never need fixups
This commit does two things. Firstly, it adds a collection of flags which can
be passed along to the target to encode information about the MBB that an
instruction lives in to the outliner.

Second, it adds some of those flags to the AArch64 outliner in order to add
more stack instructions to the list of legal instructions that are handled
by the outliner. The two flags added check if

- There are calls in the MachineBasicBlock containing the instruction
- The link register is available in the entire block

If the link register is available and there are no calls, then a stack
instruction can always be outlined without fixups, regardless of what it is,
since in this case, the outliner will never modify the stack to create a
call or outlined frame.

The motivation for doing this was checking which instructions are most often
missed by the outliner. Instructions like, say

%sp<def> = ADDXri %sp, 32, 0; flags: FrameDestroy

are very common, but cannot be outlined in the case that the outliner might
modify the stack. This commit allows us to outline instructions like this.
  

llvm-svn: 322048
2018-01-09 00:26:18 +00:00