Commit Graph

Francis Visoiu Mistrih e85b06d65f [CodeGen] Use MIR syntax for MachineMemOperand printing
Get rid of the "; mem:" suffix and use the one we use in MIR: ":: (load 2)".
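
As a hypothetical before/after (the instruction and operands here are illustrative, not taken from the patch), the printed form changes roughly like this:

```
before:  %0 = LDRHHui %1, 0; mem:LD2[%p]
after:   %0 = LDRHHui %1, 0 :: (load 2 from %ir.p)
```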

rdar://38163529

Differential Revision: https://reviews.llvm.org/D42377

llvm-svn: 327580
2018-03-14 21:52:13 +00:00
Simon Pilgrim adf72e8549 [X86] Add haswell testing for PR35635 as well.
To improve testing of scheduler model completeness for instructions with multiple results.

llvm-svn: 327572
2018-03-14 21:03:09 +00:00
Francis Visoiu Mistrih 164560bd74 [AArch64] Emit CSR loads in the same order as stores
Optionally allow the order of restoring the callee-saved registers in the
epilogue to be reversed.

The flag -reverse-csr-restore-seq generates the following code:

```
stp     x26, x25, [sp, #-64]!
stp     x24, x23, [sp, #16]
stp     x22, x21, [sp, #32]
stp     x20, x19, [sp, #48]

; [..]

ldp     x24, x23, [sp, #16]
ldp     x22, x21, [sp, #32]
ldp     x20, x19, [sp, #48]
ldp     x26, x25, [sp], #64
ret
```

Note how the CSRs are restored in the same order as they are saved.

One exception to this rule is the last `ldp`, which allows us to merge
the stack adjustment and the ldp into a post-index ldp. This is done by
first generating:

```
ldp     x26, x25, [sp]
add     sp, sp, #64
```

which gets merged by the arm64 load/store optimizer into:

```
ldp     x26, x25, [sp], #64
```

The flag is disabled by default.

llvm-svn: 327569
2018-03-14 20:34:03 +00:00
Craig Topper 9c098ed819 [X86] Add back fast-isel code for handling i8 shifts.
I removed this in r316797 because the coverage report showed no coverage and I thought it should have been handled by the auto-generated table. I now see that there is code that bypasses the table if the shift amount is out of bounds.

This adds back the code. We'll codegen out-of-bounds i8 shifts to effectively (amount & 0x1f). The 0x1f is a strange quirk of x86: shift amounts are always masked to 5 bits (except for 64-bit shifts). So if the masked value is still out of bounds, the result will be 0.

Fixes PR36731.
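
As a rough C model of the behavior described above (illustrative only, not the fast-isel code itself):

```c
#include <stdint.h>

/* x86 masks shift amounts to 5 bits, so a masked amount that is still
   >= 8 shifts every bit out of an i8 value, giving 0. */
uint8_t shl_i8_effective(uint8_t x, unsigned amount) {
    unsigned masked = amount & 0x1f;
    return masked >= 8 ? (uint8_t)0 : (uint8_t)(x << masked);
}
```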

llvm-svn: 327540
2018-03-14 17:57:19 +00:00
Francis Visoiu Mistrih 084e7d8770 [AArch64] Keep track of MIFlags in the LoadStoreOptimizer
Merging:

* $x26, $x25 = frame-setup LDPXi $sp, 0
* $sp = frame-destroy ADDXri $sp, 64, 0

into an LDPXpost should preserve the flags from both instructions as
following:

* frame-setup frame-destroy LDPXpost

Differential Revision: https://reviews.llvm.org/D44446

llvm-svn: 327533
2018-03-14 17:10:58 +00:00
Craig Topper b36cb20ef9 [X86] Teach X86TargetLowering::targetShrinkDemandedConstant to set non-demanded bits if it helps create an AND mask that can be matched as a zero extend.
I had to modify the bswap recognition to allow unshrunk masks to make this work.

Fixes PR36689.
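
A hedged illustration of the idea (a sketch, not the DAG code itself): when a consumer never demands the low bits of an AND result, the mask can be widened without changing any demanded bit:

```c
#include <stdint.h>

/* If bits 0-2 of the result are never demanded, masks 0xF8 and 0xFF are
   interchangeable, and (x & 0xFF) is just a zero extend of the low byte
   (a movzbl on x86). */
uint32_t consumer(uint32_t x) {
    uint32_t masked = x & 0xFF;   /* widened from the original 0xF8 */
    return masked >> 3;           /* only bits 3..7 are demanded */
}
```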

Differential Revision: https://reviews.llvm.org/D44442

llvm-svn: 327530
2018-03-14 16:55:15 +00:00
Simon Pilgrim d1c3c995c0 [X86][AVX] Use WriteFShuffleLd for broadcast reg-mem instructions
They shouldn't be treated as pure loads.

Found while investigating D44428

llvm-svn: 327524
2018-03-14 15:47:08 +00:00
Arnold Schwaighofer bf1638daa8 SjLjEHPrepare: Don't reg-to-mem swifterror values
swifterror llvm values model the swifterror register as memory at the
LLVM IR level. ISel will perform ad-hoc mem-to-reg on them. swifterror
values are constrained in how they can be used: spilling them to memory
is not allowed.

SjLjEHPrepare tried to lower swifterror values to memory, which is
unnecessary since the back-end will spill and reload the register as
necessary (as long as clobbering calls are marked as such, which is the
case here), and which further leads to invalid IR because swifterror
values can't be stored to memory.

rdar://38164004

llvm-svn: 327521
2018-03-14 15:44:07 +00:00
Alexander Ivchenko 86ef9ab28f [GlobalIsel][X86] Support for G_SDIV instruction
Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D44430

llvm-svn: 327520
2018-03-14 15:41:11 +00:00
Simon Pilgrim d594942928 [X86][Btver2] Fix YMM shuffle, permute and permutevar scheduler costs
Account for ymm double pumping and add proper pshufb/permutevar support

llvm-svn: 327510
2018-03-14 14:05:19 +00:00
Simon Pilgrim de995e6e37 [X86][SSE] Use WriteFShuffleLd for MOVDDUP/MOVSHDUP/MOVSLDUP reg-mem instructions
They shouldn't be treated as pure loads.

Found while investigating D44428

llvm-svn: 327505
2018-03-14 13:22:56 +00:00
Martin Storsjo bde677289a [AArch64] Don't produce R_AARCH64_TLSLE_LDST32_TPREL_LO12_NC
Support for this relocation is missing in both LLD and GNU binutils
at the moment.

This reverts the ELF parts of SVN r327316.

llvm-svn: 327503
2018-03-14 13:09:10 +00:00
Alexander Ivchenko 0bd4d8c901 [GlobalISel][X86] Support G_LSHR/G_ASHR/G_SHL
Support G_LSHR/G_ASHR/G_SHL. We have 3 variants of shift
instructions: shift by gpr, shift by imm, shift by 1.
Currently GlobalISel TableGen generates patterns for
shift by imm and shift by 1, but with a shiftCount of i8.
In G_LSHR/G_ASHR/G_SHL, as in LLVM IR, both arguments
have the same type, so for now only i8 shifts can use the
auto-generated TableGen patterns.

The support for G_SHL/G_ASHR enables tryCombineSExt
from LegalizationArtifactCombiner.h to fire, which
results in different legalization for the following tests:
    LLVM :: CodeGen/X86/GlobalISel/ext-x86-64.ll
    LLVM :: CodeGen/X86/GlobalISel/gep.ll
    LLVM :: CodeGen/X86/GlobalISel/legalize-ext-x86-64.mir

-; X64-NEXT:    movsbl %dil, %eax
+; X64-NEXT:    movl $24, %ecx
+; X64-NEXT:    # kill: def $cl killed $ecx
+; X64-NEXT:    shll %cl, %edi
+; X64-NEXT:    movl $24, %ecx
+; X64-NEXT:    # kill: def $cl killed $ecx
+; X64-NEXT:    sarl %cl, %edi
+; X64-NEXT:    movl %edi, %eax

...which is not optimal and should be addressed later.

Rework of the patch by igorb

Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D44395

llvm-svn: 327499
2018-03-14 11:23:57 +00:00
Alexander Ivchenko 327de80529 [GlobalIsel][X86] Support for G_ZEXT instruction
Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D44378

llvm-svn: 327482
2018-03-14 09:11:23 +00:00
Craig Topper 9ca7e67c4c [X86] Re-generate test to get proper capitalization of its CHECK lines. NFC
llvm-svn: 327462
2018-03-13 23:31:48 +00:00
Craig Topper cc060e921b [X86] Rewrite LowerAVXCONCAT_VECTORS similar to how we handle vXi1 concats.
This is better able to detect undef and zero pieces in the concat, as well as cases when only one subvector is non-zero. This allows us to avoid silly things like double inserts into progressively larger undefs.

This still builds 512-bit concats of 128-bit pieces by building up through 256 bits first. But I don't know if that's best.

We probably want to merge this with the vXi1 concat code since they are very similar.

llvm-svn: 327454
2018-03-13 22:05:25 +00:00
Craig Topper 4aeec51986 [DAGCombiner] Allow visitEXTRACT_SUBVECTOR to combine with BUILD_VECTORS between LegalizeVectorOps and LegalizeDAG.
BUILD_VECTORs aren't themselves legalized until LegalizeDAG, so we should still be able to create an "illegal" one before that. This helps us combine with BUILD_VECTORs that are introduced during LegalizeVectorOps due to unrolling.

llvm-svn: 327446
2018-03-13 20:36:28 +00:00
Francis Visoiu Mistrih 3abf05739f [MIR] Allow frame-setup and frame-destroy on the same instruction
Nothing prevents us from having both frame-setup and frame-destroy on
the same instruction.

When merging:
* frame-setup OPCODE1
* frame-destroy OPCODE2
into
* frame-setup frame-destroy OPCODE3

we want to be able to print and parse both flags.

llvm-svn: 327442
2018-03-13 19:53:16 +00:00
Sanjay Patel bb45cc126d [x86] add test for WriteZero sched class instructions; NFC
Nops should have zero latency because there is no result.
Idioms like 'xorps xmm0, xmm0' may have zero latency because 
they are handled without using an execution unit.

llvm-svn: 327435
2018-03-13 19:20:01 +00:00
Simon Pilgrim 9855b39380 [DAGCombine] visitREM - Don't assume that one divrem isn't driving another
Under some circumstances the divrems won't have been combined together before getting to this code.

So replace the assertion with an if() guard that skips the expansion to X-((X/C)*C), giving the other combine a chance to happen.

Reduced from OSS-Fuzz #6883
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6883
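
For reference, the expansion in question is the standard rewrite of a remainder in terms of its matching division; a minimal sketch:

```c
/* X - ((X / C) * C) equals X % C for C != 0, e.g. 7 - (7/3)*3 == 1. */
int rem_expanded(int x, int c) {
    return x - (x / c) * c;
}
```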

llvm-svn: 327424
2018-03-13 17:17:15 +00:00
Simon Pilgrim 3d4c86d399 [X86][Btver2] Split i8/i16/i32/i64 div/idiv costs
We were assuming a mixture of 32/64 division costs.

llvm-svn: 327407
2018-03-13 15:22:24 +00:00
Simon Dardis 476ed8f26e [mips] Fix the definitions of the EVA instructions
Correct their availability to their respective ISAs.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D44209

llvm-svn: 327403
2018-03-13 14:39:44 +00:00
Simon Dardis 9d7e9032f1 [mips] Don't create nested CALLSEQ_START..CALLSEQ_END nodes.
For the MIPS O32 ABI, the current call lowering logic naively lowers each
call, creating the reserved argument area to hold the argument spill areas for
$a0..$a3 and the outgoing parameter area if one is required at each call site.

In the case of a sufficiently large byval argument, a call to memcpy is used
to write the start+16..end of the argument into the outgoing parameter area.
This is done within the CALLSEQ_START..CALLSEQ_END of the callee. The CALLSEQ
nodes are responsible for performing the necessary stack adjustments.

Since the O32/N32/N64 MIPS ABIs do not have a red-zone and writing below the
stack pointer and reading the values back is unpredictable, the call to memcpy
cannot be hoisted out of the callee's CALLSEQ nodes.

However, the O32 ABI requires the reserved argument area for functions
which have parameters. The naive lowering of calls will then create nested
CALLSEQ sequences. For N32 and N64 these nodes are also created, but with
zero stack adjustments as those ABIs do not have a reserved argument area.

This patch addresses the correctness issue by recognizing the special case
of lowering a byval argument that uses memcpy: if the incoming chain
already has a CALLSEQ_START node on it when calling memcpy, the CALLSEQ
nodes are not created. For the N32 and N64 ABIs this is not an issue, as
no stack adjustment has to be performed.

For the O32 ABI, the correctness reasoning is different. In the case of a
sufficiently large byval argument, registers a0..a3 are going to be used for
the callee's arguments, mandating the creation of the reserved argument area.
The call to memcpy in the naive case will also create its own reserved
argument area. However, since the reserved argument area consists of undefined
values, both calls can use the same reserved argument area.
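
A hypothetical source-level trigger for the case described above (names and sizes are illustrative):

```c
/* A byval struct too large for $a0..$a3 makes call lowering emit a
   memcpy of the tail of the argument (start+16..end) into the outgoing
   parameter area, inside the caller's CALLSEQ nodes. */
struct big { int words[32]; };   /* 128 bytes, well past 16 */
void callee(struct big b);       /* passed byval under O32 */
void caller(struct big *p) { callee(*p); }
```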

Reviewers: abeserminji, atanasyan

Differential Revision: https://reviews.llvm.org/D44296

llvm-svn: 327388
2018-03-13 12:50:03 +00:00
Simon Pilgrim 93bd7187f4 [X86][SSE41] createVariablePermute v2X64 - PCMPEQQ can test for index 0/1 and select between them.
llvm-svn: 327385
2018-03-13 12:22:58 +00:00
Jonas Paulsson 5612bb292c [CodeGenPrepare] Respect endianness in splitMergedValStore.
splitMergedValStore will split a store into two if the target prefers this,
or if -force-split-store is passed.

This patch adds the missing handling for endianness in this function along
with a test case.
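
A minimal C sketch of the endianness concern (illustrative, not the pass itself):

```c
#include <stdint.h>

/* When one 32-bit store of ((uint32_t)hi << 16) | lo is split into two
   16-bit stores, the half that lands at the lower address differs
   between little- and big-endian targets. */
void split_store(uint16_t *p, uint16_t lo, uint16_t hi, int big_endian) {
    if (big_endian) { p[0] = hi; p[1] = lo; }
    else            { p[0] = lo; p[1] = hi; }
}
```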

Review: Eli Friedman
https://reviews.llvm.org/D44396

llvm-svn: 327375
2018-03-13 08:36:20 +00:00
Yonghong Song c88bcdec43 bpf: Extends zero extension elimination beyond comparison instructions
The current zero extension elimination was restricted to operands of
comparisons. It could actually be extended to more cases.

For example:

  int *inc_p (int *p, unsigned a)
  {
    return p + a;
  }

'a' will be promoted to i64 during addition, and the zero extension could
be eliminated as well.

For the elimination optimization, it should be much better to start
recognizing the candidate sequence from the SRL instruction instead of J*
instructions.

This patch makes it a generic zero extension elimination pass instead of
one restricted to comparisons.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327367
2018-03-13 06:47:03 +00:00
Yonghong Song 905d13c123 bpf: J*_RR should check both operands
There is a mistake in the current code: we "break" out of the optimization
when the first operand of J*_RR doesn't qualify for the elimination. This
caused some elimination opportunities to be missed, for example the one in
the testcase.

The code should just fall through to handle the second operand.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327366
2018-03-13 06:47:02 +00:00
Yonghong Song 89e47ac671 bpf: Tighten subregister definition check
The current subregister definition check stops after the MOV_32_64
instruction.

This means we assume that all of the following instruction sequences
are safe to eliminate:

  MOV_32_64 rB, wA
  SLL_ri    rB, rB, 32
  SRL_ri    rB, rB, 32

However, this is *not* true. The source subregister wA of MOV_32_64 could
come from an implicit truncation of a 64-bit register, in which case the
high bits of the 64-bit register are not zeroed, and therefore we can't
eliminate the above sequence.

For example, for i32_val, we shouldn't do the elimination:

  long long bar ();

  int foo (int b, int c)
  {
    unsigned int i32_val = (unsigned int) bar();

    if (i32_val < 10)
      return b;
    else
      return c;
  }

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327365
2018-03-13 06:47:00 +00:00
Yonghong Song fddb9f4e28 bpf: Add more check directives in peephole testcase
Improve the test accuracy by adding more check directives.

Shifts are expected to be eliminated for zero extension but not for sign
extension.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327364
2018-03-13 06:46:59 +00:00
Craig Topper 80058e30cc [LegalizeTypes] In SplitVecOp_TruncateHelper, use GetSplitVector on the input instead of creating new extract_subvectors.
llvm-svn: 327355
2018-03-13 01:17:40 +00:00
Martin Storsjo 7bc64bd889 [AArch64] Fold adds with tprel_lo12_nc and secrel_lo12 into a following ldr/str
Differential Revision: https://reviews.llvm.org/D44355

llvm-svn: 327316
2018-03-12 18:47:43 +00:00
Krzysztof Parzyszek 36ca823f76 [Hexagon] Fix typo in testcase
llvm-svn: 327310
2018-03-12 18:29:47 +00:00
Krzysztof Parzyszek 2d08f2ebf8 [Hexagon] Counting leading/trailing bits is cheap
llvm-svn: 327308
2018-03-12 18:18:23 +00:00
Krzysztof Parzyszek 5d41cc19bd [Hexagon] Subtarget feature to emit one instruction per packet
This adds two features: "packets" and "nvj".

Enabling "packets" allows the compiler to generate instruction packets,
while disabling it will prevent it and disable all optimizations that
generate them. This feature is enabled by default on all subtargets.
The feature "nvj" allows the compiler to generate new-value jumps and it
implies "packets". It is enabled on all subtargets.

An exception is made for packets with endloop instructions, since they
require a certain minimum number of instructions in the packets to which
they apply. Disabling "packets" will not prevent hardware loops from
being generated.

llvm-svn: 327302
2018-03-12 17:47:46 +00:00
Yaxun Liu a99e7d8e44 [AMDGPU] Fix lowering enqueue kernel when kernel has no name
Since the enqueued kernels have internal linkage, their names may be dropped.
In this case, give them unique names __amdgpu_enqueued_kernel or
__amdgpu_enqueued_kernel.n where n is a sequential number starting from 1.

Differential Revision: https://reviews.llvm.org/D44322

llvm-svn: 327291
2018-03-12 16:34:06 +00:00
Krzysztof Parzyszek ea2324f882 [Hexagon] Add REQUIRES: asserts to testcases that use -stats
llvm-svn: 327281
2018-03-12 15:20:36 +00:00
Krzysztof Parzyszek 5b39f7cfef [Hexagon] Add REQUIRES: asserts to testcases that use -debug-only
llvm-svn: 327279
2018-03-12 15:11:16 +00:00
Dmitry Preobrazhensky da4a7c01bf [AMDGPU][MC] Corrected GATHER4 opcodes
See bug 36252: https://bugs.llvm.org/show_bug.cgi?id=36252

Differential Revision: https://reviews.llvm.org/D43874

Reviewers: artem.tamazov, arsenm
llvm-svn: 327278
2018-03-12 15:03:34 +00:00
Krzysztof Parzyszek 046090db53 [Hexagon] Add more lit tests
llvm-svn: 327271
2018-03-12 14:01:28 +00:00
Matt Arsenault 7b9ed89dcf AMDGPU/GlobalISel: Legality and RegBankInfo for G_{INSERT|EXTRACT}_VECTOR_ELT
llvm-svn: 327269
2018-03-12 13:35:53 +00:00
Matt Arsenault c0aefd561e AMDGPU/GlobalISel: InstrMapping for G_MERGE_VALUES
llvm-svn: 327268
2018-03-12 13:35:49 +00:00
Matt Arsenault 503afda95f AMDGPU/GlobalISel: Make some G_MERGE_VALUEs legal
llvm-svn: 327267
2018-03-12 13:35:43 +00:00
Simon Pilgrim 6618e2a09c [X86][SSE] createVariablePermute - PSHUFB requires SSSE3 not just SSE3
llvm-svn: 327259
2018-03-12 12:30:04 +00:00
Simon Pilgrim d09cc9c62c [X86][MMX] Support MMX build vectors to avoid SSE usage (PR29222)
64-bit MMX vector generation usually ends up lowering into SSE instructions before being spilled/reloaded as an MMX type.

This patch creates an MMX vector from MMX source values, taking the lowest element from each source and constructing broadcasts/build_vectors with direct calls to the MMX PUNPCKL/PSHUFW intrinsics.

We're missing a few consecutive load combines that could be handled in a future patch if that would be useful - my main interest here is just avoiding a lot of the MMX/SSE crossover.
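
A rough sketch of this kind of MMX-only construction using the standard MMX intrinsics (illustrative; the patch itself works at the DAG level):

```c
#include <mmintrin.h>

/* Build a v4i16 MMX value purely with MMX unpacks so the data never
   round-trips through SSE registers. */
__m64 build_v4i16(short a, short b, short c, short d) {
    __m64 lo = _mm_unpacklo_pi16(_mm_cvtsi32_si64(a), _mm_cvtsi32_si64(b));
    __m64 hi = _mm_unpacklo_pi16(_mm_cvtsi32_si64(c), _mm_cvtsi32_si64(d));
    return _mm_unpacklo_pi32(lo, hi);   /* elements (a, b, c, d) */
}
```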

Differential Revision: https://reviews.llvm.org/D43618

llvm-svn: 327247
2018-03-11 19:22:13 +00:00
Simon Pilgrim 55ed3dc676 [X86][AVX512] Added more non-VLX test cases
Cleaned up check prefixes so that they actually share a bit more.

llvm-svn: 327246
2018-03-11 18:28:37 +00:00
Simon Pilgrim 30f74c14ff [X86][AVX] createVariablePermute - scale v16i16 variable permutes to use v32i8 codegen
XOP was already doing this, and now AVX performs v32i8 variable permutes as well.

llvm-svn: 327245
2018-03-11 17:23:54 +00:00
Simon Pilgrim b306501796 [X86][AVX] createVariablePermute - widen permutes for cases where the source vector is wider than the destination type
llvm-svn: 327244
2018-03-11 17:00:46 +00:00
Simon Pilgrim 9a5d0c7540 [X86][AVX] createVariablePermute - use PSHUFB+PCMPGT+SELECT for v32i8 variable permutes
Same as the VPERMILPS/VPERMILPD approach for the v8f32/v4f64 cases, rely on PSHUFB using bits[3:0] for indexing - we can ignore the sign bit (zero element) as those index vector values are considered undefined. Then select between the lo/hi permute results based on the index value.
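
For reference, a scalar C model of the PSHUFB behavior this relies on:

```c
#include <stdint.h>

/* Each index byte selects src[idx & 0xF]; a set sign bit yields zero.
   Only bits [3:0] matter once the zero case is treated as undefined. */
void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                  const uint8_t idx[16]) {
    for (int i = 0; i < 16; ++i)
        dst[i] = (idx[i] & 0x80) ? 0 : src[idx[i] & 0x0F];
}
```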

llvm-svn: 327242
2018-03-11 16:28:11 +00:00
Simon Pilgrim f9cc80d218 [X86][AVX] createVariablePermute - use 2xVPERMIL+PCMPGT+SELECT for v8i32/v8f32 and v4i64/v4f64 variable permutes
As VPERMILPS/VPERMILPD only select elements based on bits[1:0]/bit[1], we can permute both the (repeated) lo/hi 128-bit vectors in each case and then select between these results based on whether the index was for the lo or hi half.

For v4i64/v4f64 this avoids some rather nasty v4i64 multiplies in the AVX2 implementation, which seem to be worse than the extra port5 pressure from the additional shuffles/blends.
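
A scalar C model of the v8f32 scheme described above (illustrative only):

```c
#include <stdint.h>

/* VPERMILPS only looks at bits [1:0] of each index, so permute the lo
   and hi 128-bit halves separately (each repeated across the register)
   and select between the two results on index bit 2. */
void var_permute_v8f32(float dst[8], const float src[8],
                       const uint32_t idx[8]) {
    for (int i = 0; i < 8; ++i) {
        uint32_t lane = idx[i] & 3;                        /* bits [1:0] */
        const float *half = (idx[i] & 4) ? src + 4 : src;  /* lo or hi  */
        dst[i] = half[lane];
    }
}
```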

llvm-svn: 327239
2018-03-11 11:52:26 +00:00
Simon Pilgrim 2565bd421e [X86][AVX512] createVariablePermute - Non-VLX targets can widen v4i64/v8f64 variable permutes to v8i64/v8f64
Permutes in the upper elements will be undefined, but they will be discarded anyway.

llvm-svn: 327238
2018-03-11 11:19:19 +00:00