Commit Graph

53908 Commits

Author SHA1 Message Date
David Green 22a2209433 [ARM] Reserve an emergency spill slot for fp16 addressing modes that need it
Similar to D67327, but this time for the FP16 VLDR and VSTR instructions that
use the AddrMode5FP16 addressing mode. We need to reserve an emergency spill
slot for instructions that will be out of range to use sp directly.
AddrMode5FP16 is 8 bits with a scale of 2.

Differential Revision: https://reviews.llvm.org/D67483

llvm-svn: 372132
2019-09-17 15:23:09 +00:00
Benjamin Kramer 167b302075 [RISCV] Unbreak the build
llvm-svn: 372127
2019-09-17 14:27:31 +00:00
Sam Parker 1d9ba08543 [ARM] Fix for buildbots
Remove setPreservesCFG from ARMConstantIslandPass and add a couple
of -verify-machine-dom-info instances into the existing codegen
tests.

llvm-svn: 372126
2019-09-17 14:21:36 +00:00
Luis Marques 6cf896b284 [RISCV][NFC] Use NoRegister instead of 0 literal
Summary: Trivial cleanup.

Reviewers: asb, lenary

Reviewed By: lenary

Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67526

llvm-svn: 372120
2019-09-17 13:34:17 +00:00
Simon Pilgrim f12a3da5a7 [X86] X86DAGToDAGISel::tryFoldLoad - assert root/parent pointers are non-null. NFCI.
Silences a static analyzer warning.

llvm-svn: 372118
2019-09-17 13:27:02 +00:00
David Green 1ff9553057 [ARM] Fix for MVE load/store stack accesses
MVE loads and stores have a 7 bit immediate range, scaled by the length of the type. This needs to be taught to the stack estimation code to ensure that an emergency spill slot is reserved in case we run out of registers when materialising stack indices.

Also the narrowing loads/stores can be created with frame indices even though they do not accept SP as a register. We need in those cases to make sure we have an emergency register to use as the frame base, as SP can never be used.

Differential Revision: https://reviews.llvm.org/D67327

llvm-svn: 372114
2019-09-17 12:58:51 +00:00
Benjamin Kramer df4b9a3f4f Hide implementation details in namespaces.
llvm-svn: 372113
2019-09-17 12:56:29 +00:00
Sam Parker 36c922278e [ARM][LowOverheadLoops] Add LR def safety check
Converting the *LoopStart pseudo instructions into DLS/WLS results in
LR being defined. These instructions were inserted on the assumption
that LR would already contain the loop counter because a mov is
introduced during ISel as the the consumers in the loop can only use
LR. That assumption proved wrong!

So perform a safety check, finding an appropriate place to insert the
DLS/WLS instructions or revert if this isn't possible.

Differential Revision: https://reviews.llvm.org/D67539

llvm-svn: 372111
2019-09-17 12:19:32 +00:00
Luis Marques 3d0fbafd0b [RISCV] Switch to the Machine Scheduler
Most of the test changes are trivial instruction reorderings and differing
register allocations, without any obvious performance impact.

Differential Revision: https://reviews.llvm.org/D66973

llvm-svn: 372106
2019-09-17 11:15:35 +00:00
Luis Marques 2d550d19b3 Revert Patch from Phabricator
This reverts r372092 (git commit e38695a025)

llvm-svn: 372104
2019-09-17 10:52:09 +00:00
Simon Pilgrim 0b10da7cc7 [X86] Use APInt::getLowBitsSet helper. NFCI.
Also avoids a static analyzer warning about out of range shifts.

llvm-svn: 372103
2019-09-17 10:51:30 +00:00
Graham Hunter 1a9195d817 [SVE][MVT] Fixed-length vector MVT ranges
* Reordered MVT simple types to group scalable vector types
    together.
  * New range functions in MachineValueType.h to only iterate over
    the fixed-length int/fp vector types.
  * Stopped backends which don't support scalable vector types from
    iterating over scalable types.

Reviewers: sdesmalen, greened

Reviewed By: greened

Differential Revision: https://reviews.llvm.org/D66339

llvm-svn: 372099
2019-09-17 10:19:23 +00:00
Luis Marques e38695a025 Patch from Phabricator
llvm-svn: 372092
2019-09-17 09:43:08 +00:00
Alexander Timofeev 6524a7a2b9 [AMDGPU]: PHI Elimination hooks added for custom COPY insertion. Fixed
Defferential Revision: https://reviews.llvm.org/D67101

Reviewers: rampitec, vpykhtin
llvm-svn: 372086
2019-09-17 09:08:58 +00:00
Sam Parker 95b28a4c72 [ARM] LE support in ConstantIslands
The low-overhead branch extension provides a loop-end 'LE' instruction
that performs no decrement nor compare, it just jumps backwards. This
patch modifies the constant islands pass to try to insert LE
instructions in place of a Thumb2 conditional branch, instead of
shrinking it. This only happens if a cmp can be converted to a cbn/z
and used to exit the loop.

Differential Revision: https://reviews.llvm.org/D67404

llvm-svn: 372085
2019-09-17 09:08:05 +00:00
Sam Parker 26a475afe5 [ARM][MVE] Add invalidForTailPredication to TSFlags
Set this bit for the MVE reduction instructions to prevent a loop from
becoming tail predicated in their presence.

Differential Revision: https://reviews.llvm.org/D67444

llvm-svn: 372076
2019-09-17 07:43:04 +00:00
Craig Topper 95aea74494 [X86] Split oversized vXi1 vector arguments and return values into scalars on avx512 targets.
Previously we tried to split them into narrower v64i1 or v16i1
pieces that each got promoted to vXi8 and then passed in a zmm
or xmm register. But this crashes when you need to pass more
pieces than available registers reserved for argument passing.

The scalarizing done here generates much longer and slower code,
but is consistent with the behavior of avx2 and earlier targets
for these types.

Fixes PR43323.

llvm-svn: 372069
2019-09-17 04:41:14 +00:00
Craig Topper 769dd59a27 [X86] Allow masked VBROADCAST instructions to be turned into BLENDM with a broadcast load to avoid a copy.
The BLENDM instructions allow an 2 sources and an independent
destination while masked VBROADCAST has the destination tied
to the source.

llvm-svn: 372068
2019-09-17 04:41:10 +00:00
Craig Topper 2cc57bedd5 [X86] Add support for commuting EVEX VCMP instructons with any immediate value.
Previously we limited to the EQ/NE/TRUE/FALSE/ORD/UNORD immediates.

llvm-svn: 372067
2019-09-17 04:41:05 +00:00
Craig Topper 359918dadf [X86] Enable commuting of EVEX VCMP for all immediate values during isel.
llvm-svn: 372065
2019-09-17 04:40:58 +00:00
Nemanja Ivanovic e63c676825 [PowerPC] Cust lower fpext v2f32 to v2f64 from extract_subvector v4f32
Add the missing piece of r372029.
Somehow when the patch for review D61961 was committed, only the test case
went in and the code didn't. This of course caused all kinds of build bot
breaks.
This patch just adds the code for that patch.

Author: Lei Huang
Differential revision: https://reviews.llvm.org/D61961

llvm-svn: 372043
2019-09-16 22:54:52 +00:00
Simon Pilgrim 3df0daddfd [X86][AVX] matchShuffleWithSHUFPD - add support for zeroable operands
Determine if all of the uses of LHS/RHS operands can be replaced with a zero vector.

llvm-svn: 372013
2019-09-16 17:30:33 +00:00
David Green 8d21460dc5 [ARM] A predicate cast of a predicate cast is a predicate cast
The adds some very basic folding of PREDICATE_CASTS, removing cases when they
are chained together. These would already be removed eventually, as these are
lowered to copies. This just allows it to happen earlier, which can help other
simplifications.

Differential Revision: https://reviews.llvm.org/D67591

llvm-svn: 372012
2019-09-16 17:29:07 +00:00
Oliver Cruickshank ee6fbebbaf [ARM] Add patterns for BSWAP intrinsic on MVE
BSWAP can use the VREV instruction on MVE to produce better results than
expanding.

llvm-svn: 372002
2019-09-16 15:20:10 +00:00
Oliver Cruickshank e9510a6cad [ARM] Add patterns for bitreverse intrinsic on MVE
BITREVERSE can use the VBRSR which will reverse and right shift.
Shifting right by 0 will just reverse the bits.

llvm-svn: 372001
2019-09-16 15:20:03 +00:00
Oliver Cruickshank 5f799ef162 [ARM] Lower CTTZ on MVE
Lower CTTZ on MVE using VBRSR and VCLS which will reverse the bits and
count the leading zeros, equivalent to a count trailing zeros (CTTZ).

llvm-svn: 372000
2019-09-16 15:19:56 +00:00
Oliver Cruickshank cd1a0b9271 [ARM] Add patterns for CTLZ on MVE
CTLZ intrinsic can use the VCLS instruction on MVE, which produces
better results than expanding.

llvm-svn: 371999
2019-09-16 15:19:49 +00:00
Jonas Paulsson b7dadc3562 [SystemZ] Call erase() on the right MBB in SystemZTargetLowering::emitSelect()
Since MBB was split *before* MI, the MI(s) will reside in JoinMBB (MBB) at
the point of erasing them, so calling StartMBB->erase() is actually wrong,
although it is "working" by all appearances.

Review: Ulrich Weigand
llvm-svn: 371995
2019-09-16 14:49:36 +00:00
Matt Arsenault fb51e64eac AMDGPU/GlobalISel: Fail select of G_INSERT non-32-bit source
This was producing an illegal copy which would hit an assert
later. Error on selection for now until this is implemented.

llvm-svn: 371993
2019-09-16 14:26:14 +00:00
Matt Arsenault 1fc07d6648 AMDGPU/GlobalISel: Fix RegBankSelect for G_FRINT and G_FCEIL
llvm-svn: 371991
2019-09-16 14:14:37 +00:00
Clement Courbet 44bfbcc28e [X86][NFC] Add a `use-aa` feature.
Summary:
This allows enabling useaa on the command-line and will allow enabling the
feature on a per-CPU basis where benchmarking shows improvements.

This is modelled after the ARM/AArch64 target.

Reviewers: RKSimon, andreadb, craig.topper

Subscribers: javed.absar, kristof.beyls, hiraditya, ychen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67266

llvm-svn: 371989
2019-09-16 14:05:28 +00:00
David Green ce7328cb61 [ARM] Fold VCMP into VPT
MVE has VPT instructions, which perform the duties of both a VCMP and a VPST in
a single instruction, performing the compare and starting the VPT block in one.
This teaches the MVEVPTBlockPass to fold them, searching back through the
basicblock for a valid VCMP and creating the VPT from its operands.

There are some changes to the VPT instructions to accommodate this, altering
the order of the operands to match the VCMP better, and changing P0 register
defs to be VPR defs, as is used in other places.

Differential Revision: https://reviews.llvm.org/D66577

llvm-svn: 371982
2019-09-16 13:02:41 +00:00
Kerry McLaughlin e55b3bf40e [SVE][Inline-Asm] Add constraints for SVE predicate registers
Summary:
Adds the following inline asm constraints for SVE:
  - Upl: One of the low eight SVE predicate registers, P0 to P7 inclusive
  - Upa: SVE predicate register with full range, P0 to P15

Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, cameron.mcinally, greened, rengolin

Reviewed By: rovka

Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66524

llvm-svn: 371967
2019-09-16 09:45:27 +00:00
Sjoerd Meijer b1e1a26e8e [AArch64] Some more FP16 FMA pattern matching
After our previous machinecombiner exercises (rL371321, rL371818, rL371833), we
were still missing a few FP16 FMA patterns.

Differential Revision: https://reviews.llvm.org/D67576

llvm-svn: 371960
2019-09-16 07:32:13 +00:00
Jonas Paulsson ca5acf5b5e [SystemZ] Merge the SystemZExpandPseudo pass into SystemZPostRewrite.
SystemZExpandPseudo:s only job was to expand LOCRMux instructions into jump
sequences. This needs to be done if expandLOCRPseudo() or expandSELRPseudo()
fails to find a legal opcode (all registers "high" or "low"). This task has
now been moved to SystemZPostRewrite while removing the SystemZExpandPseudo
pass.

It is in fact preferred to expand these pseudos directly after register
allocation in SystemZPostRewrite since the hinted register combinations are
then not subject to later optimizations.

Review: Ulrich Weigand
https://reviews.llvm.org/D67432

llvm-svn: 371959
2019-09-16 07:29:37 +00:00
Matt Arsenault bc8de8a8da AMDGPU/GlobalISel: Select SMRD loads for more types
llvm-svn: 371954
2019-09-16 00:54:07 +00:00
Matt Arsenault 48b158acae AMDGPU/GlobalISel: RegBankSelect for kill
llvm-svn: 371953
2019-09-16 00:48:37 +00:00
Matt Arsenault 01c7f40de3 AMDGPU/GlobalISel: Legalize s1 source G_[SU]ITOFP
llvm-svn: 371952
2019-09-16 00:37:10 +00:00
Matt Arsenault 60169ed613 AMDGPU/GlobalISel: Set type on vgpr live in special arguments
Fixes assertion with workitem ID intrinsics used in non-kernel
functions.

llvm-svn: 371951
2019-09-16 00:33:00 +00:00
Matt Arsenault 9f52c1ea58 AMDGPU/GlobalISel: Select S16->S32 fptoint
llvm-svn: 371950
2019-09-16 00:32:56 +00:00
Matt Arsenault 0a6123595f AMDGPU/GlobalISel: Select s32->s16 G_[US]ITOFP
llvm-svn: 371949
2019-09-16 00:29:12 +00:00
Matt Arsenault f5d5cd205e AMDGPU/GlobalISel: Fix VALU s16 fneg
llvm-svn: 371948
2019-09-16 00:20:54 +00:00
David Green b325c05732 [ARM] Masked loads and stores
Masked loads and store fit naturally with MVE, the instructions being easily
predicated. This adds lowering for the simple cases of masked loads and stores.
It does not yet deal with widening/narrowing or pre/post inc, and so is
currently behind an option.

The llvm masked load intrinsic will accept a "passthru" value, dictating the
values used for the zero masked lanes. In MVE the instructions write 0 to the
zero predicated lanes, so we need to match a passthru that isn't 0 (or undef)
with a select instruction to pull in the correct data after the load.

Differential Revision: https://reviews.llvm.org/D67186

llvm-svn: 371932
2019-09-15 14:14:47 +00:00
Thomas Lively ae530c5c80 [WebAssembly] Narrowing and widening SIMD ops
Summary:
Implements target-specific LLVM intrinsics and clang builtins for
these new SIMD operations, as described at https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#integer-to-integer-narrowing.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D67425

llvm-svn: 371906
2019-09-13 22:54:41 +00:00
Jessica Paquette 727328ab63 [AArch64][GlobalISel] Tail call memory intrinsics
Because memory intrinsics are handled differently than other calls, we need to
check them for tail call eligiblity in the legalizer. This allows us to still
inline them when it's beneficial to do so, but also tail call when possible.

This adds simple tail calling support for when the intrinsic is followed by a
return.

It ports the attribute checks from `TargetLowering::isInTailCallPosition` into
a similarly-named function in LegalizerHelper.cpp. The target-specific
`isUsedByReturnOnly` hook is not ported here.

Update tailcall-mem-intrinsics.ll to show that GlobalISel can now tail call
memory intrinsics.

Update legalize-memcpy-et-al.mir to have a case where we don't tail call.

Differential Revision: https://reviews.llvm.org/D67566

llvm-svn: 371893
2019-09-13 20:25:58 +00:00
Sebastian Pop d93e136be1 [aarch64] move custom isel of extract_vector_elt to td file - NFC
In preparation for def-pat selection of dot product instructions,
this patch moves the custom instruction selection of extract_vector_elt
to the td file. Without this change it is impossible to catch a pattern that
starts with an extract_vector_elt: the custom cpp code is executed first
ahead of the patterns in the td files that are only executed at the end of
the switch statement in SelectCode(Node).

With this patch applied, it becomes possible to select a different pattern
that starts with extract_vector_elt by selecting a higher complexity than
this pattern.

The patch has been tested on aarch64-linux with make check-all.

Differential Revision: https://reviews.llvm.org/D67497

llvm-svn: 371887
2019-09-13 19:28:30 +00:00
Tim Northover 52a89cc07d AArch64: fix EXPENSIVE_CHECKS for arm64_32.
For some reason I'd decided to mark the end-result of a GOT load as
dead. It's clearly not (necessarily).

llvm-svn: 371883
2019-09-13 18:55:38 +00:00
Alexander Timofeev 9ff70132bf Revert for: [AMDGPU]: PHI Elimination hooks added for custom COPY insertion.
llvm-svn: 371873
2019-09-13 17:37:30 +00:00
Jessica Paquette 14bfb56b1a [AArch64][GlobalISel] Add support for sibcalling callees with varargs
This adds support for tail calling callees with varargs, equivalent to how it
is done in AArch64ISelLowering.

This only works for sibling calls, and does not add the necessary support for
musttail with varargs. (See r345641 for equivalent ISelLowering support.) This
should be implemented when we stop falling back on musttail.

Update call-translator-tail-call.ll to show that we can now tail call varargs.

Differential Revision: https://reviews.llvm.org/D67518

llvm-svn: 371868
2019-09-13 16:10:19 +00:00
Craig Topper 8e0f104916 [X86] Use incDecVectorConstant to simplify the min/max code in LowerVSETCC.
incDecVectorConstant is used for a similar reason in LowerVSETCCWithSUBUS
so we might as well share the code.

llvm-svn: 371861
2019-09-13 14:59:08 +00:00