52710 Commits

Author SHA1 Message Date
Matt Arsenault 40c08052a5 AMDGPU: Correct properties for adjcallstack* pseudos
These should be SALU writes, and these are lowered to instructions
that def SCC.

llvm-svn: 364859
2019-07-01 22:01:05 +00:00
Craig Topper 3f722d40c5 [X86] Use v4i32 vzloads instead of v2i64 for vpmovzx/vpmovsx patterns where only 32-bits are loaded.
v2i64 vzload defines a 64-bit memory access. It doesn't look like
we have any coverage for this either way.

Also remove some vzload usages where the instruction loads only
16 bits.

llvm-svn: 364851
2019-07-01 21:25:11 +00:00
Simon Atanasyan fa27500676 [mips] Add missing schedinfo for MIPSeh_return[32|64] instructions
llvm-svn: 364850
2019-07-01 21:25:04 +00:00
Simon Atanasyan 29801f7851 [mips] Add virtualization ASE to P5600 scheduling definitions
llvm-svn: 364849
2019-07-01 21:24:58 +00:00
Simon Atanasyan 574d0a61bd [mips] Add missing schedinfo for LONG_BRANCH_* instructions
llvm-svn: 364848
2019-07-01 21:24:51 +00:00
Craig Topper 328b24150e [X86] Remove several bad load folding isel patterns for VPMOVZX/VPMOVSX.
These patterns all matched a v2i64 vzload, which loads only 64 bits,
to instructions that load a full 128 bits.

llvm-svn: 364847
2019-07-01 21:23:38 +00:00
Craig Topper 5e7815b695 [X86] Correct v4f32->v2i64 cvt(t)ps2(u)qq memory isel patterns
These instructions read only 64 bits of memory, so we shouldn't
allow a full vector width load to be pattern matched in case it
is marked volatile.

Instead allow vzload or scalar_to_vector+load.

Also add a DAG combine to turn full vector loads into vzload when
used by one of these instructions if the load isn't volatile.

This fixes another case for PR42079

llvm-svn: 364838
2019-07-01 19:01:37 +00:00
Matt Arsenault bae3636f96 AMDGPU/GlobalISel: Handle more input argument intrinsics
llvm-svn: 364836
2019-07-01 18:50:50 +00:00
Matt Arsenault 9e8e8c60fa AMDGPU/GlobalISel: Lower kernarg segment ptr intrinsics
llvm-svn: 364835
2019-07-01 18:49:01 +00:00
Matt Arsenault 756d81905f AMDGPU/GlobalISel: Legalize workgroup ID intrinsics
llvm-svn: 364834
2019-07-01 18:47:22 +00:00
Matt Arsenault e2c86cce3a AMDGPU/GlobalISel: Legalize workitem ID intrinsics
Tests don't cover the masked input path since non-kernel arguments
aren't lowered yet.

Test is copied directly from the existing test, with 2 additions.

llvm-svn: 364833
2019-07-01 18:45:36 +00:00
Matt Arsenault e15770aec4 AMDGPU/GlobalISel: Custom lower control flow intrinsics
Replace the brcond for the 2 cases that act as branches. For now
follow how the current system works, although I think we can
eventually get rid of the pseudos.

llvm-svn: 364832
2019-07-01 18:40:23 +00:00
Matt Arsenault 4073b33786 AMDGPU/GlobalISel: Handle 16-bit SALU min/max
This needs to be extended to s32, and expanded into cmp+select.  This
is relying on the fact that widenScalar happens to leave the
instruction in place, but this isn't a guaranteed property of
LegalizerHelper.

llvm-svn: 364831
2019-07-01 18:33:37 +00:00
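
The widening described in the commit above (extend to s32, then expand into cmp+select) can be modelled in plain C++ roughly as follows. This is only a sketch of the arithmetic, not the LegalizerHelper code; the function names smin16ViaS32 and umax16ViaS32 are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: signed 16-bit min computed by sign-extending both
// operands to 32 bits, comparing, selecting, then truncating the result.
int16_t smin16ViaS32(int16_t a, int16_t b) {
  int32_t wa = static_cast<int32_t>(a); // sign-extend to s32
  int32_t wb = static_cast<int32_t>(b);
  bool lt = wa < wb;                    // cmp
  int32_t r = lt ? wa : wb;             // select
  // The selected value is one of the inputs, so it always fits in s16.
  assert(r >= INT16_MIN && r <= INT16_MAX);
  return static_cast<int16_t>(r);       // truncate back to s16
}

// Hypothetical model: the unsigned variant zero-extends instead.
uint16_t umax16ViaS32(uint16_t a, uint16_t b) {
  uint32_t wa = a;                      // zero-extend to s32
  uint32_t wb = b;
  uint32_t r = (wa > wb) ? wa : wb;     // cmp + select
  return static_cast<uint16_t>(r);      // truncate back to s16
}
```

The kind of extension has to match the signedness of the comparison; otherwise a value such as 0xFFFF would compare as -1 in the widened form.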
Matt Arsenault 5a7d5111e5 AMDGPU/GlobalISel: Lower SALU min/max to cmp+select
Use a change observer to apply a register bank to the newly created
intermediate result register.

llvm-svn: 364830
2019-07-01 18:30:45 +00:00
Robert Lougher e20030f612 [X86] Avoid SFB - Fix inconsistent codegen with/without debug info(2)
The function findPotentialBlockers may consider debug info instructions as
potential blockers and may stop searching for a store-load pair prematurely.

This patch corrects this and tests the cases where the store is separated
from the load by more than InspectionLimit debug instructions.

Patch by Chris Dawson.

Differential Revision: https://reviews.llvm.org/D62408

llvm-svn: 364829
2019-07-01 18:28:21 +00:00
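
The behaviour this commit describes (debug instructions should not cut the search short) can be illustrated with a small standalone sketch. It is not the LLVM implementation: the Inst struct, the findBlockers function, and the limit parameter are hypothetical stand-ins for the real machine instructions, findPotentialBlockers, and InspectionLimit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a machine instruction; only the fields
// needed for the illustration.
struct Inst {
  bool isDebug; // models a debug-info instruction
  bool isStore; // models a candidate blocking store
};

// Walk backwards from the load at loadIdx and collect the indices of
// stores, inspecting at most `limit` non-debug instructions.
std::vector<std::size_t> findBlockers(const std::vector<Inst> &block,
                                      std::size_t loadIdx, unsigned limit) {
  assert(loadIdx < block.size());
  std::vector<std::size_t> blockers;
  unsigned inspected = 0;
  for (std::size_t i = loadIdx; i-- > 0;) {
    if (block[i].isDebug)
      continue; // debug instructions neither block nor count toward the limit
    if (inspected++ >= limit)
      break;    // inspection budget exhausted
    if (block[i].isStore)
      blockers.push_back(i);
  }
  return blockers;
}
```

Because debug instructions are skipped before the counter is touched, the result is the same whether or not debug info is present, which is the consistency property the commit is after.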
Matt Arsenault ef59cb6982 AMDGPU/GlobalISel: Legalize s16 add/sub/mul
If this is scalar, promote to s32. Use a new observer class to assign
the register bank of newly created registers.

llvm-svn: 364827
2019-07-01 18:18:55 +00:00
Matt Arsenault 9470bb262b AMDGPU/GlobalISel: Fix allowing non-boolean conditions for G_SELECT
The condition register bank must be scc or vcc so that a copy will be
inserted, which will be lowered to a compare.

Currently greedy unnecessarily forces using a VCC select.

llvm-svn: 364825
2019-07-01 18:13:12 +00:00
Matt Arsenault b2ea20eedd AMDGPU/GlobalISel: RegBankSelect for sendmsg/sendmsghalt
llvm-svn: 364819
2019-07-01 17:40:18 +00:00
Matt Arsenault 40d1faf38f AMDGPU/GlobalISel: Legalize s16 fcmp
llvm-svn: 364817
2019-07-01 17:35:53 +00:00
Nicolai Haehnle 10c911db63 AMDGPU/GFX10: implement ds_ordered_count changes
Summary:
ds_ordered_count can now simultaneously operate on up to 4 dwords
in a single instruction, which are taken from (and returned to)
lanes 0..3 of a single VGPR.

Change-Id: I19b6e7b0732b617c10a779a7f9c0303eec7dd276

Reviewers: mareko, arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63716

llvm-svn: 364815
2019-07-01 17:17:52 +00:00
Nicolai Haehnle 4dc3b2bf95 AMDGPU: Support GDS atomics
Summary:
Original patch by Marek Olšák

Change-Id: Ia97d5d685a63a377d86e82942436d1fe6e429bab

Reviewers: mareko, arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63452

llvm-svn: 364814
2019-07-01 17:17:45 +00:00
Matt Arsenault 1094e6a814 AMDGPU/GlobalISel: RegBankSelect for DS ordered add/swap
llvm-svn: 364811
2019-07-01 17:04:57 +00:00
Matt Arsenault 732149b24e AArch64/GlobalISel: Fix trying to select invalid MIR
Physical registers are not allowed to be a phi operand.

llvm-svn: 364810
2019-07-01 17:02:24 +00:00
Matt Arsenault 265059eaf6 AMDGPU/GlobalISel: RegBankSelect for amdgcn.writelane
llvm-svn: 364808
2019-07-01 16:41:36 +00:00
Matt Arsenault a310727830 AMDGPU/GlobalISel: Fail instead of assert when selecting loads
llvm-svn: 364807
2019-07-01 16:36:39 +00:00
Matt Arsenault 0a52e9d026 AMDGPU/GlobalISel: Complete implementation of G_GEP
Also works around tablegen defect in selecting add with unused carry,
but if we have to manually select GEP, might as well handle add
manually.

llvm-svn: 364806
2019-07-01 16:34:48 +00:00
Matt Arsenault e1006259d8 AMDGPU/GlobalISel: Select G_PHI
llvm-svn: 364805
2019-07-01 16:32:47 +00:00
Matt Arsenault d810ff2588 AMDGPU/GlobalISel: Try to select VOP3 form of add
There are several things broken, but at least emit the right thing for
gfx9.

The import of the pattern with the unused carry out seems to not
work. Needs a special class for clamp, because OperandWithDefaultOps
doesn't really work.

llvm-svn: 364804
2019-07-01 16:27:32 +00:00
Simon Pilgrim e3e38cce4a [X86] Add widenSubVector to size in bits helper. NFCI.
We can already widenSubVector to a specific type (of the same scalar type) - this variant just specifies the target vector size.

This will be useful when CombineShuffleWithExtract relaxes the need to have the same scalar type for all shuffle operand subvector sources.

llvm-svn: 364803
2019-07-01 16:20:47 +00:00
Matt Arsenault 62d64b0c30 AMDGPU/GlobalISel: RegBankSelect for readlane/readfirstlane
llvm-svn: 364801
2019-07-01 16:19:39 +00:00
Tom Stellard 9e9dd30de3 AMDGPU/GlobalISel: Implement select for 32-bit G_ADD
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58804

llvm-svn: 364797
2019-07-01 16:09:33 +00:00
Mikhail Maltsev 8b2e304bc5 [ARM] Fix MVE_VQxDMLxDH instruction class
Summary:
According to the ARMARM, the VQDMLADH, VQRDMLADH, VQDMLSDH and
VQRDMLSDH instructions handle their results as follows: "The base
variant writes the results into the lower element of each pair of
elements in the destination register, whereas the exchange variant
writes to the upper element in each pair". I.e., the initial content
of the output register affects the result; as usual, we model this
with an additional input.

Also, for 32-bit variants Qd is not allowed to be the same register as
Qm and Qn; we use @earlyclobber to indicate this.

This patch also changes vpred_r to vpred_n because the instructions
don't have an explicit 'inactive' operand.

Reviewers: dmgreen, ostannard, simon_tatham

Reviewed By: simon_tatham

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64007

llvm-svn: 364796
2019-07-01 16:07:58 +00:00
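
Why the destination has to be modelled as an additional input can be seen from a toy model of the write pattern quoted in the commit above. This is a sketch under assumptions: it uses eight 16-bit lanes, treats the "lower" element of each pair as the even-numbered lane, and takes the per-pair results as already computed, so the actual saturating arithmetic is omitted. The function name writePairResults is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model: write one result per pair of lanes into dest.
// The base variant writes the lower lane of each pair, the exchange
// variant writes the upper lane; the other lane keeps its old value.
std::array<int16_t, 8> writePairResults(std::array<int16_t, 8> dest,
                                        const std::array<int16_t, 4> &perPair,
                                        bool exchange) {
  for (std::size_t p = 0; p < perPair.size(); ++p)
    dest[2 * p + (exchange ? 1 : 0)] = perPair[p];
  return dest;
}
```

Half of the lanes in the returned value come straight from the incoming dest, so the instruction's output depends on the register's previous content.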
Matt Arsenault 2ab25f9ceb AMDGPU/GlobalISel: Select G_BRCOND for vcc
llvm-svn: 364795
2019-07-01 16:06:02 +00:00
Mikhail Maltsev 4a9e3f15bb [ARM] MVE: support QQPRRegClass and QQQQPRRegClass
Summary:
QQPRRegClass and QQQQPRRegClass are used by the
interleaving/deinterleaving loads/stores to represent sequences of
consecutive SIMD registers.

Reviewers: ostannard, simon_tatham, dmgreen

Reviewed By: simon_tatham

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64009

llvm-svn: 364794
2019-07-01 16:05:23 +00:00
Krzysztof Parzyszek 5abf80cdfa [Hexagon] Custom-lower UADDO(x, 1) and USUBO(x, 1)
llvm-svn: 364790
2019-07-01 15:50:09 +00:00
Matt Arsenault cda82f0bb6 AMDGPU/GlobalISel: Select G_FRAME_INDEX
llvm-svn: 364789
2019-07-01 15:48:18 +00:00
Nicolai Haehnle 7cfd99ab15 AMDGPU/GFX10: fix scratch resource descriptor
Summary:
The stride should depend on the wave size, not the hardware generation.

Also, the 32_FLOAT format is 0x16, not 16; though that shouldn't be
relevant.

Change-Id: I088f93bf6708974d085d1c50967f119061da6dc6

Reviewers: arsenm, rampitec, mareko

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63808

llvm-svn: 364788
2019-07-01 15:43:00 +00:00
Matt Arsenault fdf36729c7 AMDGPU/GlobalISel: Make s16 select legal
This is easy to handle and avoids legalization artifacts which are
likely to obscure combines.

llvm-svn: 364787
2019-07-01 15:42:47 +00:00
Matt Arsenault 6464280eb0 AMDGPU/GlobalISel: Select G_BRCOND for scc conditions
llvm-svn: 364786
2019-07-01 15:39:27 +00:00
Matt Arsenault 1daad91af6 AMDGPU/GlobalISel: Tolerate copies with no type set
isVCC has the same bug, but isn't used in a context where it can cause
a problem.

llvm-svn: 364784
2019-07-01 15:23:04 +00:00
Matt Arsenault 4f64ade04c AMDGPU/GlobalISel: Select src modifiers
llvm-svn: 364782
2019-07-01 15:18:56 +00:00
Krzysztof Parzyszek 511ad50db4 [Hexagon] Rework VLCR algorithm
Add code to catch pattern for commutative instructions for VLCR.

Patch by Suyog Sarda.

llvm-svn: 364770
2019-07-01 13:50:47 +00:00
Matt Arsenault 1b317685e9 AMDGPU: Convert some places to Register
llvm-svn: 364769
2019-07-01 13:44:46 +00:00
Matt Arsenault 5bf850d52e AMDGPU/GlobalISel: Fix RegBankSelect for G_FCANONICALIZE
llvm-svn: 364768
2019-07-01 13:40:18 +00:00
Matt Arsenault b5fc94f3e7 AMDGPU/GlobalISel: Fix RegBankSelect for G_BUILD_VECTOR
llvm-svn: 364767
2019-07-01 13:40:17 +00:00
Matt Arsenault 89fc8bcdd6 AMDGPU/GlobalISel: Fail on store to 32-bit address space
llvm-svn: 364766
2019-07-01 13:37:39 +00:00
Matt Arsenault 3b7668ae4b AMDGPU/GlobalISel: Improve icmp selection coverage.
Select s64 eq/ne scalar icmp.

llvm-svn: 364765
2019-07-01 13:34:26 +00:00
Matt Arsenault c23149f612 AMDGPU/GlobalISel: RegBankSelect for WWM/WQM
llvm-svn: 364763
2019-07-01 13:30:12 +00:00
Matt Arsenault facf69e844 AMDGPU/GlobalISel: Use vcc reg bank for amdgcn.wqm.vote
llvm-svn: 364762
2019-07-01 13:30:09 +00:00
Matt Arsenault 9f992c238a AMDGPU/GlobalISel: Fix scc->vcc copy handling
This was checking the size of the register with the value of the size,
which happens to be exec. Also fix assuming VCC is 64-bit to fix
wave32.

Also remove some untested handling for physical registers which is
skipped. This doesn't insert the V_CNDMASK_B32 if SCC is the physical
copy source. I'm not sure if this should be trying to handle this
special case instead of dealing with this in copyPhysReg.

llvm-svn: 364761
2019-07-01 13:22:07 +00:00