Commit Graph

279 Commits

Author SHA1 Message Date
Matt Arsenault a9ee0589a8 AMDGPU/GlobalISel: Match global saddr addressing mode 2020-08-17 15:48:06 -04:00
Matt Arsenault c8a9872259 AMDGPU/GlobalISel: Look through copies in getPtrBaseWithConstantOffset
We may have an SGPR->VGPR copy if a totally uniform pointer
calculation is used for a VGPR pointer operand.

Also hack around a bug in MUBUF matching which would incorrectly use
MUBUF for global when flat was requested. This should really be a
predicate on the parent pattern, but the DAG always checked this
manually inside the complex pattern.
2020-08-17 12:31:38 -04:00
Matt Arsenault 625db2fe5b AMDGPU: Remove slc from flat offset complex patterns
This was always set to 0. Use a default value of 0 in this context to
satisfy the instruction definition patterns. We can't unconditionally
use SLC with a default value of 0 due to limitations in TableGen's
handling of defaulted operands when followed by non-default operands.
2020-08-15 12:12:24 -04:00
Matt Arsenault 0dc4c36d3a AMDGPU/GlobalISel: Manually select llvm.amdgcn.writelane
Fixup the special case constant bus handling pre-gfx10.
2020-08-11 11:56:16 -04:00
Matt Arsenault 40188f807d AMDGPU/GlobalISel: Don't try to handle undef source operand
This is now illegal MIR
2020-08-10 08:49:43 -04:00
Matt Arsenault a0ec81f70d AMDGPU/GlobalISel: Merge load/store select cases 2020-08-10 08:46:26 -04:00
Matt Arsenault 63cdc9a49f AMDGPU/GlobalISel: Handle llvm.amdgcn.ds.{fadd|fmin|fmax}
These intrinsics are missing mangling for both the pointer and data
type.
2020-08-06 11:09:08 -04:00
Matt Arsenault dcf3ffb0a8 AMDGPU/GlobalISel: Move frame index selection to patterns
Doesn't really save any code until global value is handled too.
2020-08-06 10:42:15 -04:00
Matt Arsenault d188a608bd AMDGPU: Fix code duplication between the selectors
Not sure this is the right place for this helper.
2020-08-06 10:42:15 -04:00
Matt Arsenault 5316256709 AMDGPU/GlobalISel: Fix assert on copy to vcc
This was trying to constrain a physical register. By the verifier's
understanding, it's impossible to have a 1-bit copy to vcc/vcc_lo so
don't try to handle physregs.
2020-08-06 09:41:14 -04:00
Matt Arsenault 486e84dfa4 AMDGPU/GlobalISel: Use live in helper function for returnaddress 2020-08-04 17:36:01 -04:00
Matt Arsenault 89011fc3c9 AMDGPU/GlobalISel: Select llvm.returnaddress 2020-08-04 17:14:38 -04:00
Matt Arsenault 0de547ed4a AMDGPU/GlobalISel: Ensure subreg is valid when selecting G_UNMERGE_VALUES
Fixes verifier error with SGPR unmerges with 96-bit result types.
2020-08-04 12:27:34 -04:00
Matt Arsenault 2414bab5d7 AMDGPU/GlobalISel: Remove old hacks for boolean selection
There were various hacks used to try to avoid making s1 SGPR vs. s1
VCC ambiguous after constraining the register before we had a strategy
to deal with this. This also attempted to handle undef operands, which
are now illegal gMIR.
2020-08-03 09:04:14 -04:00
Matt Arsenault d8ef1d1251 AMDGPU/GlobalISel: Fix selecting broken copies for s32->s64 anyext
These should probably not be legal in the first place, but that might
also be a pain.
2020-08-03 08:36:41 -04:00
Stanislav Mekhanoshin 5b32518f96 [AMDGPU] Do not use undef on indirect source
We are using undef on the indirect move source subreg and then
using implicit super-reg. This creates a problem in RA when
Greedy decides to split the register. It reassigns the implicit
super-reg but does not bother to change undef source because
it is really does not matter. The fix is to stop lying to RA and
drop undef flag.

This has also hit a problem in SIFoldOperands as it can fold
immediate into an indirect move since there is no undef flag
anymore. That results in multiple test failures, so added the
check for this case.

Differential Revision: https://reviews.llvm.org/D84899
2020-07-30 10:41:59 -07:00
Matt Arsenault 59fac51ff2 AMDGPU/GlobalISel: Handle llvm.amdgcn.reloc.constant 2020-07-29 14:24:21 -04:00
Matt Arsenault bb23b5cfe0 AMDGPU/GlobalISel: Merge identical select cases 2020-07-28 11:42:17 -04:00
Matt Arsenault d35e2c101d AMDGPU/GlobalISel: Fix not constraining ds_append/consume operands 2020-07-26 10:17:36 -04:00
Matt Arsenault 5819159995 AMDGPU/GlobalISel: Pack constant G_BUILD_VECTOR_TRUNCs when selecting 2020-07-26 09:55:34 -04:00
Matt Arsenault 4033aa1467 AMDGPU/GlobalISel: Sign extend integer constants
This matches the DAG behavior and fixes immediate folding
2020-07-26 09:30:14 -04:00
Matt Arsenault 392b969c32 AMDGPU/GlobalISel: Don't assert on G_INSERT > 128-bits
Just fallback for now. Really tablegen needs to generate all of the
subregister index handling we need.
2020-07-25 10:05:44 -04:00
Matt Arsenault 21ef01b7e3 AMDGPU: Remove outdated fixme 2020-07-20 11:41:41 -04:00
Matt Arsenault 79f67cae91 AMDGPU: Rename add/sub with carry out instructions
The hardware has created a real mess in the naming for add/sub, which
have been renamed basically every generation. Switch the carry out
pseudos to have the gfx9/gfx10 names. We were using the original SI/CI
v_add_i32/v_sub_i32 names. Later targets reintroduced these names as
carryless instructions with a saturating clamp bit, which we do not
define. Do this rename so we can unambiguously add these missing
instructions.

The carry-in versions should also be renamed, but at least those had a
consistent _u32 name to begin with. The 16-bit instructions were also
renamed, but aren't ambiguous.

This does regress assembler error message quality in some cases. In
mismatched wave32/wave64 situations, this will switch from
"unsupported instruction" to "invalid operand", with the error
pointing at the wrong position. I couldn't quite follow how the
assembler selects these, but the previous behavior seemed accidental
to me. It looked like there was a partial attempt to handle this which
was never completed (i.e. there is an AMDGPUOperand::isBoolReg but it
isn't used for anything).
2020-07-16 13:16:30 -04:00
Petar Avramovic 5658002b80 AMDGPU/GlobalISel: Select G_FREEZE
Select G_FREEZE in the same way that COPY is selected.

Differential Revision: https://reviews.llvm.org/D83031
2020-07-16 11:10:48 +02:00
Mirko Brkusanin 38998cfa9c [AMDGPU][GlobalISel] Fix subregister index for EXEC register in selectBallot.
Temporarily remove subregister for EXEC in selectBallot added in
https://reviews.llvm.org/D83214 to fix failures on expensive checks buildbot.
2020-07-13 13:35:34 +02:00
Mirko Brkusanin ce23e54162 [AMDGPU][GlobalISel] Select llvm.amdgcn.ballot
Select ballot intrinsic for GlobalISel.

Differential Revision: https://reviews.llvm.org/D83214
2020-07-13 12:14:43 +02:00
Petar Avramovic d717382633 AMDGPU/GlobalISel: Select icmp intrinsic
Select into corresponding V_CMP instruction based on CmpInst predicate,
stored as immediate, in last operand.

Differential Revision: https://reviews.llvm.org/D82652
2020-06-30 10:57:41 +02:00
Matt Arsenault fb51d508ee AMDGPU/GlobalISel: Select general case for G_PTRMASK 2020-06-14 13:12:29 -04:00
Sebastian Neubauer 29a6ad94fd [AMDGPU] Add G16 support to image instructions
Add G16 feature for GFX10 and support A16 and G16 in GlobalISel.

Differential Revision: https://reviews.llvm.org/D76836
2020-06-12 11:26:31 +02:00
Matt Arsenault bb10fa3a53 AMDGPU: Fix wrong null value for private address space
I'm guessing this was a holdover from when 0 was an invalid stack
pointer, but surprised nobody has discovered this before.

Also don't allow offset folding for -1 pointers, since it looks weird
to partially fold this.
2020-05-26 16:35:13 -04:00
Matt Arsenault 50d4b22ca0 AMDGPU/GlobalISel: Fix assert on 16-bit G_EXTRACT results
I consider this to be a hack, since we probably should not mark any
16-bit extract as legal, and require all extracts to be done on
multiples of 32. There are quite a few more battles to fight in the
legalizer for sub-dword vectors, so just select this for now so we can
pass OpenCL conformance without crashing.

Also fix the same assert for G_INSERTs. Unlike G_EXTRACT there's not a
trivial way to select this so just fail on it.
2020-05-26 12:14:08 -04:00
Matt Arsenault 8bc03d2168 GlobalISel: Merge G_PTR_MASK with llvm.ptrmask intrinsic
Confusingly, these were unrelated and had different semantics. The
G_PTR_MASK instruction predates the llvm.ptrmask intrinsic, but has a
different format. G_PTR_MASK only allows clearing the low bits of a
pointer, and only a constant number of bits. The ptrmask intrinsic
allows an arbitrary mask. Replace G_PTR_MASK to match the intrinsic.

Only selects the cases that look like the old instruction. More work
is needed to select the general case. Also new legalization code is
still needed to deal with the case where the incoming mask size does
not match the pointer size, which has a specified behavior in the
langref.
2020-05-26 11:48:13 -04:00
Matt Arsenault 2dd7714b8d AMDGPU/GlobalISel: Don't select boolean phi by default
This is currently missing most of the hard parts to lower correctly,
so disable it for now. This fixes at least one OpenCL conformance test
and allows it to pass with fallback. Hide this behind an option for
now.
2020-05-26 11:01:21 -04:00
Matt Arsenault 7dece2fde3 AMDGPU: Use Register 2020-04-21 15:19:35 -04:00
Jay Foad 858d8db470 AMDGPU/GlobalISel: Work around another selector crash
This does for G_EXTRACT_VECTOR_ELT what 588bd7be36 did for G_TRUNC.

Ideally types without a corresponding register class wouldn't reach
here, but we're currently missing some (in particular a 192-bit class
is missing).
2020-04-17 12:07:54 +01:00
Matt Arsenault 588bd7be36 AMDGPU/GlobalISel: Work around a selector crash
Ideally types without a corresponding register class wouldn't reach
here, but we're currently missing some (in particular a 192-bit class
is missing).
2020-04-15 14:38:50 -04:00
Matt Arsenault cc149172da AMDGPU/GlobalISel: Fix selection of scalar f64 G_FABS
This wasn't covered by existing tablegen patterns, but also suffers
the same issues as G_FNEG. Workaround them by manually selecting, like
G_FNEG.
2020-04-14 22:05:22 -04:00
Matt Arsenault 8a5f0dafd4 AMDGPU/GlobalISel: Select llvm.amdgcn.div.scale 2020-04-06 11:50:19 -04:00
Austin Kerbow 30f18ed387 [AMDGPU] Handle SMRD signed offset immediate
Summary:
This fixes a few issues related to SMRD offsets. On gfx9 and gfx10 we have a
signed byte offset immediate, however we can overflow into a negative since we
treat it as unsigned.

Also, the SMRD SOFFSET sgpr is an unsigned offset on all subtargets. We
sometimes tried to use negative values here.

Third, S_BUFFER instructions should never use a signed offset immediate.

Differential Revision: https://reviews.llvm.org/D77082
2020-04-02 17:41:52 -07:00
Simon Pilgrim be7a233e93 Fix operator precedence warning. NFCI. 2020-04-01 14:36:52 +01:00
Matt Arsenault d0dd24a381 AMDGPU/GlobalISel: Fix crashing on weird G_INSERT sources
No test since these cases shouldn't really be getting through the
legalizer.
2020-03-30 18:14:04 -04:00
Matt Arsenault bcb643c8af AMDGPU/GlobalISel: Handle image atomics 2020-03-30 17:41:04 -04:00
Matt Arsenault 48eda37282 AMDGPU/GlobalISel: Start selecting image intrinsics
Does not handled atomics yet.
2020-03-30 17:33:04 -04:00
Scott Linder 60b1967c39 [AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in
the entry function prologue. This allows us to removes the scratch wave
offset register from the calling convention ABI.

As part of this change, allow the use of an inline constant zero for the
SOffset of MUBUF instructions accessing the stack in entry functions
when a frame pointer is not requested/required. Entry functions with
calls still need to set up the calling convention ABI stack pointer
register, and reference it in order to address arguments of called
functions. The ABI stack pointer register remains unswizzled, but is now
wave-relative instead of queue-relative.

Non-entry functions also use an inline constant zero SOffset for
wave-relative scratch access, but continue to use the stack and frame
pointers as before. When the stack or frame pointer is converted to a
swizzled offset it is now scaled directly, as the scratch wave offset no
longer needs to be subtracted first.

Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling
convention.

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75138
2020-03-19 15:35:16 -04:00
Matt Arsenault ed72bcae34 AMDGPU/GlobalISel: Fix mishandling SGPR v2s16 add/sub/mul
We weren't considering the packed case correctly, and this was passing
through to the selector. The selector only checked the size, so this
would incorrectly compile to a single 32-bit scalar add.

As usual, the LegalizerHelper is somewhat awkward to use from
applyMappingImpl. I think this is the first place we've needed
multi-step legalization here though.
2020-03-09 22:51:54 -04:00
Matt Arsenault 15bf916b54 AMDGPU: Remove VOP3OpSelMods0 complex pattern
Use default operand of 0 instead.
2020-03-04 17:18:22 -05:00
Matt Arsenault 0b46b078b6 AMDGPU/GlobalISel: Fix incorrect VOP3P fneg folding
We use some s32 values in VOP3P operands, and won't see any
intervening casts from a 32-bit fneg. Make sure it's really a packed
fneg before folding.
2020-02-24 21:20:35 -05:00
Matt Arsenault bf4933b4ea AMDGPU/GlobalISel: Remove dead code 2020-02-21 19:19:32 -05:00
Jay Foad b72f1448ce AMDGPU/GlobalISel: Better code for one case of G_SHUFFLE_VECTOR on v2i16
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74987
2020-02-21 21:16:39 +00:00
Matt Arsenault dfce5fd50a AMDGPU/GlobalISel: Select VOP3P instructions
This only handles the basic cases. More work is needed to make better
use of op_sel.
2020-02-21 13:35:40 -05:00
Matt Arsenault 72eef820d5 AMDGPU/GlobalISel: Select G_SHUFFLE_VECTOR
G_SHUFFLE_VECTOR is legal since it theoretically may help match op_sel
for VOP3P instructions. Expand it in some other way in case it doesn't
fold into the use instructions.
2020-02-21 13:35:40 -05:00
Matt Arsenault 043ed2e22a AMDGPU/GlobalISel: Fix xnor matching
We should try the generated matchers before the manual selection. This
means the patterns are now handling the common cases, but the manual
selection code is not yet dead. It's still handling the non-s32/s64
cases (like v2s16 and v2s32). Currently tablegen doesn't have a nice
way to have a single pattern that covers multiple types.
2020-02-21 11:42:49 -05:00
Matt Arsenault ac7abe0ba9 AMDGPU/GlobalISel: Manually select G_BUILD_VECTOR_TRUNC
We have patterns for s_pack* selection, but they assume the inputs are
a build_vector with 16-bit inputs, not a truncating build
vector. Since there's still outstanding work for how to handle
mismatched result and source element vector operations, and since I'm
trying a different packed vector strategy than SelectionDAG, just
manually select this for now.
2020-02-21 10:34:11 -05:00
Matt Arsenault b64aa8c715 AMDGPU/GlobalISel: Fix constant bus violation with source modifiers
This looked through copies to find the source modifiers, which may
have been SGPR->VGPR copies added to avoid potential constant bus
violations. Re-insert a copy to a VGPR if this happens.
2020-02-21 10:30:23 -05:00
Matt Arsenault ff4639f060 AMDGPU/GlobalISel: Select MUBUF path for global atomic cmpxchg
I'm not sure why this isn't a pattern, but the DAG manually selects
this.
2020-02-19 06:19:22 -08:00
Matt Arsenault 86813e2768 AMDGPU/GlobalISel: Select llvm.amdgcn.s.buffer.load
Doesn't try to fail on the dlc bit pre-gfx10 like the DAG lowering
does.
2020-02-17 08:02:40 -08:00
Matt Arsenault e5805529bf AMDGPU/GlobalISel: Select v2s32->v2s16 G_TRUNC
It would be nice if there was a way to avoid the tied operand, but as
far as I can tell there isn't a way to use or with op_sel to achieve
this
2020-02-17 09:20:13 -05:00
Matt Arsenault 8d8d46b57a AMDGPU/GlobalISel: Fix missing impdef of scc on boolean bit ops 2020-02-14 22:35:30 -05:00
Matt Arsenault dc3e499dd4 AMDGPU/GlobalISel: Fix G_EXTRACT of 96-bit results
This would assert on an unhandled size in getRegSplitParts.
2020-02-14 15:57:40 -08:00
Austin Kerbow 3a312c3ee5 [AMDGPU][GlobalISel] Refactor selectDS1Addr1Offset/selectDS64Bit4ByteAligned
Differential Revision: https://reviews.llvm.org/D74261
2020-02-11 16:57:13 -08:00
Stanislav Mekhanoshin 453a8f3af7 [AMDGPU] Remove AMDGPURegisterInfo
R600 and GCN do not have anything in common in terms of register
file organization anymore.

Differential Revision: https://reviews.llvm.org/D74426
2020-02-11 11:13:38 -08:00
Matt Arsenault 2126c70e3a AMDGPU/GlobalISel: Don't mis-select vector index on a constant
Vector indexing with a constant index should be folded out in the
legalizer, but this was accidentally falling through. This would
produce the indexing operation with $noreg. Handle this case as a
dynamic index just in case a bug like this happens again in the
future.
2020-02-09 18:02:37 -05:00
Matt Arsenault 12fe9b26ec AMDGPU/GlobalISel: Select G_SEXT_INREG 2020-02-04 13:23:53 -08:00
Matt Arsenault 49e424e08e AMDGPU/GlobalISel: Select global MUBUF atomicrmw 2020-01-31 06:05:41 -08:00
Matt Arsenault 0426c2d07d Reapply "AMDGPU: Cleanup and fix SMRD offset handling"
This reverts commit 6a4acb9d80.
2020-01-31 06:01:28 -08:00
Matt Arsenault 6a4acb9d80 Revert "AMDGPU: Cleanup and fix SMRD offset handling"
This reverts commit 17dbc6611d.

A test is failing on some bots
2020-01-30 15:39:51 -08:00
Matt Arsenault 17dbc6611d AMDGPU: Cleanup and fix SMRD offset handling
I believe this also fixes bugs with CI 32-bit handling, which was
incorrectly skipping offsets that look like signed 32-bit values. Also
validate the offsets are dword aligned before folding.
2020-01-30 15:04:21 -08:00
Matt Arsenault d6b83d6ba5 AMDGPU/GlobalISel: Don't use pointless getConstantVRegVal
This is always a G_CONSTANT already
2020-01-30 09:38:43 -05:00
Austin Kerbow 2605adb69c [AMDGPU][GlobalISel] Select 8-byte LDS Ops with 4-byte alignment
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73585
2020-01-29 10:42:12 -08:00
Matt Arsenault 96352e0a1b AMDGPU/GlobalISel: Handle LDS with relocations case 2020-01-29 08:18:55 -08:00
Matt Arsenault 94e8ef4d4c AMDGPU/GlobalISel: Look through copies for source modifiers
When all VOP instructions are legalized to VGPRs, any SGPR source
modifiers will have a copy in the way.
2020-01-29 08:08:13 -08:00
Matt Arsenault 02adfb5155 AMDGPU/GlobalISel: Manually select scalar f64 G_FNEG
This should be no problem to support with a pattern, but it turns out
there are just too many yaks to shave. The main problem is in the DAG
emitter, which I have no desire to sink effort into fixing.

If we had a bit to disable patterns in the DAG importer, fixing the
GlobalISelEmitter is more manageable.
2020-01-29 06:49:16 -08:00
Matt Arsenault d2a9739274 AMDGPU/GlobalISel: Eliminate SelectVOP3Mods_f32
Trivial type predicates should be moved into the tablegen pattern
itself, and not checked inside complex patterns. This eliminates a
redundant complex pattern, and fixes select source modifiers for
GlobalISel.

I have further patches which fully handle select in tablegen and
remove all of the C++ selection, although it requires the ugliness to
support the entire range of legal register types.
2020-01-27 17:53:54 -05:00
Matt Arsenault 533d650e94 AMDGPU/GlobalISel: Move llvm.amdgcn.raw.buffer.store handling
Treat this the same way as loads. There's less value to the
intermediate nodes, but it's good to be consistent.
2020-01-27 14:59:30 -05:00
Matt Arsenault 09ed0e44d9 AMDGPU/GlobalISel: Select llvm.amdgcn.raw.tbuffer.load 2020-01-27 13:40:37 -05:00
Matt Arsenault fc90222a91 AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.load
Use intermediate instructions, unlike with buffer stores. This is
necessary because of the need to have an internal way to distinguish
between signed and unsigned extloads. This introduces some duplication
and near duplication with the buffer store selection path. The store
handling should maybe be moved into legalization to match and
eliminate the duplication.
2020-01-27 12:49:23 -05:00
Matt Arsenault e60d658260 AMDGPU/GlobalISel: Handle VOP3NoMods 2020-01-27 09:03:44 -08:00
Matt Arsenault 0968234590 AMDGPU/GlobalISel: Minor refactor of MUBUF complex patterns
This will make it easier to support the small variants in the complex
patterns for atomics.
2020-01-27 09:00:00 -08:00
Matt Arsenault ac0b9b4ccf AMDPGPU/GlobalISel: Select more MUBUF global addressing modes
The handling of the high bits of the resource descriptor seem weird to
me, where the 3rd dword changes based on the instruction.
2020-01-27 07:28:36 -08:00
Matt Arsenault fdaad485e6 AMDGPU/GlobalISel: Initial selection of MUBUF addr64 load/store
Fixes the main reason for compile failures on SI, but doesn't really
try to use the addressing modes yet.
2020-01-27 07:13:56 -08:00
Maheaha Shivamallappa 66f93071cd AMDGPU/GlobalISel: Clean-up code around ISel for Intrinsics.
Summary:
A minor code clean-up around ISel for intrinsic llvm.amdgcn.end.cf()

Reviewers: arsenm, mshivama

Reviewed By: arsenm

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73358
2020-01-26 14:09:31 +05:30
Matt Arsenault 3b93945587 AMDGPU/GlobalISel: Select wqm, softwqm and wwm intrinsics 2020-01-24 13:06:44 -08:00
Matt Arsenault 1192d7b254 AMDGPU/GlobalISel: Handle 16-bank LDS llvm.amdgcn.interp.p1.f16
The pattern is also mishandled by the generated matcher, so workaround
this as in the DAG path.

The existing DAG tests aren't particularly targeted to just this one
intrinsic. These also end up differing in scheduling from SGPR->VGPR
operand constraint copies.
2020-01-22 12:10:59 -05:00
Matt Arsenault 52ec7379ad AMDGPU/GlobalISel: Fold add of constant into G_INSERT_VECTOR_ELT
Move the subregister base like in the extract case.
2020-01-22 11:09:15 -05:00
Matt Arsenault d1dbb5e471 AMDGPU/GlobalISel: Select G_INSERT_VECTOR_ELT 2020-01-22 11:00:49 -05:00
Matt Arsenault e3d352c541 AMDGPU/GlobalISel: Fold constant offset vector extract indexes
Handle dynamic vector extracts that use an index that's an add of a
constant offset into moving the base subregister of the indexing
operation.

Force the add into the loop in regbankselect, which will be recognized
when selected.
2020-01-22 10:50:59 -05:00
Matt Arsenault a722cbf77c AMDGPU/GlobalISel: Handle atomic_inc/atomic_dec
The intermediate instruction drops the extra volatile argument. We are
missing an atomic ordering on these.
2020-01-22 09:26:17 -05:00
Matt Arsenault 592de0009f AMDGPU/GlobalISel: Select llvm.amdgcn.update.dpp
The existing test is overly reliant on -mattr=-flat-for-global, and
some missing optimizations to re-use.
2020-01-17 20:09:53 -05:00
Matt Arsenault ec9628318d AMDGPU/GlobalISel: Select DS append/consume 2020-01-17 20:09:53 -05:00
Matt Arsenault 9b2f3532c7 AMDGPU/GlobalISel: Select DS GWS intrinsics 2020-01-16 11:25:10 -05:00
Matt Arsenault 711a17afaf AMDGPU/GlobalISel: Select exp with patterns
This does produce slightly different code. Now a unique IMPLICIT_DEF
is emitted for each of the implicit_def operands, rather than reusing
the same one.
2020-01-15 18:33:15 -05:00
Matt Arsenault 203801425d AMDGPU/GlobalISel: Select llvm.amdgcn.ds.ordered.{add|swap} 2020-01-13 13:09:38 -05:00
Matt Arsenault 35c3d101ae AMDGPU/GlobalISel: Select G_EXTRACT_VECTOR_ELT
Doesn't try to do the fold into the base register of an add of a
constant in the index like the DAG path does.
2020-01-09 19:52:24 -05:00
Matt Arsenault b4a647449f TableGen/GlobalISel: Add way for SDNodeXForm to work on timm
The current implementation assumes there is an instruction associated
with the transform, but this is not the case for
timm/TargetConstant/immarg values. These transforms should directly
operate on a specific MachineOperand in the source
instruction. TableGen would assert if you attempted to define an
equivalent GISDNodeXFormEquiv using timm when it failed to find the
instruction matcher.

Specially recognize SDNodeXForms on timm, and pass the operand index
to the render function.

Ideally this would be a separate render function type that looks like
void renderFoo(MachineInstrBuilder, const MachineOperand&), but this
proved to be somewhat mechanically painful. Add an optional operand
index which will only be passed if the transform should only look at
the one source operand.

Theoretically it would also be possible to only ever pass the
MachineOperand, and the existing renderers would check the parent. I
think that would be somewhat ugly for the standard usage which may
want to inspect other operands, and I also think MachineOperand should
eventually not carry a pointer to the parent instruction.

Use it in one sample pattern. This isn't a great example, since the
transform exists to satisfy DAG type constraints. This could also be
avoided by just changing the MachineInstr's arbitrary choice of
operand type from i16 to i32. Other patterns have nontrivial uses, but
this serves as the simplest example.

One flaw this still has is if you try to use an SDNodeXForm defined
for imm, but the source pattern uses timm, you still see the "Failed
to lookup instruction" assert. However, there is now a way to avoid
it.
2020-01-09 17:37:52 -05:00
Matt Arsenault 7d67742160 AMDGPU/GlobalISel: Fix import of zext of s16 op patterns 2020-01-09 10:29:32 -05:00
Matt Arsenault e71af77568 AMDGPU/GlobalISel: Add IMMPopCount xform
Partially fixes BFE pattern import.
2020-01-09 10:29:32 -05:00
Matt Arsenault 79450a4ea2 AMDGPU/GlobalISel: Add selectVOP3Mods_nnan
This doesn't enable any new imports yet, but moves the fmed patterns
from failing on this to hitting the "complex suboperand referenced
more than once" limitation in tablegen.
2020-01-09 10:29:32 -05:00
Matt Arsenault d964086c62 AMDGPU/GlobalISel: Add equiv xform for bitcast_fpimm_to_i32
Only partially fixes one pattern import.
2020-01-09 10:29:31 -05:00
Matt Arsenault 3952748ffd AMDGPU/GlobalISel: Fix add of neg inline constant pattern 2020-01-09 10:29:31 -05:00