Commit Graph

303 Commits

Author SHA1 Message Date
Sebastian Neubauer 221fdedc69 [AMDGPU][GlobalISel] Fold flat vgpr + constant addresses
Use getPtrBaseWithConstantOffset in selectFlatOffsetImpl to fold more
vgpr+constant addresses.

Differential Revision: https://reviews.llvm.org/D93692
2020-12-23 10:40:30 +01:00
Matt Arsenault 581d13f8ae GlobalISel: Return APInt from getConstantVRegVal
Returning int64_t was arbitrarily limiting for wide integer types, and
the functions should handle the full generality of the IR.

Also changes the full form which returns the originally defined
vreg. Add another wrapper for the common case of just immediately
converting to int64_t (arguably this would be useful for the full
return value case as well).

One possible issue with this change is some of the existing uses did
break without conversion to getConstantVRegSExtVal, and it's possible
some without adequate test coverage are now broken.
2020-12-22 22:23:58 -05:00
Stanislav Mekhanoshin d15119a02d [AMDGPU][GlobalISel] GlobalISel for flat scratch
It does not seem to fold offsets but this is not specific
to the flat scratch as getPtrBaseWithConstantOffset() does
not return the split for these tests unlike its SDag
counterpart.

Differential Revision: https://reviews.llvm.org/D93670
2020-12-22 16:33:06 -08:00
Austin Kerbow 4aa842a800 [AMDGPU] Add new pseudos for indirect addressing with VGPR Indexing
It is possible for copies or spills to be inserted in the middle of indirect
addressing sequences which use VGPR indexing. Spills to accvgprs could be
effected by the indexing mode.

Add new pseudo instructions that are expanded after register allocation to avoid
the problematic spill or copy placement.

Differential Revision: https://reviews.llvm.org/D91048
2020-12-08 12:24:12 -08:00
Jay Foad 4f87d30a06 [AMDGPU] Introduce and use isGFX10Plus. NFC.
It's more future-proof to use isGFX10Plus from the start, on the
assumption that future architectures will be based on current
architectures.

Also make use of the existing isGFX9Plus in a few places.

Differential Revision: https://reviews.llvm.org/D92092
2020-11-26 09:02:36 +00:00
Matt Arsenault d2e52eec51 AMDGPU: Select global saddr mode from SGPR pointer
Use the 64-bit SGPR base with a 0 offset, since it's 1 fewer
instruction to materialize the 0 vs. the 64-bit copy.
2020-11-16 11:51:06 -05:00
Matt Arsenault a6e353b1d0 AMDGPU: Split large offsets when selecting global saddr mode
When the offset doesn't fit in the immediate field, move some to
voffset.
2020-11-16 11:36:01 -05:00
Jessica Paquette b184a2eccf [GlobalISel] Add matchers for specific constants and a matcher for negations
It's fairly common to need matchers for a specific constant value, or for
common idioms like finding a negated register.

Add

- `m_SpecificICst`, which returns true when matching a specific value..
- `m_ZeroInt`, which returns true when an integer 0 is matched.
- `m_Neg`, which returns when a register is negated.

Also update a few places which use idioms related to the new matchers.

Differential Revision: https://reviews.llvm.org/D91397
2020-11-13 09:24:54 -08:00
Jay Foad 0ad4d04002 [AMDGPU] Remove an unused return value. NFC.
Differential Revision: https://reviews.llvm.org/D91063
2020-11-10 09:15:14 +00:00
Stanislav Mekhanoshin f738aee0bb [AMDGPU] Add default 1 glc operand to rtn atomics
This change adds a real glc operand to the return atomic
instead of just string " glc" in the middle of the asm
string.

Improves asm parser diagnostics.

Differential Revision: https://reviews.llvm.org/D90730
2020-11-05 10:41:59 -08:00
Jay Foad 040c50278c [AMDGPU] Fix ds_read2/write2 with unaligned offsets
These instructions use a scaled offset. We were wrongly selecting them
even when the required offset was not a multiple of the scale factor.

Differential Revision: https://reviews.llvm.org/D90607
2020-11-03 15:16:10 +00:00
Jay Foad 7a79921edd [AMDGPU] Remove gds operand from ds_gws_* MachineInstrs
The operand value was always 1 (except in some bad MIR tests) so it was
redundant.

Differential Revision: https://reviews.llvm.org/D90378
2020-10-29 15:04:23 +00:00
Jay Foad 5b91a6a88b [AMDGPU] Allow some modifiers on VOP3B instructions
V_DIV_SCALE_F32/F64 are VOP3B encoded so they can't use the ABS src
modifier, but they can still use NEG and the usual output modifiers.

This partially reverts 3b99f12a4e "AMDGPU: Remove modifiers from v_div_scale_*".

Differential Revision: https://reviews.llvm.org/D90296
2020-10-28 21:54:14 +00:00
Rodrigo Dominguez f71f5f39f6 [AMDGPU] Implement hardware bug workaround for image instructions
Summary:
This implements a workaround for a hardware bug in gfx8 and gfx9,
where register usage is not estimated correctly for image_store and
image_gather4 instructions when D16 is used.

Change-Id: I4e30744da6796acac53a9b5ad37ac1c2035c8899

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81172
2020-10-07 07:39:52 -04:00
Sebastian Neubauer 6a089ce0e4 [AMDGPU] Use tablegen for argument indices
Use tablegen generic tables to get the index of image intrinsic
arguments.
Before, the computation of which image intrinsic argument is at which
index was scattered in a few places, tablegen, the SDag instruction
selection and GlobalISel. This patch changes that, so only tablegen
contains code to compute indices and the ImageDimIntrinsicInfo table
provides these information.

Differential Revision: https://reviews.llvm.org/D86270
2020-10-05 11:50:52 +02:00
Stanislav Mekhanoshin 27a62f6317 [AMDGPU] global-isel support for RT
Differential Revision: https://reviews.llvm.org/D87847
2020-09-24 10:29:45 -07:00
Stanislav Mekhanoshin 277de43d88 [AMDGPU] Unify intrinsic ret/nortn interface
We have a single noret intrinsic an a lot of special handling
around it. Declare it just as any other but do not define rtn
instructions itself instead.

Differential Revision: https://reviews.llvm.org/D87719
2020-09-15 15:26:42 -07:00
Petar Avramovic 6e2a86ed5a AMDGPU/GlobalISel Check for NoNaNsFPMath in isKnownNeverSNaN
Check for NoNaNsFPMath function attribute in isKnownNeverSNaN.
Function attributes are in held in 'TargetMachine.Options'.
Among other things, this allows selection of some patterns imported
in D87351 since G_FCANONICALIZE is not generated when isKnownNeverSNaN
returns true in lowerFMinNumMaxNum.

However we notice some incorrect results since function attributes are
not correctly written in TargetMachine.Options when next function is
processed. Take a look at @v_test_no_global_nnans_med3_f32_pat0_srcmod0,
it has "no-nans-fp-math"="false" but TargetMachine.Options still has it
set to true since first function in test file had this attribute set to
true. This will be fixed in D87511.

Differential Revision: https://reviews.llvm.org/D87456
2020-09-14 12:11:00 +02:00
Petar Avramovic 09b8871f8d AMDGPU/GlobalISel/Emitter Support for predicate code that uses operands
Predicates with 'let PredicateCodeUsesOperands = 1' want to examine
matched operands. When we encounter predicate code that uses operands,
analyze its named operand arguments and create a map between argument
index and name. Later, when leaf node with name is encountered, emit
GIM_RecordNamedOperand that will store that operand at its argument
index in operand list. This operand list will be an argument to c++
code of the predicate.

Differential Revision: https://reviews.llvm.org/D87285
2020-09-14 10:39:56 +02:00
Jay Foad 831457c6d5 [AMDGPU][GlobalISel] Eliminate barrier if workgroup size is not greater than wavefront size
If a workgroup size is known to be not greater than wavefront size
the s_barrier instruction is not needed since all threads are guaranteed
to come to the same point at the same time.

This is the same optimization that was implemented for SelectionDAG in
D31731.

Differential Revision: https://reviews.llvm.org/D86609
2020-08-26 13:47:51 +01:00
Mirko Brkusanin d17ea67b92 [AMDGPU][GlobalISel] Fix 96 and 128 local loads and stores
Fix local ds_read/write_b96/b128 so they can be selected if the alignment
allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break
them down.

Differential Revision: https://reviews.llvm.org/D81638
2020-08-21 12:26:31 +02:00
Jay Foad 3497860203 [AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister
... in favour of the isPhysical/isVirtual methods.
2020-08-20 17:59:11 +01:00
Matt Arsenault 2f5f5febf3 AMDGPU/GlobalISel: Select llvm.amdgcn.groupstaticsize
Previously, it would successfully select and assert if not HSA or PAL
when expanding the pseudoinstruction. We don't need the
pseudoinstruction anymore since we know the total size after
legalization.
2020-08-18 09:28:01 -04:00
Matt Arsenault 3ba7777b94 AMDGPU/GlobalISel: Fix selection of s1/s16 G_[F]CONSTANT
The code to determine the value size was overcomplicated and only
correct in the case where the result register already had a register
class assigned. We can always take the size directly from the
register's type.
2020-08-18 09:28:01 -04:00
Matt Arsenault a9ee0589a8 AMDGPU/GlobalISel: Match global saddr addressing mode 2020-08-17 15:48:06 -04:00
Matt Arsenault c8a9872259 AMDGPU/GlobalISel: Look through copies in getPtrBaseWithConstantOffset
We may have an SGPR->VGPR copy if a totally uniform pointer
calculation is used for a VGPR pointer operand.

Also hack around a bug in MUBUF matching which would incorrectly use
MUBUF for global when flat was requested. This should really be a
predicate on the parent pattern, but the DAG always checked this
manually inside the complex pattern.
2020-08-17 12:31:38 -04:00
Matt Arsenault 625db2fe5b AMDGPU: Remove slc from flat offset complex patterns
This was always set to 0. Use a default value of 0 in this context to
satisfy the instruction definition patterns. We can't unconditionally
use SLC with a default value of 0 due to limitations in TableGen's
handling of defaulted operands when followed by non-default operands.
2020-08-15 12:12:24 -04:00
Matt Arsenault 0dc4c36d3a AMDGPU/GlobalISel: Manually select llvm.amdgcn.writelane
Fixup the special case constant bus handling pre-gfx10.
2020-08-11 11:56:16 -04:00
Matt Arsenault 40188f807d AMDGPU/GlobalISel: Don't try to handle undef source operand
This is now illegal MIR
2020-08-10 08:49:43 -04:00
Matt Arsenault a0ec81f70d AMDGPU/GlobalISel: Merge load/store select cases 2020-08-10 08:46:26 -04:00
Matt Arsenault 63cdc9a49f AMDGPU/GlobalISel: Handle llvm.amdgcn.ds.{fadd|fmin|fmax}
These intrinsics are missing mangling for both the pointer and data
type.
2020-08-06 11:09:08 -04:00
Matt Arsenault dcf3ffb0a8 AMDGPU/GlobalISel: Move frame index selection to patterns
Doesn't really save any code until global value is handled too.
2020-08-06 10:42:15 -04:00
Matt Arsenault d188a608bd AMDGPU: Fix code duplication between the selectors
Not sure this is the right place for this helper.
2020-08-06 10:42:15 -04:00
Matt Arsenault 5316256709 AMDGPU/GlobalISel: Fix assert on copy to vcc
This was trying to constrain a physical register. By the verifier's
understanding, it's impossible to have a 1-bit copy to vcc/vcc_lo so
don't try to handle physregs.
2020-08-06 09:41:14 -04:00
Matt Arsenault 486e84dfa4 AMDGPU/GlobalISel: Use live in helper function for returnaddress 2020-08-04 17:36:01 -04:00
Matt Arsenault 89011fc3c9 AMDGPU/GlobalISel: Select llvm.returnaddress 2020-08-04 17:14:38 -04:00
Matt Arsenault 0de547ed4a AMDGPU/GlobalISel: Ensure subreg is valid when selecting G_UNMERGE_VALUES
Fixes verifier error with SGPR unmerges with 96-bit result types.
2020-08-04 12:27:34 -04:00
Matt Arsenault 2414bab5d7 AMDGPU/GlobalISel: Remove old hacks for boolean selection
There were various hacks used to try to avoid making s1 SGPR vs. s1
VCC ambiguous after constraining the register before we had a strategy
to deal with this. This also attempted to handle undef operands, which
are now illegal gMIR.
2020-08-03 09:04:14 -04:00
Matt Arsenault d8ef1d1251 AMDGPU/GlobalISel: Fix selecting broken copies for s32->s64 anyext
These should probably not be legal in the first place, but that might
also be a pain.
2020-08-03 08:36:41 -04:00
Stanislav Mekhanoshin 5b32518f96 [AMDGPU] Do not use undef on indirect source
We are using undef on the indirect move source subreg and then
using implicit super-reg. This creates a problem in RA when
Greedy decides to split the register. It reassigns the implicit
super-reg but does not bother to change undef source because
it is really does not matter. The fix is to stop lying to RA and
drop undef flag.

This has also hit a problem in SIFoldOperands as it can fold
immediate into an indirect move since there is no undef flag
anymore. That results in multiple test failures, so added the
check for this case.

Differential Revision: https://reviews.llvm.org/D84899
2020-07-30 10:41:59 -07:00
Matt Arsenault 59fac51ff2 AMDGPU/GlobalISel: Handle llvm.amdgcn.reloc.constant 2020-07-29 14:24:21 -04:00
Matt Arsenault bb23b5cfe0 AMDGPU/GlobalISel: Merge identical select cases 2020-07-28 11:42:17 -04:00
Matt Arsenault d35e2c101d AMDGPU/GlobalISel: Fix not constraining ds_append/consume operands 2020-07-26 10:17:36 -04:00
Matt Arsenault 5819159995 AMDGPU/GlobalISel: Pack constant G_BUILD_VECTOR_TRUNCs when selecting 2020-07-26 09:55:34 -04:00
Matt Arsenault 4033aa1467 AMDGPU/GlobalISel: Sign extend integer constants
This matches the DAG behavior and fixes immediate folding
2020-07-26 09:30:14 -04:00
Matt Arsenault 392b969c32 AMDGPU/GlobalISel: Don't assert on G_INSERT > 128-bits
Just fallback for now. Really tablegen needs to generate all of the
subregister index handling we need.
2020-07-25 10:05:44 -04:00
Matt Arsenault 21ef01b7e3 AMDGPU: Remove outdated fixme 2020-07-20 11:41:41 -04:00
Matt Arsenault 79f67cae91 AMDGPU: Rename add/sub with carry out instructions
The hardware has created a real mess in the naming for add/sub, which
have been renamed basically every generation. Switch the carry out
pseudos to have the gfx9/gfx10 names. We were using the original SI/CI
v_add_i32/v_sub_i32 names. Later targets reintroduced these names as
carryless instructions with a saturating clamp bit, which we do not
define. Do this rename so we can unambiguously add these missing
instructions.

The carry-in versions should also be renamed, but at least those had a
consistent _u32 name to begin with. The 16-bit instructions were also
renamed, but aren't ambiguous.

This does regress assembler error message quality in some cases. In
mismatched wave32/wave64 situations, this will switch from
"unsupported instruction" to "invalid operand", with the error
pointing at the wrong position. I couldn't quite follow how the
assembler selects these, but the previous behavior seemed accidental
to me. It looked like there was a partial attempt to handle this which
was never completed (i.e. there is an AMDGPUOperand::isBoolReg but it
isn't used for anything).
2020-07-16 13:16:30 -04:00
Petar Avramovic 5658002b80 AMDGPU/GlobalISel: Select G_FREEZE
Select G_FREEZE in the same way that COPY is selected.

Differential Revision: https://reviews.llvm.org/D83031
2020-07-16 11:10:48 +02:00
Mirko Brkusanin 38998cfa9c [AMDGPU][GlobalISel] Fix subregister index for EXEC register in selectBallot.
Temporarily remove subregister for EXEC in selectBallot added in
https://reviews.llvm.org/D83214 to fix failures on expensive checks buildbot.
2020-07-13 13:35:34 +02:00
Mirko Brkusanin ce23e54162 [AMDGPU][GlobalISel] Select llvm.amdgcn.ballot
Select ballot intrinsic for GlobalISel.

Differential Revision: https://reviews.llvm.org/D83214
2020-07-13 12:14:43 +02:00
Petar Avramovic d717382633 AMDGPU/GlobalISel: Select icmp intrinsic
Select into corresponding V_CMP instruction based on CmpInst predicate,
stored as immediate, in last operand.

Differential Revision: https://reviews.llvm.org/D82652
2020-06-30 10:57:41 +02:00
Matt Arsenault fb51d508ee AMDGPU/GlobalISel: Select general case for G_PTRMASK 2020-06-14 13:12:29 -04:00
Sebastian Neubauer 29a6ad94fd [AMDGPU] Add G16 support to image instructions
Add G16 feature for GFX10 and support A16 and G16 in GlobalISel.

Differential Revision: https://reviews.llvm.org/D76836
2020-06-12 11:26:31 +02:00
Matt Arsenault bb10fa3a53 AMDGPU: Fix wrong null value for private address space
I'm guessing this was a holdover from when 0 was an invalid stack
pointer, but surprised nobody has discovered this before.

Also don't allow offset folding for -1 pointers, since it looks weird
to partially fold this.
2020-05-26 16:35:13 -04:00
Matt Arsenault 50d4b22ca0 AMDGPU/GlobalISel: Fix assert on 16-bit G_EXTRACT results
I consider this to be a hack, since we probably should not mark any
16-bit extract as legal, and require all extracts to be done on
multiples of 32. There are quite a few more battles to fight in the
legalizer for sub-dword vectors, so just select this for now so we can
pass OpenCL conformance without crashing.

Also fix the same assert for G_INSERTs. Unlike G_EXTRACT there's not a
trivial way to select this so just fail on it.
2020-05-26 12:14:08 -04:00
Matt Arsenault 8bc03d2168 GlobalISel: Merge G_PTR_MASK with llvm.ptrmask intrinsic
Confusingly, these were unrelated and had different semantics. The
G_PTR_MASK instruction predates the llvm.ptrmask intrinsic, but has a
different format. G_PTR_MASK only allows clearing the low bits of a
pointer, and only a constant number of bits. The ptrmask intrinsic
allows an arbitrary mask. Replace G_PTR_MASK to match the intrinsic.

Only selects the cases that look like the old instruction. More work
is needed to select the general case. Also new legalization code is
still needed to deal with the case where the incoming mask size does
not match the pointer size, which has a specified behavior in the
langref.
2020-05-26 11:48:13 -04:00
Matt Arsenault 2dd7714b8d AMDGPU/GlobalISel: Don't select boolean phi by default
This is currently missing most of the hard parts to lower correctly,
so disable it for now. This fixes at least one OpenCL conformance test
and allows it to pass with fallback. Hide this behind an option for
now.
2020-05-26 11:01:21 -04:00
Matt Arsenault 7dece2fde3 AMDGPU: Use Register 2020-04-21 15:19:35 -04:00
Jay Foad 858d8db470 AMDGPU/GlobalISel: Work around another selector crash
This does for G_EXTRACT_VECTOR_ELT what 588bd7be36 did for G_TRUNC.

Ideally types without a corresponding register class wouldn't reach
here, but we're currently missing some (in particular a 192-bit class
is missing).
2020-04-17 12:07:54 +01:00
Matt Arsenault 588bd7be36 AMDGPU/GlobalISel: Work around a selector crash
Ideally types without a corresponding register class wouldn't reach
here, but we're currently missing some (in particular a 192-bit class
is missing).
2020-04-15 14:38:50 -04:00
Matt Arsenault cc149172da AMDGPU/GlobalISel: Fix selection of scalar f64 G_FABS
This wasn't covered by existing tablegen patterns, but also suffers
the same issues as G_FNEG. Workaround them by manually selecting, like
G_FNEG.
2020-04-14 22:05:22 -04:00
Matt Arsenault 8a5f0dafd4 AMDGPU/GlobalISel: Select llvm.amdgcn.div.scale 2020-04-06 11:50:19 -04:00
Austin Kerbow 30f18ed387 [AMDGPU] Handle SMRD signed offset immediate
Summary:
This fixes a few issues related to SMRD offsets. On gfx9 and gfx10 we have a
signed byte offset immediate, however we can overflow into a negative since we
treat it as unsigned.

Also, the SMRD SOFFSET sgpr is an unsigned offset on all subtargets. We
sometimes tried to use negative values here.

Third, S_BUFFER instructions should never use a signed offset immediate.

Differential Revision: https://reviews.llvm.org/D77082
2020-04-02 17:41:52 -07:00
Simon Pilgrim be7a233e93 Fix operator precedence warning. NFCI. 2020-04-01 14:36:52 +01:00
Matt Arsenault d0dd24a381 AMDGPU/GlobalISel: Fix crashing on weird G_INSERT sources
No test since these cases shouldn't really be getting through the
legalizer.
2020-03-30 18:14:04 -04:00
Matt Arsenault bcb643c8af AMDGPU/GlobalISel: Handle image atomics 2020-03-30 17:41:04 -04:00
Matt Arsenault 48eda37282 AMDGPU/GlobalISel: Start selecting image intrinsics
Does not handled atomics yet.
2020-03-30 17:33:04 -04:00
Scott Linder 60b1967c39 [AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in
the entry function prologue. This allows us to removes the scratch wave
offset register from the calling convention ABI.

As part of this change, allow the use of an inline constant zero for the
SOffset of MUBUF instructions accessing the stack in entry functions
when a frame pointer is not requested/required. Entry functions with
calls still need to set up the calling convention ABI stack pointer
register, and reference it in order to address arguments of called
functions. The ABI stack pointer register remains unswizzled, but is now
wave-relative instead of queue-relative.

Non-entry functions also use an inline constant zero SOffset for
wave-relative scratch access, but continue to use the stack and frame
pointers as before. When the stack or frame pointer is converted to a
swizzled offset it is now scaled directly, as the scratch wave offset no
longer needs to be subtracted first.

Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling
convention.

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75138
2020-03-19 15:35:16 -04:00
Matt Arsenault ed72bcae34 AMDGPU/GlobalISel: Fix mishandling SGPR v2s16 add/sub/mul
We weren't considering the packed case correctly, and this was passing
through to the selector. The selector only checked the size, so this
would incorrectly compile to a single 32-bit scalar add.

As usual, the LegalizerHelper is somewhat awkward to use from
applyMappingImpl. I think this is the first place we've needed
multi-step legalization here though.
2020-03-09 22:51:54 -04:00
Matt Arsenault 15bf916b54 AMDGPU: Remove VOP3OpSelMods0 complex pattern
Use default operand of 0 instead.
2020-03-04 17:18:22 -05:00
Matt Arsenault 0b46b078b6 AMDGPU/GlobalISel: Fix incorrect VOP3P fneg folding
We use some s32 values in VOP3P operands, and won't see any
intervening casts from a 32-bit fneg. Make sure it's really a packed
fneg before folding.
2020-02-24 21:20:35 -05:00
Matt Arsenault bf4933b4ea AMDGPU/GlobalISel: Remove dead code 2020-02-21 19:19:32 -05:00
Jay Foad b72f1448ce AMDGPU/GlobalISel: Better code for one case of G_SHUFFLE_VECTOR on v2i16
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74987
2020-02-21 21:16:39 +00:00
Matt Arsenault dfce5fd50a AMDGPU/GlobalISel: Select VOP3P instructions
This only handles the basic cases. More work is needed to make better
use of op_sel.
2020-02-21 13:35:40 -05:00
Matt Arsenault 72eef820d5 AMDGPU/GlobalISel: Select G_SHUFFLE_VECTOR
G_SHUFFLE_VECTOR is legal since it theoretically may help match op_sel
for VOP3P instructions. Expand it in some other way in case it doesn't
fold into the use instructions.
2020-02-21 13:35:40 -05:00
Matt Arsenault 043ed2e22a AMDGPU/GlobalISel: Fix xnor matching
We should try the generated matchers before the manual selection. This
means the patterns are now handling the common cases, but the manual
selection code is not yet dead. It's still handling the non-s32/s64
cases (like v2s16 and v2s32). Currently tablegen doesn't have a nice
way to have a single pattern that covers multiple types.
2020-02-21 11:42:49 -05:00
Matt Arsenault ac7abe0ba9 AMDGPU/GlobalISel: Manually select G_BUILD_VECTOR_TRUNC
We have patterns for s_pack* selection, but they assume the inputs are
a build_vector with 16-bit inputs, not a truncating build
vector. Since there's still outstanding work for how to handle
mismatched result and source element vector operations, and since I'm
trying a different packed vector strategy than SelectionDAG, just
manually select this for now.
2020-02-21 10:34:11 -05:00
Matt Arsenault b64aa8c715 AMDGPU/GlobalISel: Fix constant bus violation with source modifiers
This looked through copies to find the source modifiers, which may
have been SGPR->VGPR copies added to avoid potential constant bus
violations. Re-insert a copy to a VGPR if this happens.
2020-02-21 10:30:23 -05:00
Matt Arsenault ff4639f060 AMDGPU/GlobalISel: Select MUBUF path for global atomic cmpxchg
I'm not sure why this isn't a pattern, but the DAG manually selects
this.
2020-02-19 06:19:22 -08:00
Matt Arsenault 86813e2768 AMDGPU/GlobalISel: Select llvm.amdgcn.s.buffer.load
Doesn't try to fail on the dlc bit pre-gfx10 like the DAG lowering
does.
2020-02-17 08:02:40 -08:00
Matt Arsenault e5805529bf AMDGPU/GlobalISel: Select v2s32->v2s16 G_TRUNC
It would be nice if there was a way to avoid the tied operand, but as
far as I can tell there isn't a way to use or with op_sel to achieve
this
2020-02-17 09:20:13 -05:00
Matt Arsenault 8d8d46b57a AMDGPU/GlobalISel: Fix missing impdef of scc on boolean bit ops 2020-02-14 22:35:30 -05:00
Matt Arsenault dc3e499dd4 AMDGPU/GlobalISel: Fix G_EXTRACT of 96-bit results
This would assert on an unhandled size in getRegSplitParts.
2020-02-14 15:57:40 -08:00
Austin Kerbow 3a312c3ee5 [AMDGPU][GlobalISel] Refactor selectDS1Addr1Offset/selectDS64Bit4ByteAligned
Differential Revision: https://reviews.llvm.org/D74261
2020-02-11 16:57:13 -08:00
Stanislav Mekhanoshin 453a8f3af7 [AMDGPU] Remove AMDGPURegisterInfo
R600 and GCN do not have anything in common in terms of register
file organization anymore.

Differential Revision: https://reviews.llvm.org/D74426
2020-02-11 11:13:38 -08:00
Matt Arsenault 2126c70e3a AMDGPU/GlobalISel: Don't mis-select vector index on a constant
Vector indexing with a constant index should be folded out in the
legalizer, but this was accidentally falling through. This would
produce the indexing operation with $noreg. Handle this case as a
dynamic index just in case a bug like this happens again in the
future.
2020-02-09 18:02:37 -05:00
Matt Arsenault 12fe9b26ec AMDGPU/GlobalISel: Select G_SEXT_INREG 2020-02-04 13:23:53 -08:00
Matt Arsenault 49e424e08e AMDGPU/GlobalISel: Select global MUBUF atomicrmw 2020-01-31 06:05:41 -08:00
Matt Arsenault 0426c2d07d Reapply "AMDGPU: Cleanup and fix SMRD offset handling"
This reverts commit 6a4acb9d80.
2020-01-31 06:01:28 -08:00
Matt Arsenault 6a4acb9d80 Revert "AMDGPU: Cleanup and fix SMRD offset handling"
This reverts commit 17dbc6611d.

A test is failing on some bots
2020-01-30 15:39:51 -08:00
Matt Arsenault 17dbc6611d AMDGPU: Cleanup and fix SMRD offset handling
I believe this also fixes bugs with CI 32-bit handling, which was
incorrectly skipping offsets that look like signed 32-bit values. Also
validate the offsets are dword aligned before folding.
2020-01-30 15:04:21 -08:00
Matt Arsenault d6b83d6ba5 AMDGPU/GlobalISel: Don't use pointless getConstantVRegVal
This is always a G_CONSTANT already
2020-01-30 09:38:43 -05:00
Austin Kerbow 2605adb69c [AMDGPU][GlobalISel] Select 8-byte LDS Ops with 4-byte alignment
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73585
2020-01-29 10:42:12 -08:00
Matt Arsenault 96352e0a1b AMDGPU/GlobalISel: Handle LDS with relocations case 2020-01-29 08:18:55 -08:00
Matt Arsenault 94e8ef4d4c AMDGPU/GlobalISel: Look through copies for source modifiers
When all VOP instructions are legalized to VGPRs, any SGPR source
modifiers will have a copy in the way.
2020-01-29 08:08:13 -08:00
Matt Arsenault 02adfb5155 AMDGPU/GlobalISel: Manually select scalar f64 G_FNEG
This should be no problem to support with a pattern, but it turns out
there are just too many yaks to shave. The main problem is in the DAG
emitter, which I have no desire to sink effort into fixing.

If we had a bit to disable patterns in the DAG importer, fixing the
GlobalISelEmitter is more manageable.
2020-01-29 06:49:16 -08:00
Matt Arsenault d2a9739274 AMDGPU/GlobalISel: Eliminate SelectVOP3Mods_f32
Trivial type predicates should be moved into the tablegen pattern
itself, and not checked inside complex patterns. This eliminates a
redundant complex pattern, and fixes select source modifiers for
GlobalISel.

I have further patches which fully handle select in tablegen and
remove all of the C++ selection, although it requires the ugliness to
support the entire range of legal register types.
2020-01-27 17:53:54 -05:00
Matt Arsenault 533d650e94 AMDGPU/GlobalISel: Move llvm.amdgcn.raw.buffer.store handling
Treat this the same way as loads. There's less value to the
intermediate nodes, but it's good to be consistent.
2020-01-27 14:59:30 -05:00
Matt Arsenault 09ed0e44d9 AMDGPU/GlobalISel: Select llvm.amdgcn.raw.tbuffer.load 2020-01-27 13:40:37 -05:00