Commit Graph

428 Commits

Author SHA1 Message Date
Matt Arsenault 039c917b43 AMDGPU/GlobalISel: Fix asserting on gather4 intrinsics 2020-03-17 11:07:30 -04:00
Matt Arsenault d0fe13ecf9 AMDGPU/GlobalISel: Fully handle 0 dmask case during legalize
For normal loads, fully eliminate the load. For the TFE case, adjust
the dmask value in the instruction so the selector doesn't need to
handle it. For the TFE special case, I guess it would be possible to
replace the loaded data register with undef, but as-is this will start
treating it as a well defined value.
2020-03-17 10:15:30 -04:00
Matt Arsenault d9a012ed8a AMDGPU/GlobalISel: Adjust image load register type based on dmask
Trim elements that won't be written. The equivalent still needs to be
done for writes. Also start widening 3 elements to 4
elements. Selection will get the count from the dmask.
2020-03-17 10:09:18 -04:00
Matt Arsenault 83ffbf2618 AMDGPU/GlobalISel: Legalize non-a16 non-NSA images 2020-03-17 10:02:09 -04:00
Matt Arsenault 2aba9b6cf8 AMDGPU/GlobalISel: Legalize a16 images
Pack the address registers in the legalizer. Avoid introducing a huge
family of new intermediate operations by filling dead operands with
noreg.
2020-03-17 10:02:09 -04:00
Matt Arsenault 57d896e838 AMDGPU/GlobalISel: Make some large merges legal
We allow up to 1024-bit registers, so we should support merges all the
way to the maximum.
2020-03-16 10:49:10 -04:00
Matt Arsenault bb8622094d AMDGPU: Don't handle kernarg.segment.ptr in functions
Just lower this to null. Pass implicitarg.ptr in its place in the
argument list.
2020-03-13 12:51:12 -07:00
Matt Arsenault 1e0c540360 AMDGPU: Don't hard error on LDS globals in functions
Instead, emit a trap and a warning. We force inlining of this
situation, so any function where this happens should be dead as
indirect or external calls are not yet supported. This should avoid
erroring on dead code.
2020-03-11 15:34:11 -04:00
Matt Arsenault edd0dfca0d AMDGPU/GlobalISel: Refine G_TRUNC legality rules
Scalarize most truncates. Avoid touching cases that could end up in
unresolvable infinite loops.
2020-03-10 15:32:22 -07:00
Matt Arsenault ce8a1f7294 GlobalISel: Implement fewerElementsVector for G_TRUNC
Extend fewerElementsVectorBasic to handle operands with different
element types.
2020-03-10 15:17:20 -07:00
hsmahesha 3fda1fde8f AMDGPU/GlobalISel: Support llvm.trap and llvm.debugtrap intrinsics
Summary: Lower trap and debugtrap intrinsics to AMDGPU machine instruction(s).

Reviewers: arsenm, nhaehnle, kerbowa, cdevadas, t-tye, kzhuravl

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, rovka, dstuttard, tpr, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74688
2020-03-05 08:16:57 +05:30
Matt Arsenault 86e13ec194 AMDGPU/GlobalISel: Use packed for G_ADD/G_SUB/G_MUL v2s16 2020-02-25 11:20:35 -05:00
Jay Foad 33cbd5ee08 AMDGPU/GlobalISel: Legalize s64 min/max by lowering
Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75108
2020-02-25 16:00:43 +00:00
Jay Foad 0ed4744bb5 AMDGPU/GlobalISel: Lower 64-bit uaddo/usubo
Summary: Add more test cases for signed and unsigned add/sub with overflow.

Reviewers: arsenm, rampitec, kerbowa

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75051
2020-02-24 23:08:14 +00:00
Matt Arsenault 72eef820d5 AMDGPU/GlobalISel: Select G_SHUFFLE_VECTOR
G_SHUFFLE_VECTOR is legal since it theoretically may help match op_sel
for VOP3P instructions. Expand it in some other way in case it doesn't
fold into the use instructions.
2020-02-21 13:35:40 -05:00
Matt Arsenault 79ff188add AMDGPU/GlobalISel: Legalize G_FPOW
There are few differences from the DAG handling. First, the DAG
handling uses a primitive selection pattern instead of custom
legalizing it. Because of this, this makes use of source modifiers
while the DAG does not.

Also instead of promoting f16, try to use the f16 log/exp. There's no
f16 fmul_legacy, so widen just for the multiply, although I'm not sure
that's the best solution.
2020-02-21 10:31:13 -05:00
Matt Arsenault 37c452a289 AMDGPU/GlobalISel: Adjust branch target when lowering loop intrinsic
This needs to steal the branch target like the other control flow
intrinsics.
2020-02-18 06:35:40 -08:00
Matt Arsenault f742a28ae3 AMDGPU/GlobalISel: Custom lower 32-bit G_SDIV/G_SREM 2020-02-17 15:09:51 -05:00
Matt Arsenault e240b27d6d AMDGPU/GlobalISel: Allow arbitrary global values
Treat unknown address spaces as global
2020-02-17 11:32:28 -08:00
Matt Arsenault 96db12d507 AMDGPU/GlobalISel: Custom lower 32-bit G_UDIV/G_UREM
AMDGPUCodeGenPrepare expands this most of the time, but not always. We
will always at least need a fallback option here. This is the 3rd
implementation of the same expansion in the backend. Eventually I
would like to eliminate the IR expansion (and the DAG version
obviously).

Currently the new legalizer path produces a better result, since the
IR expansion results in extra operations which need to be combined
out. Notably, the IR expansion results in multiplies by 0.
2020-02-17 11:05:50 -08:00
Michael Liao 487fcc8d3d Fix `-Wpedantic` warning. NFC. 2020-02-17 00:18:01 -05:00
Matt Arsenault 295bbea3ed AMDGPU/GlobalISel: Fix non-power-of-2 G_SITOFP/G_UITOFP
This wouldn't work for s33-s63 sources.
2020-02-16 22:48:57 -05:00
Matt Arsenault 044d40ed46 AMDGPU/GlobalISel: Move lambdas to normal function
These aren't using any local state
2020-02-16 22:48:32 -05:00
Matt Arsenault 60fea2713d AMDGPU/GlobalISel: Improve 16-bit bswap
Match the new DAG behavior and use v_perm_b32 when available. Also
does better on SI/CI by expanding 16-bit swaps. Also fix
non-power-of-2 cases.
2020-02-14 15:57:39 -08:00
Matt Arsenault bfbfa18591 GlobalISel: Lower s64->s16 G_FPTRUNC
This is more or less directly ported from the AMDGPU custom lowering
for FP_TO_FP16. I made a few minor fixups (using G_UNMERGE_VALUES
instead of creating shift/trunc to extract the two halves, and zexting
an inverted compare instead of select_cc).

This also does not include the fast math expansion the DAG which
converts to f32 and then to f16. I think that belongs in a
pre-legalize combine instead.
2020-02-14 10:46:58 -08:00
Matt Arsenault a257bde420 AMDGPU/GlobalISel: Handle G_BSWAP 2020-02-14 09:09:44 -08:00
Matt Arsenault 5adbf7d57f AMDGPU/GlobalISel: Make G_TRUNC legal
This is required to be legal. I'm not sure how we were getting away
without defining any rules for it.
2020-02-13 15:25:52 -05:00
Matt Arsenault fa61e200e5 AMDGPU/GlobalISel: Widen non-power-of-2 load results
Load extra bits if suitably aligned. This allows using widened
3-vector loads on SI, and fixes legalization for <9 x s32> (which LSV
apparently forms frequently on lowered kernel argument lists).

Fix incorrectly treating these as legal on SI. This should emit a
64-bit store and a 32-bit store.

I think all of the load and store rules are just about complete, but
due for a rewrite.
2020-02-12 09:35:10 -05:00
Matt Arsenault f4a38c114e AMDGPU/GlobalISel: Look through casts when legalizing vector indexing
We were failing to find constants that were casted. I feel like the
artifact combiner should have folded the constant in the trunc before
the custom lowering, but that doesn't happen.
2020-02-09 18:02:10 -05:00
Petar Avramovic 7df5fc9e03 [GlobalISel] Add buildMerge with SrcOp initializer list
Allows more flexible use of buildMerge in places where
use operands are available as SrcOp since it does not
require explicit conversion to Register.
Simplify code with new buildMerge.

Differential Revision: https://reviews.llvm.org/D74223
2020-02-07 18:43:45 +01:00
Matt Arsenault 8de2dad9e0 GlobalISel: Fix lowering of G_CTLZ/G_CTTZ
The type passed to lower was invalid, so I'm not sure how this was
even working before. The source and destination type also do not have
to match, so make sure to use the right ones.
2020-02-07 06:54:12 -08:00
Matt Arsenault 6a570dc548 AMDGPU/GlobalISel: Fix non-pow-2 add/sub/mul for 16-bit insts
These wouldn't legalize between 16-bits and 32-bits on targets with
16-bit instructions.
2020-02-06 21:43:54 -05:00
Matt Arsenault baafe82b07 AMDGPU/GlobalISel: Remove bitcast legality hack 2020-02-05 16:24:24 -05:00
Matt Arsenault 364326ce66 AMDGPU/GlobalISel: Add mem operand to s.buffer.load intrinsic
Really the intrinsic definition is wrong, but work around this
here. The DAG lowering introduces an MMO. We have to introduce a new
operation to avoid the verifier complaining about the missing mayLoad.
2020-02-05 15:04:42 -05:00
Matt Arsenault 5aa6e246a1 AMDGPU/GlobalISel: Legalize f64 G_FFLOOR for SI
Use cmp ord instead of cmp_class compared to the DAG version for the
nan check, but mostly try to match the existsing pattern.

I think the sign doesn't matter for fract, so we could do a little
better with the source modifier matching.

I think this is also still broken as in D22898, but I'm leaving it
as-is for now while I don't have an SI system to test on.
2020-02-05 14:32:01 -05:00
Matt Arsenault 7bffa97285 AMDGPU/GlobalISel: Prefer merge/unmerge ops to legalize TFE
These have a better chance of combining with other operations and are
currently much better supported than G_EXTRACT.
2020-02-05 12:56:10 -05:00
Matt Arsenault e65e6d052e AMDGPU/GlobalISel: Legalize TFE image result loads
Rewrite the result register pair into the expected sinigle register
format in the legalizer.

I'm also operating under the assumption that TFE doesn't apply to
stores or atomics, but don't know if this is true or not.
2020-02-05 12:40:20 -05:00
Jordan Rupprecht 9f507bfd8d NFC: fix unused var warnings in no-assert builds 2020-02-05 09:26:59 -08:00
Matt Arsenault 69cc9f3046 AMDGPU/GlobalISel: Legalize llvm.amdgcn.s.buffer.load
The 96-bit results need to be widened.

I find the interaction between LegalizerHelper and MIRBuilder somewhat
awkward. The custom legalization is called by the LegalizerHelper, but
then does not have access to the helper. You have to construct a new
helper, which then does not own the MachineIRBuilder, but does modify
it. Maybe custom legalization should be passed the helper?
2020-02-05 12:01:34 -05:00
Matt Arsenault dfa9420f09 AMDGPU/GlobalISel: Don't use legal v2s16 G_BUILD_VECTOR
If we have s_pack_* instructions, legalize this to
G_BUILD_VECTOR_TRUNC from s32 elements. This is closer to how how the
s_pack_* instructions really behave.

If we don't have s_pack_ instructions, expand this by creating a merge
to s32 and bitcasting. This expands to the expected bit operations. I
think this eventually should go in a new bitcast legalize action type
in LegalizerHelper.

We already directly emit the shift operations in RegBankSelect for the
vector case. This could possibly be cleaned up, but I also may want to
defer doing this expansion to selection anyway. I'll see about that
when I try to actually match VOP3P instructions.

This breaks the selection of the build_vector since tablegen doesn't
know how to match G_BUILD_VECTOR_TRUNC yet, so just xfail it for now.
2020-02-05 11:52:18 -05:00
Matt Arsenault 05f2a04ba7 AMDGPU/GlobalISel: Legalize G_SEXT_INREG
Split the VALU 64-bit case in RegBankSelect.
2020-02-04 13:23:53 -08:00
Matt Arsenault 9b0ce8edfa AMDGPU/GlobalISel: Remove extension legality hacks
The legalization has improved since this was added, and the tests
relying on this no longer need it.
2020-02-04 12:50:47 -08:00
Matt Arsenault 5d2749938c AMDGPU/GlobalISel: Custom lower G_FEXP 2020-02-04 11:50:55 -08:00
Matt Arsenault b461436d01 AMDGPU/GlobalISel: Legalize s16 G_FEXP2 2020-02-04 11:50:55 -08:00
Matt Arsenault 1024b73ef5 AMDGPU: Split denormal mode tracking bits
Prepare to accurately track the future denormal-fp-math attribute
changes. The way to actually set these separately is not wired in yet.

This is just a mechanical change, and mostly still assumes the input
and output mode match. This should be refined for some cases. For
example, fcanonicalize lowering should use the flushing variant if
either input or output flushing is enabled
2020-02-04 10:44:21 -08:00
Matt Arsenault cd7650c186 GlobalISel: Implement fewerElementsVector for G_SEXT_INREG
Start using a new strategy with a combination of merge and unmerges.

This allows scalarizing before lowering, which in cases like
<2 x s128> avoids producing giant illegal shifts.
2020-02-03 11:47:33 -08:00
Matt Arsenault c0b12916a7 AMDGPU/GlobalISel: Use more wide vector load/stores
This improves the type breakdown for some large vectors. For example,
we now get a <4 x s32> and s32 store instead of 5 s32 stores for
<5 x s32>.
2020-02-01 10:47:21 -05:00
Matt Arsenault e3117e5c30 AMDGPU/GlobalISel: Improve legalization of wide stores
This fixes legalizations of global stores > 128-bits. It seems work is
needed on how this split actually occurs. For example, we get the
right code for s160, with an s128 and s32 load, but get 5 s32 loads
for <5 x s32>.
2020-02-01 10:47:03 -05:00
Jay Foad 2a1b5af299 [GlobalISel] Tidy up unnecessary calls to createGenericVirtualRegister
Summary:
As a side effect some redundant copies of constant values are removed by
CSEMIRBuilder.

Reviewers: aemerson, arsenm, dsanders, aditya_nandakumar

Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, hiraditya, jrtc27, atanasyan, volkan, Petar.Avramovic, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73789
2020-01-31 17:07:16 +00:00
Jay Foad 31e29d4afe AMDGPU/GlobalISel: Make use of MachineIRBuilder helper functions. NFC. 2020-01-31 13:53:39 +00:00
Matt Arsenault ea956685a1 GlobalISel: Implement s32->s64 G_FPTOSI lowering
Port directly from DAG version.

The lowering for G_FPTOUI used to fail on AMDGPU because it uses
G_FPTOSI.
2020-01-30 08:47:07 -05:00
Matt Arsenault b21571f4d5 AMDGPU/GlobalISel: Handle s64->s64 G_FPTOSI/G_FPTOUI 2020-01-30 08:46:37 -05:00
Matt Arsenault 8184176efd AMDGPU/GlobalISel: Custom lower G_LOG/G_LOG10
I'm pretty sure this is wrong and we should expand these in a correct
way, but this matches the existing behavior.
2020-01-30 08:38:50 -05:00
Matt Arsenault 872e899b75 AMDGPU/GlobalISel: Legalize unpacked d16 image operations
On targets that don't have the normal packed f16 layout, handle these
during legalization. Directly modify the register types. We can infer
this was a d16 load based on the mem operand size during selection.

A16 operands should possibly be handled here as well, but don't worry
about that yet.
2020-01-30 08:36:11 -05:00
Matt Arsenault b4a0766c8d AMDGPU/GlobalISel: Select llvm.amdgcn.buffer.atomic.cmpswap 2020-01-30 08:22:43 -05:00
Matt Arsenault c5fffa4da3 GlobalISel: Add observer argument to legalizeIntrinsic
This is passed to legalizeCustom, but not intrinsic. Also remove the
MRI argument, since you can get that from the MachineIRBuilder.

I'm not sure why MachineIRBuilder has a private observer member, and
this is passed separately.
2020-01-29 18:33:45 -05:00
Matt Arsenault 96352e0a1b AMDGPU/GlobalISel: Handle LDS with relocations case 2020-01-29 08:18:55 -08:00
Matt Arsenault c3075e6171 AMDGPU/GlobalISel: Select buffer atomics
The cmpswap handling is incomplete and fails to select.
2020-01-27 15:16:44 -05:00
Matt Arsenault 0eb62d5b3f AMDGPU/GlobalISel: Select llvm.amdgcn.raw.tbuffer.store 2020-01-27 15:16:21 -05:00
Matt Arsenault a69c26a927 AMDGPU/GlobalISel: Select llvm.amdgcn.struct.buffer.store[.format] 2020-01-27 15:00:21 -05:00
Matt Arsenault 533d650e94 AMDGPU/GlobalISel: Move llvm.amdgcn.raw.buffer.store handling
Treat this the same way as loads. There's less value to the
intermediate nodes, but it's good to be consistent.
2020-01-27 14:59:30 -05:00
Matt Arsenault 75d66f8434 AMDGPU/GlobalISel: Select llvm.amdcn.struct.tbuffer.load 2020-01-27 14:42:04 -05:00
Matt Arsenault 09ed0e44d9 AMDGPU/GlobalISel: Select llvm.amdgcn.raw.tbuffer.load 2020-01-27 13:40:37 -05:00
Matt Arsenault 97711228fd AMDGPU/GlobalISel: Select llvm.amdgcn.struct.buffer.load.format 2020-01-27 13:23:35 -05:00
Matt Arsenault ce7ca2caf2 AMDGPU/GlobalISel: Select llvm.amdgcn.struct.buffer.load 2020-01-27 13:05:55 -05:00
Matt Arsenault 198624c39d AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.load.format 2020-01-27 13:02:19 -05:00
Matt Arsenault fc90222a91 AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.load
Use intermediate instructions, unlike with buffer stores. This is
necessary because of the need to have an internal way to distinguish
between signed and unsigned extloads. This introduces some duplication
and near duplication with the buffer store selection path. The store
handling should maybe be moved into legalization to match and
eliminate the duplication.
2020-01-27 12:49:23 -05:00
Matt Arsenault a1d33ce73a AMDGPU/GlobalISel: Custom legalize v2s16 G_SHUFFLE_VECTOR
Try to keep simple v2s16 cases as-is. This will more naturally map to
how the VOP3P op_sel modifiers work compared to the expansion
involving bitcasts and bitshifts.

This could maybe try harder with wider source vector types, although
that could be handled with a pre-legalize combine.
2020-01-27 08:28:05 -08:00
Matt Arsenault 2a160ba5b0 GlobalISel: Reimplement widenScalar for G_UNMERGE_VALUES results
Only use shifts if the requested type exactly matches the source type,
and create sub-unmerges otherwise.
2020-01-27 06:18:26 -08:00
Matt Arsenault a722cbf77c AMDGPU/GlobalISel: Handle atomic_inc/atomic_dec
The intermediate instruction drops the extra volatile argument. We are
missing an atomic ordering on these.
2020-01-22 09:26:17 -05:00
Matt Arsenault e47965bf64 AMDGPU/GlobalISel: Merge trivial legalize rules
Also move constant-like rules together
2020-01-21 17:37:19 -05:00
Matt Arsenault 9a5a6e9465 AMDGPU/GlobalISel: Merge G_PTR_ADD/G_PTR_MASK rules 2020-01-21 16:57:01 -05:00
Matt Arsenault fd109308a7 AMDGPU/GlobalISel: Legalize G_PTR_ADD for arbitrary pointers
Pointers of unrecognized address spaces shoudl be treated as
global-like pointers. Even if loads and stores of them aren't handled,
dumb operations that just operate on the bits should work.
2020-01-21 16:35:36 -05:00
Matt Arsenault e12b840abf AMDGPU/GlobalISel: Improve lowering of G_SEXT_INREG
Clamping the scalar is much better than lowering with superwide shifts
for types > s64.
2020-01-16 14:29:37 -05:00
Matt Arsenault 936483fb7d GlobalISel: Implement lower for G_BITCAST
Bitcast only really applies between scalars and vectors. Implement as
an unmerge and remerge. The test needs to tolerate failure since one
of the unmerges currently fails to legalize.
2020-01-15 08:58:58 -05:00
Matt Arsenault ca19d7a399 AMDGPU/GlobalISel: Fix branch targets when emitting SI_IF
The branch target needs to be changed depending on whether there is an
unconditional branch or not.

Loops also need to be similarly fixed, but compiling a simple testcase
end to end requires another set of patches that aren't upstream yet.
2020-01-13 12:51:05 -05:00
Matt Arsenault bac995d978 AMDGPU/GlobalISel: Clamp G_ZEXT source sizes
Also clamps G_SEXT/G_ANYEXT, but the implementation is more limited so
fewer cases actually work.
2020-01-10 09:42:49 -05:00
Matt Arsenault 0ea3c7291f GlobalISel: Handle llvm.read_register
Compared to the attempt in bdcc6d3d26,
this uses intermediate generic instructions.
2020-01-09 17:37:52 -05:00
Matt Arsenault 35ad66fae8 AMDGPU/GlobalISel: Widen 16-bit shift amount sources
This should be legal, but will require future selection work. 16-bit
shift amounts were already removed from being legal, but this didn't
adjust the transformation rules.
2020-01-09 16:29:44 -05:00
Matt Arsenault 6652cc0cf7 AMDGPU/GlobalISel: Fix scalar G_SELECT for arbitrary pointers
4e85ca9562 missed updating the legal
condition type set for pointers with any unrecognized address space.
2020-01-07 16:36:31 -05:00
Matt Arsenault 52afc93c38 AMDGPU/GlobalISel: Legalize G_READCYCLECOUNTER 2020-01-06 19:16:32 -05:00
Matt Arsenault 4e85ca9562 AMDGPU/GlobalISel: Replace handling of boolean values
This solves selection failures with generated selection patterns,
which would fail due to inferring the SGPR reg bank for virtual
registers with a set register class instead of VCC bank. Use
instruction selection would constrain the virtual register to a
specific class, so when the def was selected later the bank no longer
was set to VCC.

Remove the SCC reg bank. SCC isn't directly addressable, so it
requires copying from SCC to an allocatable 32-bit register during
selection, so these might as well be treated as 32-bit SGPR values.

Now any scalar boolean value that will produce an outupt in SCC should
be widened during RegBankSelect to s32. Any s1 value should be a
vector boolean during selection. This makes the vcc register bank
unambiguous with a normal SGPR during selection.

Summary of how this should now work:

- G_TRUNC is always a no-op, and never should use a vcc bank result.

- SALU boolean operations should be promoted to s32 in RegBankSelect
  apply mapping

- An s1 value means vcc bank at selection. The exception is for
  legalization artifacts that use s1, which are never VCC. All other
  contexts should infer the VCC register classes for s1 typed
  registers. The LLT for the register is now needed to infer the
  correct register class. Extensions with vcc sources should be
  legalized to a select of constants during RegBankSelect.

- Copy from non-vcc to vcc ensures high bits of the input value are
  cleared during selection.

- SALU boolean inputs should ensure the inputs are 0/1. This includes
  select, conditional branches, and carry-ins.

There are a few somewhat dirty details. One is that G_TRUNC/G_*EXT
selection ignores the usual register-bank from register class
functions, and can't handle truncates with VCC result banks. I think
this is OK, since the artifacts are specially treated anyway. This
does require some care to avoid producing cases with vcc. There will
also be no 100% reliable way to verify this rule is followed in
selection in case of register classes, and violations manifests
themselves as invalid copy instructions much later.

Standard phi handling also only considers the bank of the result
register, and doesn't insert copies to make the source banks
match. This doesn't work for vcc, so we have to manually correct phi
inputs in this case. We should add a verifier check to make sure there
are no phis with mixed vcc and non-vcc register bank inputs.

There's also some duplication with the LegalizerHelper, and some code
which should live in the helper. I don't see a good way to share
special knowledge about what types to use for intermediate operations
depending on the bank for example. Using the helper to replace
extensions with selects also seems somewhat awkward to me.

Another issue is there are some contexts calling
getRegBankFromRegClass that apparently don't have the LLT type for the
register, but I haven't yet run into a real issue from this.

This also introduces new unnecessary instructions in most cases, since
we don't yet try to optimize out the zext when the source is known to
come from a compare.
2020-01-06 18:26:42 -05:00
Matt Arsenault f3de8ab5cc GlobalISel: Implement lower for G_INTRINSIC_ROUND
Mostly copied from AMDGPU lowering implementation, except used
G_SITOFP instead of directly creating a select on -1.0, 0.0.
2020-01-06 18:26:42 -05:00
Matt Arsenault ee6b8722ff GlobalISel: Fix unsupported legalize action
This would complain about invalid legalizer rules otherwise.

Mark some operations as unsupported for AMDGPU. This currently seems
to produce the same legalize error as when no rules are defined, but
eventually this should produce a proper user facing error.
2020-01-06 17:21:51 -05:00
Matt Arsenault d12f2a2998 GlobalISel: Scalarize all division operations
This only handled G_SDIV, but they all are trivially scalarizable.

Also define placeholder AMDGPU division legalizer rules.
2020-01-04 13:47:10 -05:00
Matt Arsenault d9b5063b25 AMDGPU/GlobalISel: Legalize more odd sized loads
The attempts to widen sufficently aligned, odd sized loads wasn't
consistently applied.
2020-01-04 12:38:39 -05:00
Matt Arsenault 9fd31fdbd3 GlobalISel: moreElementsVector for FP min/max 2019-12-30 10:39:53 -05:00
Matt Arsenault e088846712 AMDGPU/GlobalISel: Fix extra result register in fdiv64 lowering
There ended up being two result registers, which would fail on
select. It was really defing a new temp register in the correct def
position, instead of the correct result register.
2019-12-27 08:49:43 -05:00
Matt Arsenault df5c2159d0 AMDGPU/GlobalISel: Legalize some 16-bit round instructions 2019-12-24 09:53:01 -05:00
Matt Arsenault 9035fa6b54 AMDGPU/GlobalISel: Lower llvm.amdgcn.else 2019-12-24 09:53:01 -05:00
Jay Foad c7c05b0c8a [AMDGPU] Don't create MachinePointerInfos with an UndefValue pointer
Summary:
The only useful information the UndefValue conveys is the address space,
which MachinePointerInfo can represent directly without referring to an
IR value.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71838
2019-12-23 15:58:19 +00:00
Matt Arsenault 42a26445f9 AMDGPU/GlobalISel: Fix misuse of div_scale intrinsics
Confusingly, the intrinsic operands do not match the
instruction/custom node. The order is shuffled, and the 3rd operand is
an immediate to select operands.

I'm not 100% sure I did this right, but fdiv still doesn't select end
to end and it will be easier to tell when it does. This at least
avoids an assertion in RegBankSelect and allows hitting the fallback
on selection.
2019-12-21 04:55:36 -05:00
Austin Kerbow f3225f2abe AMDGPU/GlobalISel: Legalize FDIV64
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70403
2019-11-19 21:02:27 -08:00
Matt Arsenault ea23b6428b AMDGPU: Be explicit about denormal mode in MIR tests
Start checking the machine function in GlobalISel instead of the
target directly.

This temporarily breaks fcanonicalize selection in GlobalISel.
2019-11-19 19:55:43 +05:30
Matt Arsenault bc276c6379 GlobalISel: Lower s1 source G_SITOFP/G_UITOFP 2019-11-15 13:37:20 +05:30
Daniel Sanders e74c5b9661 [globalisel] Rename G_GEP to G_PTR_ADD
Summary:
G_GEP is rather poorly named. It's a simple pointer+scalar addition and
doesn't support any of the complexities of getelementptr. I therefore
propose that we rename it. There's a G_PTR_MASK so let's follow that
convention and go with G_PTR_ADD

Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm

Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69734
2019-11-05 10:31:17 -08:00
Austin Kerbow 2b88b344f2 AMDGPU/GlobalISel: Legalize FDIV32
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69581
2019-10-29 17:18:06 -07:00
Matt Arsenault 171cf5302f AMDGPU/GlobalISel: Handle flat/global G_ATOMIC_CMPXCHG
Custom lower this to a target instruction with the merge operands. I
think it might be better to directly select this and emit a
REG_SEQUENCE, but this would be more work since it would require
splitting the tablegen patterns for these cases from the other
atomics.
2019-10-25 13:11:09 -07:00
Austin Kerbow c35b358b74 AMDGPU/GlobalISel: Legalize FDIV16
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69347
2019-10-25 11:07:17 -07:00
Austin Kerbow 97263fa2dd AMDGPU/GlobalISel: Legalize fast unsafe FDIV
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69231

llvm-svn: 375460
2019-10-21 22:18:26 +00:00
Matt Arsenault 34ed76e180 GlobalISel: Implement lower for G_SADDO/G_SSUBO
Port directly from SelectionDAG, minus the path using
ISD::SADDSAT/ISD::SSUBSAT.

llvm-svn: 375042
2019-10-16 20:46:32 +00:00
Matt Arsenault 3cd3959fe2 GlobalISel: Implement fewerElementsVector for G_BUILD_VECTOR
Turn it into a G_CONCAT_VECTORS of G_BUILD_VECTOR.

llvm-svn: 374252
2019-10-09 22:44:43 +00:00
Matt Arsenault c8a6df7130 AMDGPU/GlobalISel: Clamp G_SITOFP/G_UITOFP sources
llvm-svn: 373989
2019-10-07 23:33:08 +00:00
Matt Arsenault 4bcdcad91b GlobalISel: Partially implement lower for G_INSERT
llvm-svn: 373946
2019-10-07 19:13:27 +00:00
Matt Arsenault 578fa2819f AMDGPU/GlobalISel: Widen 16-bit G_MERGE_VALUEs sources
Continue making a mess of merge/unmerge legality.

llvm-svn: 373942
2019-10-07 19:05:58 +00:00
Matt Arsenault bcd6b1d209 AMDGPU/GlobalISel: Lower G_ATOMIC_CMPXCHG_WITH_SUCCESS
llvm-svn: 373839
2019-10-06 01:37:37 +00:00
Matt Arsenault a5b9c75674 GlobalISel: Partially implement lower for G_EXTRACT
Turn into shift and truncate. Doesn't yet handle pointers.

llvm-svn: 373838
2019-10-06 01:37:35 +00:00
Matt Arsenault d7cad4fb41 AMDGPU/GlobalISel: Fix using wrong addrspace for aperture
This was always passing the destination flat address space, when it
should be picking between the two valid source options.

llvm-svn: 373716
2019-10-04 08:35:38 +00:00
Matt Arsenault 3d23e58dbe AMDGPU/GlobalISel: Fix mutationIsSane assert v8s8 and
This would try to do FewerElements to v9s8

llvm-svn: 373635
2019-10-03 17:50:29 +00:00
Matt Arsenault 1c135a39aa AMDGPU/GlobalISel: Expand G_BITCAST legality
llvm-svn: 373567
2019-10-03 05:46:08 +00:00
Matt Arsenault 86f864dace AMDGPU/GlobalISel: Use getIntrinsicID helper
llvm-svn: 373417
2019-10-02 01:02:27 +00:00
Matt Arsenault 05aa8a733e AMDGPU/GlobalISel: Legalize 1024-bit G_BUILD_VECTOR
This will be needed to support AGPR operations.

llvm-svn: 373413
2019-10-02 01:02:18 +00:00
Matt Arsenault 9dba603748 AMDGPU/GlobalISel: Increase max legal size to 1024
There are 1024 bit register classes defined for AGPRs. Additionally
OpenCL defines vectors up to 16 x i64, and this helps those tests
legalize.

llvm-svn: 373350
2019-10-01 16:35:06 +00:00
Matt Arsenault fdea5e02ce AMDGPU/GlobalISel: Select s1 src G_SITOFP/G_UITOFP
llvm-svn: 373298
2019-10-01 02:23:20 +00:00
Matt Arsenault 8f6bdb7668 AMDGPU/GlobalISel: Avoid creating shift of 0 in arg lowering
This is sort of papering over the fact that we don't run a combiner
anywhere, but avoiding creating 2 instructions in the first place is
easy.

llvm-svn: 373293
2019-10-01 01:44:46 +00:00
Matt Arsenault 54167ea316 AMDGPU/GlobalISel: Select G_UADDO/G_USUBO
llvm-svn: 373288
2019-10-01 01:23:13 +00:00
Matt Arsenault ed85b0cee6 GlobalISel: Implement widenScalar for G_SITOFP/G_UITOFP sources
Legalize 16-bit G_SITOFP/G_UITOFP for AMDGPU.

llvm-svn: 373287
2019-10-01 01:06:48 +00:00
Matt Arsenault 77ac400117 AMDGPU/GlobalISel: Legalize G_GLOBAL_VALUE
Handle other cases besides LDS. Mostly a straight port of the existing
handling, without the intermediate custom nodes.

llvm-svn: 373286
2019-10-01 01:06:43 +00:00
Matt Arsenault 3ecab8e455 Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)

This was missing one switch to getTargetConstant in an untested case.

llvm-svn: 372338
2019-09-19 16:26:14 +00:00
Hans Wennborg 13bdae8541 Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This broke the Chromium build, causing it to fail with e.g.

  fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>

See llvm-commits thread of r372285 for details.

This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.

> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since now intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, simlar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to uses "imm" in an output with a "timm" pattern source.

llvm-svn: 372314
2019-09-19 12:33:07 +00:00
Matt Arsenault 494243597b AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store.format
This needs special handling due to some subtargets that have a
nonstandard register layout for f16 vectors

Also reject some illegal types on other targets.

llvm-svn: 372293
2019-09-19 02:35:08 +00:00
Matt Arsenault 67f1f6ff8c AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store
llvm-svn: 372292
2019-09-19 02:30:27 +00:00
Matt Arsenault 01c7f40de3 AMDGPU/GlobalISel: Legalize s1 source G_[SU]ITOFP
llvm-svn: 371952
2019-09-16 00:37:10 +00:00
Matt Arsenault 9f52c1ea58 AMDGPU/GlobalISel: Select S16->S32 fptoint
llvm-svn: 371950
2019-09-16 00:32:56 +00:00
Matt Arsenault a4be3eff5c AMDGPU/GlobalISel: Legalize s32->s16 G_SITOFP/G_UITOFP
llvm-svn: 371811
2019-09-13 04:04:55 +00:00
Matt Arsenault f457dd2bd4 AMDGPU/GlobalISel: Legalize G_FFLOOR
llvm-svn: 371803
2019-09-13 01:48:15 +00:00
Matt Arsenault 4d33918034 AMDGPU/GlobalISel: Legalize G_FMAD
Unlike SelectionDAG, treat this as a normally legalizable operation.
In SelectionDAG this is supposed to only ever formed if it's legal,
but I've found that to be restricting. For AMDGPU this is contextually
legal depending on whether denormal flushing is allowed in the use
function.

Technically we currently treat the denormal mode as a subtarget
feature, so custom lowering could be avoided. However I consider this
to be a defect, and this should be contextually dependent on the
controllable rounding mode of the parent function.

llvm-svn: 371800
2019-09-13 00:44:35 +00:00
Matt Arsenault e1895aba3d AMDGPU/GlobalISel: Select G_FABS/G_FNEG
f64 doesn't work yet because tablegen currently doesn't handlde
REG_SEQUENCE.

This does regress some multi use VALU fneg cases since now the
immediate remains in an SGPR, and more moves are used for legalizing
the xor. This is a SIFixSGPRCopies deficiency.

llvm-svn: 371540
2019-09-10 17:19:46 +00:00
Matt Arsenault da027275c6 AMDGPU/GlobalISel: RegBankSelect for G_ZEXTLOAD/G_SEXTLOAD
llvm-svn: 371536
2019-09-10 16:42:37 +00:00
Matt Arsenault ad6a8b83cd AMDGPU/GlobalISel: Legalize constant 32-bit loads
Legalize by casting to a 64-bit constant address. This isn't how the
DAG implements it, but it should.

llvm-svn: 371535
2019-09-10 16:42:31 +00:00
Matt Arsenault c0ceca5883 AMDGPU/GlobalISel: First pass at attempting to legalize load/stores
There's still a lot more to do, but this handles decomposing due to
alignment. I've gotten it to the point where nothing crashes or
infinite loops the legalizer.

llvm-svn: 371533
2019-09-10 16:20:14 +00:00
Matt Arsenault a91f017ae3 AMDGPU/GlobalISel: Fix insert point when lowering fminnum/fmaxnum
llvm-svn: 371471
2019-09-09 23:30:11 +00:00
Austin Kerbow 06c8cb03ca AMDGPU/GlobalISel: Rename MIRBuilder to B. NFC
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67374

llvm-svn: 371467
2019-09-09 23:06:13 +00:00
Matt Arsenault a0933e6df7 AMDGPU/GlobalISel: Legalize G_BUILD_VECTOR v2s16
Handle it the same way as G_BUILD_VECTOR_TRUNC. Arguably only
G_BUILD_VECTOR_TRUNC should be legal for this, but G_BUILD_VECTOR will
probably be more convenient in most cases.

llvm-svn: 371440
2019-09-09 18:57:51 +00:00
Matt Arsenault 64ecca90d4 AMDGPU/GlobalISel: Implement LDS G_GLOBAL_VALUE
Handle the simple case that lowers to a constant.

llvm-svn: 371424
2019-09-09 17:13:44 +00:00
Matt Arsenault 182f9248e8 AMDGPU/GlobalISel: Legalize G_BUILD_VECTOR_TRUNC
Treat this as legal on gfx9 since it can use S_PACK_* instructions for
this.

This isn't used by anything yet. The same will probably apply to
16-bit G_BUILD_VECTOR without the trunc.

llvm-svn: 371423
2019-09-09 17:04:18 +00:00
Matt Arsenault c34b4036ff AMDGPU/GlobalISel: Select G_PTR_MASK
llvm-svn: 371412
2019-09-09 15:46:13 +00:00
Matt Arsenault 8e3bc9b572 AMDGPU/GlobalISel: Legalize wavefrontsize intrinsic
llvm-svn: 371407
2019-09-09 15:20:49 +00:00
Matt Arsenault f581d575ce AMDGPU: Add intrinsics for address space identification
The library currently uses ptrtoint and directly checks the queue ptr
for this, which counts as a pointer capture.

llvm-svn: 371009
2019-09-05 02:20:39 +00:00
Matt Arsenault 69b1a2ae65 AMDGPU/GlobalISel: Restore insert point when getting aperture
Avoids SSA violations in a future patch.

llvm-svn: 371008
2019-09-05 02:20:32 +00:00
Matt Arsenault 25156ae7ea AMDGPU/GlobalISel: Fix placeholder value used for addrspacecast
llvm-svn: 371007
2019-09-05 02:20:29 +00:00
Matt Arsenault 5ff310e298 GlobalISel: Add basic legalization for G_BITREVERSE
llvm-svn: 370979
2019-09-04 20:46:15 +00:00
Matt Arsenault d9af712da4 AMDGPU/GlobalISel: Make 16-bit constants legal
This is mostly for the benefit of patterns which use 16-bit constants.

llvm-svn: 370921
2019-09-04 16:19:45 +00:00
Reid Kleckner fe47ed67fc Fix the build for MSVC builds using M_PI
llvm-svn: 370405
2019-08-29 20:32:53 +00:00
Matt Arsenault cbd1782c79 AMDGPU/GlobalISel: Legalize sin/cos
llvm-svn: 370402
2019-08-29 20:06:48 +00:00
Matt Arsenault 5c7e96dc26 AMDGPU/GlobalISel: Implement addrspacecast for 32-bit constant addrspace
llvm-svn: 370140
2019-08-28 00:58:24 +00:00
Matt Arsenault 954a012b4c GlobalISel: Implement moreElementsVector for G_UNMERGE_VALUES sources
This is necessary for handling <3 x s16> on AMDGPU, assuming this
should be handled as 2 separate legalization actions. The alternative
would be for fewerElementsVector to handle 3->2.

llvm-svn: 369547
2019-08-21 16:59:10 +00:00
Matt Arsenault 28215caa60 GlobalISel: Partially implement fewerElementsVector G_UNMERGE_VALUES
Odd sized vectors aren't handled yet.

llvm-svn: 368713
2019-08-13 16:26:28 +00:00
Matt Arsenault 690645bda0 GlobalISel: Implement lower for G_SHUFFLE_VECTOR
llvm-svn: 368709
2019-08-13 16:09:07 +00:00
Daniel Sanders e9a57c2b23 [globalisel] Add G_SEXT_INREG
Summary:
Targets often have instructions that can sign-extend certain cases faster
than the equivalent shift-left/arithmetic-shift-right. Such cases can be
identified by matching a shift-left/shift-right pair but there are some
issues with this in the context of combines. For example, suppose you can
sign-extend 8-bit up to 32-bit with a target extend instruction.
  %1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity)
  %2:_(s32) = G_ASHR %1:_(s32), i32 24
  %3:_(s32) = G_ASHR %2:_(s32), i32 1
would reasonably combine to:
  %1:_(s32) = G_SHL %0:_(s32), i32 24
  %2:_(s32) = G_ASHR %1:_(s32), i32 25
which no longer matches the special case. If your shifts and extend are
equal cost, this would break even as a pair of shifts but if your shift is
more expensive than the extend then it's cheaper as:
  %2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8
  %3:_(s32) = G_ASHR %2:_(s32), i32 1
It's possible to match the shift-pair in ISel and emit an extend and ashr.
However, this is far from the only way to break this shift pair and make
it hard to match the extends. Another example is that with the right
known-zeros, this:
  %1:_(s32) = G_SHL %0:_(s32), i32 24
  %2:_(s32) = G_ASHR %1:_(s32), i32 24
  %3:_(s32) = G_MUL %2:_(s32), i32 2
can become:
  %1:_(s32) = G_SHL %0:_(s32), i32 24
  %2:_(s32) = G_ASHR %1:_(s32), i32 23

All upstream targets have been configured to lower it to the current
G_SHL,G_ASHR pair but will likely want to make it legal in some cases to
handle their faster cases.

To follow-up: Provide a way to legalize based on the constant. At the
moment, I'm thinking that the best way to achieve this is to provide the
MI in LegalityQuery but that opens the door to breaking core principles
of the legalizer (legality is not context sensitive). That said, it's
worth noting that looking at other instructions and acting on that
information doesn't violate this principle in itself. It's only a
violation if, at the end of legalization, a pass that checks legality
without being able to see the context would say an instruction might not be
legal. That's a fairly subtle distinction so to give a concrete example,
saying %2 in:
  %1 = G_CONSTANT 16
  %2 = G_SEXT_INREG %0, %1
is legal is in violation of that principle if the legality of %2 depends
on %1 being constant and/or being 16. However, legalizing to either:
  %2 = G_SEXT_INREG %0, 16
or:
  %1 = G_CONSTANT 16
  %2:_(s32) = G_SHL %0, %1
  %3:_(s32) = G_ASHR %2, %1
depending on whether %1 is constant and 16 does not violate that principle
since both outputs are genuinely legal.

Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm

Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61289

llvm-svn: 368487
2019-08-09 21:11:20 +00:00