Commit Graph

94 Commits

Author SHA1 Message Date
Ivan Kosarev 432cbd7827 [AMDGPU][CodeGen] Support (register + immediate) SMRD offsets.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129381
2022-07-18 11:29:31 +01:00
Piotr Sobczak 4874838a63 [AMDGPU] gfx11 WMMA instruction support
gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate)
instructions.

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D128756
2022-06-30 11:13:45 -04:00
Joe Nash 20d20156f4 [AMDGPU] gfx11 VINTERP intrinsics and ISel support
Depends on D127664

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D127756
2022-06-17 09:16:59 -04:00
Joe Nash 2d43de13df [AMDGPU] gfx11 new dot instruction codegen support
Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D127904
2022-06-16 14:19:34 -04:00
Stanislav Mekhanoshin c4500de255 [AMDGPU] gfx940: disable OP_SEL on V_DOT instructions
Differential Revision: https://reviews.llvm.org/D121634
2022-03-14 17:02:00 -07:00
Stanislav Mekhanoshin 36fe3f13a9 [AMDGPU] flat scratch SVS addressing mode for gfx940
Both VADDR and SADDR are used in SVS mode.

Differential Revision: https://reviews.llvm.org/D121254
2022-03-14 15:23:36 -07:00
Sebastian Neubauer 6527b2a4d5 [AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.

Differential Revision: https://reviews.llvm.org/D119235
2022-02-18 15:05:21 +01:00
Julien Pages dcb2da13f1 [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode
Add a new llvm.fptrunc.round intrinsic to precisely control
the rounding mode when converting from f32 to f16.

Differential Revision: https://reviews.llvm.org/D110579
2022-02-11 12:08:23 -05:00
Petar Avramovic 0b34ffe4a6 AMDGPU/GlobalISel: Add clamp combine
Add clamp combine. Source is fminnum(fmaxnum(Val, 0.0), 1.0) or
fmaxnum(fminnum(Val, 1.0), 0.0) or fmed3 intrinsic with 0.0 and
1.0 as two out of three operands.

Differential Revision: https://reviews.llvm.org/D90052
2021-12-03 12:49:39 +01:00
Petar Avramovic ec54867d75 AMDGPU/GlobalISel: Add floating point med3 combine
Add floating point version of med3 combine.
Source is fminnum(fmaxnum(Val, K0), K1) or fmaxnum(fminnum(Val, K1), K0)
where K0 and K1 are constants and K0 <= K1.

Differential Revision: https://reviews.llvm.org/D90051
2021-12-03 12:49:39 +01:00
Jay Foad d77b43c385 [AMDGPU][GlobalISel] Add G_AMDGPU_FFBL_B32
This is the counterpart to G_AMDGPU_FFBH_U32 which already exists. These
instructions have a defined result of -1 when the input is zero.

Differential Revision: https://reviews.llvm.org/D107441
2021-08-06 09:40:48 +01:00
Matt Arsenault d7d2e4545e AMDGPU/GlobalISel: Fix selecting G_SEXTLOAD/G_ZEXTLOAD pre-gfx9
The patterns for the m0 glue patterns were failing to import.
2021-07-27 15:56:42 -04:00
Petar Avramovic 4a9bc59867 AMDGPU/GlobalISel: Add integer med3 combines
Add signed and unsigned integer version of med3 combine.
Source pattern is min(max(Val, K0), K1) or max(min(Val, K1), K0)
where K0 and K1 are constants and K0 <= K1. Destination is med3
that corresponds to signedness of min/max in source.

Differential Revision: https://reviews.llvm.org/D90050
2021-04-27 11:52:23 +02:00
Sebastian Neubauer cc7add5298 [AMDGPU] Use SIInstrFlags for flat variants. NFC
Use SIInstrFlags to differentiate between the different
variants of flat instructions (flat, global and scratch).
This should make it easier to bundle the immediate offset logic in a
single place and implement restrictions and bug workarounds.

Fixed version of D99587, which does not rely on the address space.

Differential Revision: https://reviews.llvm.org/D99743
2021-04-09 12:28:36 +02:00
Stanislav Mekhanoshin edd6da10d2 [AMDGPU] Remove cpol, tfe, and swz from MUBUF patterns
These are always selected as 0 anyway.

Differential Revision: https://reviews.llvm.org/D98663
2021-03-18 14:36:04 -07:00
Stanislav Mekhanoshin 3bffb1cd0e [AMDGPU] Use single cache policy operand
Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amount of code. These operands are mostly 0 anyway.

Additional advantage that parser will accept these flags in any order unlike
now.

Differential Revision: https://reviews.llvm.org/D96469
2021-03-15 13:00:59 -07:00
Stanislav Mekhanoshin a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Jay Foad 7e9ceed9a2 [TableGen][GlobalISel] Allow duplicate RendererFns
Allow different GICustomOperandRenderers to use the same RendererFn.
This avoids the need for targets to define a bunch of identical C++
renderer functions with different names.

Without this fix TableGen would have emitted code that tried to define
the GICR enumeration with duplicate enumerators.

Differential Revision: https://reviews.llvm.org/D96587
2021-02-12 15:05:32 +00:00
Thomas Symalla f89f6d1e5d [AMDGPU]: Fixes an invalid clamp selection pattern.
When running the tests on PowerPC and x86, the lit test GlobalISel/trunc.ll fails at the memory sanitize step. This seems to be due to wrong invalid logic (which matches even if it shouldn't) and likely missing variable initialisation."

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D95878
2021-02-08 13:06:30 +01:00
Sebastian Neubauer d49efdc969 Revert "[AMDGPU] Add a new Clamp Pattern to the GlobalISel Path."
This reverts commits 62af0305b7cc..677a3529d3e6 from D93708.
They cause failures in the sanitizer builds because of uninitialized
values.

A fix is in D95878, but it might take some time until this is pushed,
so reverting the changes for now.
2021-02-03 11:03:34 +01:00
Thomas Symalla 602896b9d2 Renamed med3 opcode, removed superfluous copy. 2021-02-02 09:14:54 +01:00
Thomas Symalla c781c25412 Implemented a MED3_S32 GIR opcode. 2021-02-02 09:14:53 +01:00
Thomas Symalla 6604d81e1b Added and used new target pseudo for v_cvt_pk_i16_i32, changes due to code review. 2021-02-02 09:14:53 +01:00
Stanislav Mekhanoshin d15119a02d [AMDGPU][GlobalISel] GlobalISel for flat scratch
It does not seem to fold offsets but this is not specific
to the flat scratch as getPtrBaseWithConstantOffset() does
not return the split for these tests unlike its SDag
counterpart.

Differential Revision: https://reviews.llvm.org/D93670
2020-12-22 16:33:06 -08:00
Mirko Brkusanin d17ea67b92 [AMDGPU][GlobalISel] Fix 96 and 128 local loads and stores
Fix local ds_read/write_b96/b128 so they can be selected if the alignment
allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break
them down.

Differential Revision: https://reviews.llvm.org/D81638
2020-08-21 12:26:31 +02:00
Matt Arsenault a9ee0589a8 AMDGPU/GlobalISel: Match global saddr addressing mode 2020-08-17 15:48:06 -04:00
Matt Arsenault 8cb022982a AMDGPU: Remove redundant FLAT complex patterns
These were identical to the non-atomic cases. I'm not sure why these
were ever separated.
2020-08-15 12:12:01 -04:00
Matt Arsenault 53f21e0fb7 TableGen/GlobalISel: Hack the operand order for atomic_store
ISD::ATOMIC_STORE arbitrarily has the operands in the opposite order
from regular ISD::STORE, which always introduced an annoying
duplication of patterns to handle both cases. Since in GlobalISel
there's just the one G_STORE, we need to swap the operands to
correctly emit the type check for the pointer operand.

Some work started in 20aafa3156 to
migrate SelectionDAG to use ISD::STORE for atomics, but that work
seems to have stalled. Since this is the pretty much the last
operation which matters which isn't supported for AMDGPU, use this
compatibility hack to unblock declaring it functionally complete.

Not sure what's going on with the pending_phis AArch64 test. It seems
it didn't always use atomics, and I'm not sure what it was originally
testing matters anymore.
2020-08-11 10:22:44 -04:00
Matt Arsenault 1a0c0944c6 AMDGPU: Define raw/struct variants of buffer atomic fadd
Somehow the new FP atomic buffer intrinsics ended up using the legacy
style for buffer intrinsics.
2020-08-06 13:36:19 -04:00
Matt Arsenault 63cdc9a49f AMDGPU/GlobalISel: Handle llvm.amdgcn.ds.{fadd|fmin|fmax}
These intrinsics are missing mangling for both the pointer and data
type.
2020-08-06 11:09:08 -04:00
Matt Arsenault dcf3ffb0a8 AMDGPU/GlobalISel: Move frame index selection to patterns
Doesn't really save any code until global value is handled too.
2020-08-06 10:42:15 -04:00
Matt Arsenault a4edc04693 AMDGPU/GlobalISel: Use clamp modifier for [us]addsat/[us]subsat
We also have never handled this for SelectionDAG, which needs
additional work.
2020-07-28 11:18:05 -04:00
Kazuaki Ishizaki 0312b9f550 [llvm] NFC: Fix trivial typo in rst and td files
Differential Revision: https://reviews.llvm.org/D77469
2020-04-23 14:26:32 +09:00
Matt Arsenault b27d255e1e AMDGPU/GlobalISel: Form CVT_F32_UBYTE0 2020-03-30 17:45:55 -04:00
Matt Arsenault 15bf916b54 AMDGPU: Remove VOP3OpSelMods0 complex pattern
Use default operand of 0 instead.
2020-03-04 17:18:22 -05:00
Matt Arsenault dfce5fd50a AMDGPU/GlobalISel: Select VOP3P instructions
This only handles the basic cases. More work is needed to make better
use of op_sel.
2020-02-21 13:35:40 -05:00
Matt Arsenault 86813e2768 AMDGPU/GlobalISel: Select llvm.amdgcn.s.buffer.load
Doesn't try to fail on the dlc bit pre-gfx10 like the DAG lowering
does.
2020-02-17 08:02:40 -08:00
Matt Arsenault 9ec668606b AMDGPU: Add option to disable CGP division expansion
The division expansions in AMDGPUCodeGenPrepare can't be relied on for
correctness, since they punt to later optimization and possibly
legalization in some cases. We still need a way to be able to write
tests for the legalizer versions of the expansion. This is mostly for
GlobalISel, since the expected optimzations is expecting aren't
implemented.

The interaction with the flag to expand 64-bit division in the IR is
pretty confusing, but these flags have different purposes.
2020-02-14 11:37:07 -08:00
Matt Arsenault 045a8921d7 AMDGPU/GlobalISel: Select G_CTLZ_ZERO_UNDEF
Directly select this rather than going through the intermediate
instruction, which may provide some combine value in the future.
2020-02-12 16:19:45 -08:00
Matt Arsenault 6fb544d1d2 AMDGPU/GlobalISel: Combine FMIN_LEGACY/FMAX_LEGACY
Try out using combine definition rules.

This really should be a post-legalizer combine, but the combiner pass
is currently pre-legalize. Most of the target combines are really
post-legalize, so we should probably move the pass.
2020-01-31 06:58:04 -08:00
Matt Arsenault 49e424e08e AMDGPU/GlobalISel: Select global MUBUF atomicrmw 2020-01-31 06:05:41 -08:00
Austin Kerbow 2605adb69c [AMDGPU][GlobalISel] Select 8-byte LDS Ops with 4-byte alignment
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73585
2020-01-29 10:42:12 -08:00
Matt Arsenault c3075e6171 AMDGPU/GlobalISel: Select buffer atomics
The cmpswap handling is incomplete and fails to select.
2020-01-27 15:16:44 -05:00
Matt Arsenault 0eb62d5b3f AMDGPU/GlobalISel: Select llvm.amdgcn.raw.tbuffer.store 2020-01-27 15:16:21 -05:00
Matt Arsenault 533d650e94 AMDGPU/GlobalISel: Move llvm.amdgcn.raw.buffer.store handling
Treat this the same way as loads. There's less value to the
intermediate nodes, but it's good to be consistent.
2020-01-27 14:59:30 -05:00
Matt Arsenault 09ed0e44d9 AMDGPU/GlobalISel: Select llvm.amdgcn.raw.tbuffer.load 2020-01-27 13:40:37 -05:00
Matt Arsenault 198624c39d AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.load.format 2020-01-27 13:02:19 -05:00
Matt Arsenault fc90222a91 AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.load
Use intermediate instructions, unlike with buffer stores. This is
necessary because of the need to have an internal way to distinguish
between signed and unsigned extloads. This introduces some duplication
and near duplication with the buffer store selection path. The store
handling should maybe be moved into legalization to match and
eliminate the duplication.
2020-01-27 12:49:23 -05:00
Matt Arsenault e60d658260 AMDGPU/GlobalISel: Handle VOP3NoMods 2020-01-27 09:03:44 -08:00
Matt Arsenault ac0b9b4ccf AMDPGPU/GlobalISel: Select more MUBUF global addressing modes
The handling of the high bits of the resource descriptor seem weird to
me, where the 3rd dword changes based on the instruction.
2020-01-27 07:28:36 -08:00