llvm-project

Commit Graph

Author	SHA1	Message	Date
Piotr Sobczak	4874838a63	[AMDGPU] gfx11 WMMA instruction support gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D128756	2022-06-30 11:13:45 -04:00
Jay Foad	ff85d61a6e	Update *_TMPRING_SIZE.WAVESIZE for GFX11 The encoding of COMPUTE_TMPRING_SIZE.WAVESIZE and SPI_TMPRING_SIZE.WAVESIZE has changed in GFX11: it is now in units of 64 dwords instead of 256 dwords, and the field has been widened from 13 bits to 15 bits. Depends on D126989 Reviewed By: rampitec, arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D127248	2022-06-10 13:24:00 -04:00
Joe Nash	e243ead6fc	Reland [AMDGPU] gfx11 vop3dpp instructions There was an issue with encoding wide (>64 bit) instructions on BigEndian hosts, which is fixed in D127195. Therefore reland this. gfx11 adds the ability to use dpp modifiers on vop3 instructions. This patch adds machine code layer support for that. The MCCodeEmitter is changed to use APInt instead of uint64_t to support these wider instructions. Patch 16/N for upstreaming of AMDGPU gfx11 architecture Differential Revision: https://reviews.llvm.org/D126483	2022-06-07 14:49:13 -04:00
Joe Nash	eaed07eb7e	Revert "[AMDGPU] gfx11 vop3dpp instructions" This reverts commit `99a83b1286`.	2022-06-06 17:12:09 -04:00
Joe Nash	99a83b1286	[AMDGPU] gfx11 vop3dpp instructions gfx11 adds the ability to use dpp modifiers on vop3 instructions. This patch adds machine code layer support for that. The MCCodeEmitter is changed to use APInt instead of uint64_t to support these wider instructions. Patch 16/N for upstreaming of AMDGPU gfx11 architecture Depends on D126475 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126483	2022-06-06 09:34:59 -04:00
Joe Nash	ef1ea5ac01	[AMDGPU] gfx11 vinterp instructions MC support A new instruction encoding. Some of these instructions were previously VOP3 encoded. Contributors: Carl Ritson <carl.ritson@amd.com> Patch 11/N for upstreaming of AMDGPU gfx11 architecture. Depends on D125824 Reviewed By: critson Differential Revision: https://reviews.llvm.org/D125989	2022-05-25 14:59:16 -04:00
Joe Nash	1a51ab766f	[AMDGPU] gfx11 export instructions Contributors: Jay Foad <jay.foad@amd.com> Dmitry Preobrazhensky <d-pre@mail.ru> Patch 10/N for upstreaming of AMDGPU gfx11 architecture. Depends on D125822 Reviewed By: dp Differential Revision: https://reviews.llvm.org/D125824	2022-05-25 14:44:09 -04:00
Joe Nash	729467acef	[AMDGPU] gfx11 LDSDIR instructions MC support Contributors: Carl Ritson <carl.ritson@amd.com> Patch 8/N for upstreaming of AMDGPU gfx11 architecture. Depends on D125498 Reviewed By: critson, rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D125820	2022-05-19 10:08:47 -04:00
Dmitry Preobrazhensky	32ca9bd7b5	[AMDGPU][MC][GFX940] Correct tied operand decoding for smfmac opcodes Differential Revision: https://reviews.llvm.org/D125790	2022-05-18 15:39:30 +03:00
Joe Nash	d21b9b4946	[AMDGPU] gfx11 scalar alu instructions MC layer support for SOP(scalar alu operations) including encoding support for s_delay_alu and s_sendmsg_rtn. Contributors: Jay Foad <jay.foad@amd.com> Patch 7/N for upstreaming of AMDGPU gfx11 architecture. Depends on D125319 Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D125498	2022-05-17 13:35:41 -04:00
Joe Nash	c70259405c	[AMDGPU] gfx11 BUF Instructions Includes MachineCode layer support and tests, and MIR tests not requiring CodeGen pass changes. Includes a small change in SMInstructions.td to correct encoded bits. Contributors: Petar Avramovic <Petar.Avramovic@amd.com> Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com> Depends on D125316 Patch 6/N for upstreaming of AMDGPU gfx11 architecture. Reviewed By: dp, Petar.Avramovic Differential Revision: https://reviews.llvm.org/D125319	2022-05-16 09:41:40 -04:00
Changpeng Fang	8edaf25986	AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally Summary: Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is set by default. We use implicitarg_ptr + offset to check whether the multigrid synchronization pointer is used. If yes, we remove this attribute and also remove amdgpu-no-implicitarg-ptr. We generate metadata for the hidden_multigrid_sync_arg only when the amdgpu-no-multigrid-sync-arg attribute is removed from the function. Reviewers: arsenm, sameerds, b-sumner and foad Differential Revision: https://reviews.llvm.org/D123548	2022-04-12 12:36:30 -07:00
Dmitry Preobrazhensky	1d817a1448	[AMDGPU][MC][NFC] Refactored sendmsg(...) handling Differential Revision: https://reviews.llvm.org/D121995	2022-03-21 15:37:30 +03:00
Changpeng Fang	dd5895cc39	AMDGPU: Use the implicit kernargs for code object version 5 Summary: Specifically, for trap handling, for targets that do not support getDoorbellID, we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1]. To get aperture bases when targets do not have aperture registers, we load private_base or shared_base directly from the implicit kernarg. In clang, we use implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}. Reviewers: arsenm, sameerds, yaxunl Differential Revision: https://reviews.llvm.org/D120265	2022-03-17 14:12:36 -07:00
Dmitry Preobrazhensky	5977dfba64	[AMDGPU][MC][NFC] Refactored custom operands handling The original design of custom operands support assumed that most GPUs have the same or very similar operand names and encodings. This is no longer the case. As a result the support code becomes over-complicated and difficult to maintain. This change implements a different design with the following benefits: - support of aliases; - support of operands with overlapped encodings; - identification of defined but unsupported operands. Differential Revision: https://reviews.llvm.org/D121696	2022-03-16 16:04:55 +03:00
Stanislav Mekhanoshin	8dd3d1cf1f	[AMDGPU] Add symbolic names for gfx940 HWREGs The namespace of HWREGs now overlaps with gfx10's. Thus the patch is longer than necessary to just support new names. It also needs to handle proper error messages, i.e. to issue a "specified hardware register is not supported on this GPU" message. This may need a major refactoring in the future. Differential Revision: https://reviews.llvm.org/D121418	2022-03-14 16:13:33 -07:00
Stanislav Mekhanoshin	8992b50e2f	[AMDGPU] gfx940 uses new names for coherency bits Differential Revision: https://reviews.llvm.org/D120855	2022-03-07 11:50:07 -08:00
Stanislav Mekhanoshin	d3b87e4a1c	[AMDGPU] HWRegs TMA and TBA also supported on gfx9 Differential Revision: https://reviews.llvm.org/D118860	2022-02-03 09:36:10 -08:00
Dmitry Preobrazhensky	c7ca4c6365	[AMDGPU][GFX10][MC] Updated symbolic names of internal HW registers GFX10 no longer supports HW_ID. It has been replaced with HW_ID1 and HW_ID2. See bug 52904: https://github.com/llvm/llvm-project/issues/52904 Differential Revision: https://reviews.llvm.org/D117313	2022-01-17 20:29:10 +03:00
Christudasan Devadasan	399b7de0ea	[AMDGPU] Add a regclass flag for scalar registers Along with vector RC flags, this scalar flag will make various regclass queries like `isVGPR` more accurate. Regclasses other than vectors are currently set with the new flag even though certain unallocatable classes aren't truly scalars. It would be ok as long as they remain unallocatable. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D110053	2021-12-01 23:31:07 -05:00
Joe Nash	b4b7e605a6	[AMDGPU] Support shared literals in FMAMK/FMAAK These instructions should allow src0 to be a literal with the same value as the mandatory other literal. Enable it by introducing an operand that defers adding its value to the MI when decoding till the mandatory literal is parsed. Reviewed By: dp, foad Differential Revision: https://reviews.llvm.org/D111067 Change-Id: I22b0ae0d35bad17b6f976808e48bffe9a6af70b7	2021-10-11 13:09:54 -04:00
Christudasan Devadasan	4dab15288d	[AMDGPU] Introduce RC flags for vector register classes Configure and use the TSFlags in TargetRegisterClass to have unique flags for VGPR and AGPR register classes. The vector register class queries like `hasVGPRs` will now become more efficient with just a bitwise operation. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D108815	2021-09-01 02:55:45 -04:00
Daniil Fukalov	48958d02d2	[NFC][AMDGPU] Reduce includes dependencies. 1. Split out some parts of R600 target to separate modules/headers. 2. Reduced some include lists in headers. 3. Found and fixed an issue where the overrides `GCNTargetMachine::getSubtargetImpl()` and `R600TargetMachine::getSubtargetImpl()` had a different return type than the base class. 4. Minor forward declarations cleanup. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D108596	2021-08-25 12:01:55 +03:00
Sebastian Neubauer	36138db116	[AMDGPU] IsFlatScratch/Global -> FlatScratch/Global Remove 'Is' from IsFlatScratch/Global. NFC Differential Revision: https://reviews.llvm.org/D100108	2021-04-09 11:20:31 +02:00
Jay Foad	9d08f276d7	[AMDGPU] Use reductions instead of scans in the atomic optimizer If the result of an atomic operation is not used then it can be more efficient to build a reduction across all lanes instead of a scan. Do this for GFX10, where the permlanex16 instruction makes it viable. For wave64 this saves a couple of dpp operations. For wave32 it saves one readlane (which are generally bad for performance) and one dpp operation. Differential Revision: https://reviews.llvm.org/D98953	2021-03-26 15:38:14 +00:00
Stanislav Mekhanoshin	3bffb1cd0e	[AMDGPU] Use single cache policy operand Replace individual operands GLC, SLC, and DLC with a single cache_policy bitmask operand. This will reduce the number of operands in MIR and I hope the amount of code. These operands are mostly 0 anyway. An additional advantage is that the parser will accept these flags in any order, unlike now. Differential Revision: https://reviews.llvm.org/D96469	2021-03-15 13:00:59 -07:00
Dmitry Preobrazhensky	28f164bca7	[AMDGPU][MC][GFX9+] Corrected encoding of op_sel_hi for unused operands in VOP3P Corrected encoding of VOP3P op_sel_hi for unused operands. See bug 49363. Differential Revision: https://reviews.llvm.org/D97689	2021-03-02 13:02:25 +03:00
Jay Foad	67f0620831	[AMDGPU] Update s_sendmsg messages Update the list of s_sendmsg messages known to the assembler and disassembler and validate the ones that were added or removed in gfx9 and gfx10. Differential Revision: https://reviews.llvm.org/D97295	2021-02-24 13:07:00 +00:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Stanislav Mekhanoshin	5cf9292ce3	[AMDGPU] Add two TSFlags: IsAtomicNoRtn and IsAtomicRtn We are using the AtomicNoRet map in multiple places to determine whether an instruction is atomic and whether it is a rtn or nortn atomic. This method does not always work since some instructions only have a rtn or a nortn version. One such instruction is ds_wrxchg_rtn_b32 which does not have a nortn version. This has caused changes in memory legalizer tests. Differential Revision: https://reviews.llvm.org/D96639	2021-02-15 11:27:59 -08:00
Dmitry Preobrazhensky	745064e36b	[AMDGPU][MC] Refactored exp tgt handling Summary: - Separated tgt encoding from parsing; - Separated tgt decoding from printing; - Improved errors handling; - Disabled leading zeroes in index. The following code is no longer accepted: exp pos00 v3, v2, v1, v0 Reviewers: arsenm, rampitec, foad Differential Revision: https://reviews.llvm.org/D95216	2021-01-26 14:54:15 +03:00
Jay Foad	18cb7441b6	[AMDGPU] Simpler names for arch-specific ttmp registers. NFC. Rename the _gfx9_gfx10 ttmp registers to _gfx9plus for simplicity, and use the corresponding isGFX9Plus predicate to decide when to use them instead of the old *_vi versions. Differential Revision: https://reviews.llvm.org/D94975	2021-01-19 18:47:14 +00:00
Jay Foad	4926eed59c	[AMDGPU] Add a TRANS bit to TSFlags. NFC. This is used to mark transcendental instructions that execute on a separate pipeline from the normal VALU pipeline. Differential Revision: https://reviews.llvm.org/D92042	2020-11-24 17:49:56 +00:00
Jay Foad	000400ca0a	Fix speling in comments. NFC.	2020-11-23 14:43:24 +00:00
Jay Foad	6881a82e8c	[AMDGPU] Fix scheduling of exp pos4 Also fix a similar issue in SIInsertWaitcnts, but I don't think that fix has any effect in practice. Differential Revision: https://reviews.llvm.org/D91290	2020-11-12 19:57:14 +00:00
Jay Foad	d7d6ac5624	[AMDGPU] Define and use names for export targets. NFC. Differential Revision: https://reviews.llvm.org/D91289	2020-11-12 19:57:14 +00:00
Sebastian Neubauer	1124bf4ab7	[AMDGPU] Set rsrc1 flags for graphics shaders Before they were only set for compute kernels and compute shaders but not for other shaders. Differential Revision: https://reviews.llvm.org/D89399	2020-11-04 12:25:41 +01:00
Stanislav Mekhanoshin	c9d6fe6f7d	[AMDGPU] Improve FLAT scratch detection We were using too broad a check for isFLATScratch(), which also included FLAT global. Differential Revision: https://reviews.llvm.org/D90505	2020-11-02 11:37:33 -08:00
Dmitry Preobrazhensky	ecde200209	[AMDGPU][MC] Corrected parser to avoid generation of excessive error messages Summary of changes: - Changed parser to eliminate generation of excessive error messages; - Corrected lit tests to match all expected error messages; - Corrected lit tests to guard against unwanted extra messages (added option "--implicit-check-not=error:"); - Added missing checks and fixed some typos in tests. See bug 46907: https://bugs.llvm.org/show_bug.cgi?id=46907 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D86940	2020-09-02 19:42:18 +03:00
Dmitry Preobrazhensky	6b8948922c	[AMDGPU][MC] Added support of SP3 syntax for MTBUF format modifier Currently supported LLVM MTBUF syntax is shown below. It is not compatible with SP3. op dst, addr, rsrc, FORMAT, soffset This change adds support for SP3 syntax: op dst, addr, rsrc, soffset SP3FORMAT In addition to being compatible with SP3, this syntax allows using symbolic names for data, numeric and unified formats. Below is a list of added syntax variants. format:<expression> format:[<numeric-format-name>,<data-format-name>] format:[<data-format-name>,<numeric-format-name>] format:[<data-format-name>] format:[<numeric-format-name>] format:[<unified-format-name>] The last syntax variant is supported for GFX10 only. See llvm bug 37738 Reviewers: arsenm, rampitec, vpykhtin Differential Revision: https://reviews.llvm.org/D84026	2020-07-24 16:41:03 +03:00
Dmitry Preobrazhensky	e122eba185	[AMDGPU][MC] Corrected MTBUF parsing and decoding MTBUF implementation has many issues and this change addresses most of these: - refactored duplicated code; - hardcoded constants moved out of high-level code; - fixed a decoding error when nfmt or dfmt are zero (bug 36932); - corrected parsing of operand separators (bug 46403); - corrected handling of missing operands (bug 46404); - corrected handling of out-of-range modifiers (bug 46421); - corrected default value (bug 46467). Reviewers: arsenm, rampitec, vpykhtin, artem.tamazov, kzhuravl Differential Revision: https://reviews.llvm.org/D83760	2020-07-15 19:46:00 +03:00
Stanislav Mekhanoshin	9ee272f13d	[AMDGPU] Add gfx1030 target Differential Revision: https://reviews.llvm.org/D81886	2020-06-15 16:18:05 -07:00
Matt Arsenault	0892a96a05	AMDGPU: Optimize s_setreg_b32 to s_denorm_mode/s_round_mode This is a custom inserter because it was less work than teaching tablegen a way to indicate that it is sometimes OK to have a no side effect instruction in the output of a side effecting pattern. The asm is needed to look like a read of the mode register to prevent it from being deleted. However, there seems to be a bug where the mode register def instructions are moved across the asm sideeffect by the post-RA scheduler. Another oddity is the immediate is formatted differently between s_denorm_mode and s_round_mode.	2020-05-29 21:11:36 -04:00
Stanislav Mekhanoshin	1fb584f7a2	[AMDGPU] Added MI bit IsDOT NFC, needed for future commit. Differential Revision: https://reviews.llvm.org/D67669 llvm-svn: 372151	2019-09-17 17:56:13 +00:00
Jay Foad	eac23862a8	[AMDGPU] gfx10 atomic optimizer changes. Summary: Add support for gfx10, where all DPP operations are confined to work within a single row of 16 lanes, and wave32. Reviewers: arsenm, sheredom, critson, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65644 llvm-svn: 369745	2019-08-23 10:07:43 +00:00
Dmitry Preobrazhensky	5153b1723a	[AMDGPU][MC][GFX9][GFX10] Added support of GET_DOORBELL message Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D64729 llvm-svn: 366071	2019-07-15 15:12:16 +00:00
Stanislav Mekhanoshin	50d7f46460	[AMDGPU] gfx908 mAI instructions, MC part Differential Revision: https://reviews.llvm.org/D64446 llvm-svn: 365563	2019-07-09 21:43:09 +00:00
Stanislav Mekhanoshin	9e77d0c6df	[AMDGPU] gfx908 register file changes Differential Revision: https://reviews.llvm.org/D64438 llvm-svn: 365546	2019-07-09 19:41:51 +00:00
Dmitry Preobrazhensky	1d572ce395	[AMDGPU][MC] Enabled constant expressions as operands of sendmsg See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D62735 llvm-svn: 364645	2019-06-28 14:14:02 +00:00
Stanislav Mekhanoshin	bdf7f81b89	[AMDGPU] hazard recognizer for fp atomic to s_denorm_mode This requires 3 wait states unless there is a wait or VALU in between. Differential Revision: https://reviews.llvm.org/D63619 llvm-svn: 364074	2019-06-21 16:30:14 +00:00
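The WAVESIZE change in `ff85d61a6e` is easiest to see with numbers. The Python sketch below assumes the field simply holds a wave's scratch size rounded up to whole units (256 dwords before GFX11, 64 dwords on GFX11) and must fit in 13 or 15 bits respectively; `encode_wavesize` and the target labels are hypothetical names for illustration, not LLVM API.

```python
# Per the commit message: units of 256 dwords / 13-bit field before GFX11,
# units of 64 dwords / 15-bit field on GFX11. Target labels are illustrative.
ENCODINGS = {
    "pre-gfx11": {"unit_dwords": 256, "field_bits": 13},
    "gfx11": {"unit_dwords": 64, "field_bits": 15},
}


def encode_wavesize(scratch_bytes_per_lane: int, wave_size: int, target: str) -> int:
    """Hypothetical helper: WAVESIZE value covering one wave's scratch (assumed model)."""
    enc = ENCODINGS[target]
    unit_bytes = enc["unit_dwords"] * 4  # 1 dword = 4 bytes
    bytes_per_wave = scratch_bytes_per_lane * wave_size
    # Round up so the allocation covers the whole per-wave requirement.
    units = (bytes_per_wave + unit_bytes - 1) // unit_bytes
    max_units = (1 << enc["field_bits"]) - 1
    if units > max_units:
        raise ValueError(f"{units} units do not fit in a {enc['field_bits']}-bit field")
    return units


# 64 bytes per lane in wave64 is 4096 bytes per wave:
print(encode_wavesize(64, 64, "pre-gfx11"))  # 4 units of 1024 bytes
print(encode_wavesize(64, 64, "gfx11"))      # 16 units of 256 bytes
```

Under this model the widening roughly preserves the maximum range ((2^13 - 1) * 1024 = 8,387,584 bytes versus (2^15 - 1) * 256 = 8,388,352 bytes) while allocating scratch at a 4x finer granularity.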

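Commit `3bffb1cd0e` folds GLC, SLC and DLC into one cache_policy bitmask, which is also why the parser can accept the modifiers in any order. A minimal Python sketch of that idea follows; the bit values and the `parse_cache_policy` helper are assumptions for illustration and may not match LLVM's actual definitions.

```python
from enum import IntFlag
from typing import Iterable


class CachePolicy(IntFlag):
    """Illustrative bit assignments; LLVM's real values may differ."""
    GLC = 1
    SLC = 2
    DLC = 4


def parse_cache_policy(tokens: Iterable[str]) -> CachePolicy:
    """Hypothetical parser: fold cache-policy modifiers into a single mask."""
    mask = CachePolicy(0)  # the common case: no modifiers, operand is 0
    for tok in tokens:
        try:
            bit = CachePolicy[tok.upper()]
        except KeyError:
            raise ValueError(f"unknown cache policy modifier: {tok}") from None
        if mask & bit:
            raise ValueError(f"duplicate cache policy modifier: {tok}")
        mask |= bit  # OR is commutative, so token order does not matter
    return mask


print(int(parse_cache_policy(["slc", "glc"])))  # 3
```

Because the mask is built with bitwise OR, `glc slc` and `slc glc` produce the same operand, and an instruction with no modifiers carries a single zero operand instead of three.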

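Commit `9d08f276d7` switches the atomic optimizer from a scan to a reduction when the atomic's result is unused. The sequential Python model below shows why: with an unused result only the wave-wide total matters, whereas a used result requires each lane's exclusive prefix. It models the semantics only, not the DPP or permlanex16 lane operations, and `optimized_atomic_add` is a hypothetical name.

```python
from itertools import accumulate
from typing import List, Optional, Tuple


def optimized_atomic_add(
    memory: int, lane_values: List[int], result_used: bool
) -> Tuple[int, Optional[List[int]]]:
    """Model every active lane doing atomic_add(memory, v), issued as one atomic.

    Returns the new memory value and, if result_used, each lane's return value.
    """
    total = sum(lane_values)
    if not result_used:
        # Reduction: only the total is needed; one lane issues a single atomic add.
        return memory + total, None
    # Scan: lane i must observe memory plus the sum of lanes 0..i-1.
    exclusive = list(accumulate(lane_values, initial=0))[:-1]
    old = memory  # value returned by the single combined atomic
    return memory + total, [old + p for p in exclusive]


print(optimized_atomic_add(10, [1, 2, 3], result_used=False))  # (16, None)
print(optimized_atomic_add(10, [1, 2, 3], result_used=True))   # (16, [10, 11, 13])
```

Both paths perform one memory atomic; the difference is that the scan path also has to produce per-lane prefixes, which is the extra cross-lane work the reduction avoids.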
119 Commits
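Commits `4dab15288d` and `399b7de0ea` attach VGPR, AGPR and scalar flags to register classes so that queries like `hasVGPRs` become a single bitwise test. The Python sketch below illustrates that pattern; the bit positions, class names and helper functions are illustrative assumptions, not LLVM's actual definitions.

```python
from enum import IntFlag


class RCFlags(IntFlag):
    """Hypothetical per-register-class flag bits (not LLVM's real values)."""
    HAS_VGPR = 1 << 0
    HAS_AGPR = 1 << 1
    HAS_SGPR = 1 << 2


# Illustrative classes: an AV-style class may hold VGPRs or AGPRs, so it
# carries both vector bits.
REG_CLASSES = {
    "VGPR_32": RCFlags.HAS_VGPR,
    "AGPR_32": RCFlags.HAS_AGPR,
    "AV_32": RCFlags.HAS_VGPR | RCFlags.HAS_AGPR,
    "SReg_32": RCFlags.HAS_SGPR,
}


def has_vgprs(flags: RCFlags) -> bool:
    # One AND against the class's flags answers the query.
    return bool(flags & RCFlags.HAS_VGPR)


def has_vector_regs(flags: RCFlags) -> bool:
    return bool(flags & (RCFlags.HAS_VGPR | RCFlags.HAS_AGPR))


def is_sgpr_class(flags: RCFlags) -> bool:
    return bool(flags & RCFlags.HAS_SGPR)


print(has_vgprs(REG_CLASSES["AV_32"]), has_vgprs(REG_CLASSES["AGPR_32"]))  # True False
```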