llvm-project

Commit Graph

Author	SHA1	Message	Date
jeff	f4e6149d82	[AMDGPU] Use V_PERM to match buildvectors when inputs are not canonicalized (i.e. can't use V_PACK) If we can not prove that f16 operands of a buildvector are canonicalized, then we can not lower into a V_PACK. In this scenario, we would previously lower into some combination of and(sdwa), shr, or. This patch allows for matching into V_PERM instead. Change-Id: Ifa4a74fdb81ef44f22ba490c7fdf81ec8aebc945	2022-10-03 12:58:29 -07:00
Dmitry Preobrazhensky	485c539391	[AMDGPU][MC][GFX11] Disable non-null src0 for s_waitcnt_*cnt Differential Revision: https://reviews.llvm.org/D134809	2022-09-29 19:56:03 +03:00
Anshil Gandhi	a0c53524a5	[AMDGPU] Fix size of SOPK instructions to 4 bytes Instructions in SOPK format may not have 32-bit literal constants following the instruction. Differential Revision: https://reviews.llvm.org/D133972	2022-09-20 14:27:09 -06:00
Alexander Timofeev	fbdea5a2e9	[AMDGPU] Always select s_cselect_b32 for uniform 'select' SDNode This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler. It adds the edge in the SDNodeScheduler dependency graph instead of inserting the SCC copy between each definition and use. This approach lets the scheduler place instructions in an optimal way placing the copy only when the dependency cannot be resolved. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D133593	2022-09-15 22:03:56 +02:00
Jay Foad	6c372daa84	[AMDGPU] New GFX11 intrinsic llvm.amdgcn.s.sendmsg.rtn Add new intrinsic and codegen support for the s_sendmsg_rtn_b32 and s_sendmsg_rtn_b64 instructions. Differential Revision: https://reviews.llvm.org/D127315	2022-06-10 08:15:23 +01:00
Jay Foad	86b55edab6	[AMDGPU] Mark s_getreg as having side effects instead of reading memory s_getreg does not interact with anything else that is modelled as a memory access either in IR or MachineIR. Differential Revision: https://reviews.llvm.org/D125968	2022-05-19 21:25:46 +01:00
Joe Nash	d21b9b4946	[AMDGPU] gfx11 scalar alu instructions MC layer support for SOP(scalar alu operations) including encoding support for s_delay_alu and s_sendmsg_rtn. Contributors: Jay Foad <jay.foad@amd.com> Patch 7/N for upstreaming of AMDGPU gfx11 architecture. Depends on D125319 Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D125498	2022-05-17 13:35:41 -04:00
Joe Nash	6ef17f20d9	[AMDGPU] Mark sendmsg hasSideEffects. NFC Address the FIXME by marking the sendmsg instructions with hasSideEffects. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D125569	2022-05-16 09:59:27 -04:00
Dmitry Preobrazhensky	1f6aa90386	[AMDGPU][MC][GFX10] Added syntactic sugar for s_waitcnt_depctr operand Added the following helpers: depctr_hold_cnt(...) depctr_sa_sdst(...) depctr_va_vdst(...) depctr_va_sdst(...) depctr_va_ssrc(...) depctr_va_vcc(...) depctr_vm_vsrc(...) Differential Revision: https://reviews.llvm.org/D123022	2022-04-07 17:03:44 +03:00
Austin Kerbow	62bcfcb5a5	[AMDGPU] Add llvm.amdgcn.s.setprio intrinsic Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D120976	2022-03-12 22:15:42 -08:00
Venkata Ramanaiah Nalamothu	04fff547e2	[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls. But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvment of understanding the control flow. This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering. And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`. As an added benefit, this patch simplifies overall return instruction handling. Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code. Reviewed By: arsenm, ronlieb Differential Revision: https://reviews.llvm.org/D114652	2022-03-09 12:18:02 +05:30
Jay Foad	05d79e3562	[AMDGPU] Divergence-driven instruction selection for bitreverse Differential Revision: https://reviews.llvm.org/D119702	2022-02-24 20:21:59 +00:00
alex-t	c23198ec13	[AMDGPU] Divergence-driven abs instruction selection This change enables "abs" SDNodes selection by the node divergence. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D119581	2022-02-14 21:36:32 +03:00
alex-t	d88a146f2b	[AMDGPU] Missed sign/zero extend patterns for divergence-driven instruction selection This change includes tablegen patterns that were missed by https://reviews.llvm.org/D110950 and https://reviews.llvm.org/D76230 Reviewed By: foad Differential Revision: https://reviews.llvm.org/D119302	2022-02-10 19:36:12 +03:00
alex-t	5157f984ae	[AMDGPU] Enable divergence-driven XNOR selection Currently not (xor_one_use) pattern is always selected to S_XNOR irrelative od the node divergence. This relies on further custom selection pass which converts to VALU if necessary and replaces with V_NOT_B32 ( V_XOR_B32) on those targets which have no V_XNOR. Current change enables the patterns which explicitly select the not (xor_one_use) to appropriate form. We assume that xor (not) is already turned into the not (xor) by the combiner. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D116270	2022-01-26 15:33:10 +03:00
David Salinas	c0581f7df6	Revert D109159 : Revert "[amdgpu] Enable selection of `s_cselect_b64`." This reverts commit `640beb38e7`. That commit caused performance degradtion in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort). Reverting until we have a better solution to s_cselect_b64 codegen cleanup Change-Id: Ifc167b3c2dae7a65920676f22a97ba76485f3456 Reviewed By: kzhuravl Differential Revision: https://reviews.llvm.org/D116686 Change-Id: I1abf49b74a7e2ba0e0205f747a4154a468b9d7f2	2022-01-11 21:14:09 +00:00
alex-t	5d46263a5a	[AMDGPU] Enable divergence-driven 'ctpop' selection This change adds the patterns and divergence predicates for the ctpop (bitcount) nodes to make them selected according to the divergence. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D116284	2022-01-07 16:07:38 +03:00
Nico Weber	085f078307	Revert "Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`."" This reverts commit `859ebca744`. The change contained many unrelated changes and e.g. restored unit test failes for the old lld port.	2022-01-05 13:10:25 -05:00
David Salinas	859ebca744	Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`." This reverts commit `640beb38e7`. That commit caused performance degradtion in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort). Reverting until we have a better solution to s_cselect_b64 codegen cleanup Change-Id: Ibf8e397df94001f248fba609f072088a46abae08 Reviewed By: kzhuravl Differential Revision: https://reviews.llvm.org/D115960 Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105	2022-01-05 17:57:32 +00:00
Ron Lieberman	09b53296cf	Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range" This reverts commit `9075009d1f`. Failed amdgpu runtime buildbot # 3514	2021-12-22 11:39:28 -05:00
RamNalamothu	9075009d1f	[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls. But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvment of understanding the control flow. This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering. And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`. As an added benefit, this patch simplifies overall return instruction handling. Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D114652	2021-12-22 20:51:12 +05:30
alex-t	19727e31fb	[AMDGPU] Enable divergence predicates for ctlz/cttz ctlz/cttz get lowered to the set of target opcodes This change enables the ISel to select SALU or VALU form according to the SDNode divergence. CTLZ - S_FLBIT_I32_B32 if uniform and V_FFBH_U32_e64 if divergent CTTZ - S_FF1_I32_B32 if uniform and V_FFBL_B32_e64 if divergent Also @llvm.amdgcn.sffbh.i32 gets lowered to S_FLBIT_I32 if uniform and V_FFBH_I32_e64 if divergent NOTE: 64bit versions S_FF1_I32_B64 and S_FLBIT_I32_B64 are not currently supported by the DAG ISel. ctlz/cttz with i64 input are split into two 32bit instructions. Nevertheless, they already have the patterns and were equipped with the divergence predicates to make sure they will be selected correctly when enabled. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D116044	2021-12-20 20:53:48 +03:00
alex-t	98d09705e1	[AMDGPU] Re-enabling divergence predicates for min/max This patch enables divergence predicates for min/max nodes. It makes ISD::MIN/MAX selected to S_MIN_I(U)32/S_MAX_I(U)32 or V_MIN_I(U)32_e64/V_MAX_I(U)32_e64 Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D115954	2021-12-20 16:10:55 +03:00
alex-t	1448aa9dbd	[AMDGPU] Expand not pattern according to the XOR node divergence The "not" is defined as XOR $src -1. We need to transform this pattern to either S_NOT_B32 or V_NOT_B32_e32 dependent on the "xor" node divergence. Reviewed By: rampitec, foad Differential Revision: https://reviews.llvm.org/D115884	2021-12-20 14:41:38 +03:00
Jay Foad	47d15170f6	[AMDGPU] Remove redundant mayLoad = 0, mayStore = 0. NFC. Almost everything in this file is mayLoad = 0, mayStore = 0 by default anyway.	2021-12-07 09:55:05 +00:00
Thomas Symalla	76cbe62262	[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values. This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unneccessary s_mov instructions for arguments the caller would otherwise save and restore in these registers. Reviewed By: sebastian-ne Differential Revision: https://reviews.llvm.org/D111637	2021-11-04 21:50:18 +01:00
Abinav Puthan Purayil	61e3b9fefe	[AMDGPU] Add constrained shift pattern matches. The motivation for this is due to clang's conformance to https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#operators-shift which makes clang emit (<shift> a, (and b, <width> - 1)) for `a <shift> b` in OpenCL where a is an int of bit width <width>. Differential revision: https://reviews.llvm.org/D110231	2021-10-26 19:07:19 +05:30
Stanislav Mekhanoshin	7cdb1df8c7	[AMDGPU] Divergence driven selection for fused bitlogic The change adds divergence predicates for fused logical operations. The problem with selecting a scalar fused op such as S_NOR_B32 is that it does not have a VALU counterpart and will be split in moveToVALU. At the same time it prevents selection of a better opcode on the VALU side (such as V_OR3_B32) which does not have a counterpart on SALU side. XNOR opcodes are left as is and selected as scalar to get advantage of the SIInstrInfo::lowerScalarXnor() code which can commute operations to keep one of two opcodes on SALU if possible. See xnor.ll test for this. Differential Revision: https://reviews.llvm.org/D111907	2021-10-18 01:44:25 -07:00
Jay Foad	3828ea6181	[AMDGPU] Divergence-driven instruction selection for mul i32 Differential Revision: https://reviews.llvm.org/D109881	2021-09-22 09:36:34 +01:00
Jay Foad	128a49727a	[AMDGPU] Fix upcoming TableGen warnings on unused template arguments. NFC. The warning is implemented by D109359 which is still in review. Differential Revision: https://reviews.llvm.org/D109826	2021-09-16 09:07:18 +01:00
Michael Liao	640beb38e7	[amdgpu] Enable selection of `s_cselect_b64`. Differential Revision: https://reviews.llvm.org/D109159	2021-09-07 10:45:07 -04:00
Dmitry Preobrazhensky	09c9f4dc7d	[AMDGPU][MC] Added missing isCall/isBranch flags Added isCall for S_CALL_B64; added isBranch for S_SUBVECTOR_LOOP_*. Differential Revision: https://reviews.llvm.org/D106072	2021-07-16 14:59:10 +03:00
Stanislav Mekhanoshin	e5b0fe1b83	[AMDGPU] Mark more SOP instructions as rematerializable The rest of the SOP instructions implicitly set SCC and not suitable for the rematerialization. Differential Revision: https://reviews.llvm.org/D105670	2021-07-08 16:00:45 -07:00
Jay Foad	7f3ac6714a	[AMDGPU] Set SALU, VALU and other instruction type flags on Real instructions This does not affect codegen but might benefit llvm-mca.	2021-06-16 13:36:02 +01:00
Jay Foad	323b3e645d	[AMDGPU] Set mayLoad and mayStore on Real instructions This does not affect codegen but might benefit llvm-mca.	2021-06-16 12:10:23 +01:00
Jay Foad	37109974af	[AMDGPU] Use defvar in SOPInstructions.td. NFC. Factor out repeated !cast<SOP*_Pseudo>(NAME) into a new "defvar ps", just to improve readability and maintainability. Differential Revision: https://reviews.llvm.org/D104306	2021-06-16 09:16:45 +01:00
Mirko Brkusanin	35ef4c940b	[AMDGPU][GlobalISel] Legalize G_ABS Legalize and select G_ABS so that we can use llvm.abs intrinsic Differential Revision: https://reviews.llvm.org/D102391	2021-06-04 14:46:43 +02:00
Konstantin Zhuravlyov	844012940e	AMDGPU: Add isBranch=1 to SOPP branch instructions Differential Revision: https://reviews.llvm.org/D99955	2021-04-06 10:59:30 -04:00
Jay Foad	fc7e3e7dd9	[AMDGPU] Set SchedRW on real instructions Coyp SchedRW from pseudos to real instructions so that llvm-mca has access to it. This is NFC for normal compiler codegen, which schedules pseudos not real instructions. Add an llvm-mca test for some high latency double-precision instructions as a smoke test. Differential Revision: https://reviews.llvm.org/D99187	2021-03-23 15:38:11 +00:00
Jay Foad	796a60d2ea	[AMDGPU] New intrinsic void llvm.amdgcn.s.sethalt(i32) The expected use case is for frontends to insert this into shaders that are to be run under a debugger. The shader can then be resumed or single stepped from the point of the call under debugger control. Differential Revision: https://reviews.llvm.org/D97670	2021-03-01 14:30:23 +00:00
Jay Foad	3ad5216ed8	[AMDGPU] Better codegen for i64 bitreverse Differential Revision: https://reviews.llvm.org/D97547	2021-02-26 15:51:36 +00:00
Jay Foad	9f69c1bc54	[AMDGPU] Rename pseudo S_WAITCNT_IDLE to S_WAIT_IDLE. NFC.	2020-11-18 14:03:43 +00:00
Michael Liao	23c6d1501d	[amdgpu] Add `llvm.amdgcn.endpgm` support. - `llvm.amdgcn.endpgm` is added to enable "abort" support. Differential Revision: https://reviews.llvm.org/D90809	2020-11-05 19:06:50 -05:00
Jay Foad	a442fad911	[AMDGPU] Fix double space in disassembly of s_set_gpr_idx_mode Differential Revision: https://reviews.llvm.org/D90374	2020-10-29 14:54:33 +00:00
Matt Arsenault	d61996473d	AMDGPU: Increase branch size estimate with offset bug This will be relaxed to insert a nop if the offset hits the bad value, so over estimate branch instruction sizes.	2020-10-23 10:34:24 -04:00
Joe Nash	f6d7832f4c	[AMDGPU] Refactor SOPC & SOPP .td for extension We use the Real vs Pseudo instruction abstraction for other types of instructions to facilitate changes in opcode between gpu generations. This patch introduces that abstraction to SOPC and SOPP. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D89738 Change-Id: I59d53c2c7058b49d05b60350f4062a9b542d3138	2020-10-21 12:35:52 -04:00
Stanislav Mekhanoshin	caeb13aba8	[AMDGPU] Allow SOP asm mnemonic to differ Allows the creation of real SOP1 instructions with assembler mnemonics that differ from their pseudo-instruction mnemonics. The default behavior keeps the mnemonics matching. Corrects a subtarget label typo in a comment. Authored By: Joe_Nash Differential Revision: https://reviews.llvm.org/D88708	2020-10-01 16:00:04 -07:00
Jay Foad	2806f586dc	[AMDGPU] Make bfi patterns divergence-aware This tends to increase code size but more importantly it reduces vgpr usage, and could avoid costly readfirstlanes if the result needs to be in an sgpr. Differential Revision: https://reviews.llvm.org/D88245	2020-09-28 10:16:51 +01:00
Jay Foad	90777e2924	[AMDGPU] Enable scheduling around FP MODE-setting instructions Pre-gfx10 all MODE-setting instructions were S_SETREG_B32 which is marked as having unmodeled side effects, which makes the machine scheduler treat it as a barrier. Now that we have proper implicit $mode operands we can use a no-side-effects S_SETREG_B32_mode pseudo instead for setregs that only touch the FP MODE bits, to give the scheduler more freedom. Differential Revision: https://reviews.llvm.org/D87446	2020-09-16 16:10:47 +01:00
Matt Arsenault	40a142fa57	AMDGPU/GlobalISel: Match andn2/orn2 for more types Unfortunately this ends up not working as expected on targets with 16-bit operations due to AMDGPUCodeGenPrepare's promotion of uniform 16-bit ops to i32. The vector case annoyingly requires switching the checked opcode, since constants for vectors aren't directly handled. I also need to think more carefully about whether this is valid for i1.	2020-08-14 13:18:03 -04:00

1 2 3 4

155 Commits