llvm-project

Commit Graph

Author	SHA1	Message	Date
Piotr Sobczak	a4db7025a9	[AMDGPU] Remove assert Remove assert introduced in D101177, following post-commit feedback.	2021-05-12 14:52:37 +02:00
Piotr Sobczak	68137ef568	[AMDGPU] Skip invariant loads when avoiding WAR conflicts No need to handle invariant loads when avoiding WAR conflicts, as there cannot be a vector store to the same memory location. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D101177	2021-05-12 10:57:05 +02:00
Austin Kerbow	4433f4601e	[AMDGPU] Fix extra waitcnt being added with BUFFER_INVL2 The waitcnt pass would increment the number of vmem events for some buffer invalidates that were not handled by the pass. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D102252	2021-05-11 13:17:33 -07:00
Austin Kerbow	6617a5a5ea	[AMDGPU] Move insertion of function entry waitcnt later This allows tracking these as preexisting waitcnt. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D101380	2021-05-05 17:58:38 -07:00
Austin Kerbow	f5199d7ae0	[AMDGPU] Revise handling of preexisting waitcnt Preexisting waitcnt may not update the scoreboard if the instruction being examined needed to wait on fewer counters than what was encoded in the old waitcnt instruction. Fixing this results in the elimination of some redundant waitcnt. These changes also enable combining consecutive waitcnt into a single S_WAITCNT or S_WAITCNT_VSCNT instruction. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100281	2021-05-05 17:21:33 -07:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Stanislav Mekhanoshin	5cf9292ce3	[AMDGPU] Add two TSFlags: IsAtomicNoRtn and IsAtomicRtn We are using AtomicNoRet map in multiple places to determine if an instruction is atomic, and whether it is an rtn or nortn atomic. This method does not always work since some instructions have only an rtn or a nortn version. One such instruction is ds_wrxchg_rtn_b32, which does not have a nortn version. This has caused changes in memory legalizer tests. Differential Revision: https://reviews.llvm.org/D96639	2021-02-15 11:27:59 -08:00
Dmitry Preobrazhensky	745064e36b	[AMDGPU][MC] Refactored exp tgt handling Summary: - Separated tgt encoding from parsing; - Separated tgt decoding from printing; - Improved errors handling; - Disabled leading zeroes in index. The following code is no longer accepted: exp pos00 v3, v2, v1, v0 Reviewers: arsenm, rampitec, foad Differential Revision: https://reviews.llvm.org/D95216	2021-01-26 14:54:15 +03:00
dfukalov	560d7e0411	[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets ... to reduce headers dependency. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D95036	2021-01-20 22:22:45 +03:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Jay Foad	7ecf19697e	[AMDGPU] Fix and extend vccz workarounds We have workarounds for two different cases where vccz can get out of sync with the value in vcc. This fixes them in two ways: 1. Fix the case where the def of vcc was in a previous basic block, by pessimistically assuming that vccz might be incorrect at a basic block boundary. 2. Fix the handling of pre-existing waitcnt instructions by calling generateWaitcntInstBefore before examining ScoreBrackets to determine whether there's an outstanding smem read operation. Differential Revision: https://reviews.llvm.org/D91636	2020-11-18 15:26:06 +00:00
Jay Foad	a6ecb2eb3d	[AMDGPU] Add comments. NFC.	2020-11-16 16:34:13 +00:00
Jay Foad	6881a82e8c	[AMDGPU] Fix scheduling of exp pos4 Also fix a similar issue in SIInsertWaitcnts, but I don't think that fix has any effect in practice. Differential Revision: https://reviews.llvm.org/D91290	2020-11-12 19:57:14 +00:00
Jay Foad	d7d6ac5624	[AMDGPU] Define and use names for export targets. NFC. Differential Revision: https://reviews.llvm.org/D91289	2020-11-12 19:57:14 +00:00
Jay Foad	f94fd1c8ca	[AMDGPU] Make use of SIInstrInfo::isEXP. NFC.	2020-11-11 17:01:20 +00:00
Joe Nash	58adab34c4	[AMDGPU] Resolve pseudo registers at encoding uses Pseudo-registers allow different register encodings between gpu generations. Make sure we resolve the pseudo regs to real regs whenever we get their hardware encoding. Using the correct encodings revealed a register bank conflict and an unnecessary write dependency. Tests have been updated to match. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D90721 Change-Id: I73c154cd24aecc820993b50bebaf4df97a5710ca	2020-11-04 12:52:32 -05:00
Tony	1bc7bfffdb	[AMDGPU] Optimize waitcnt insertion for flat memory operations Change waitcnt insertion to check the memory operand tokens to see if flat memory operations access VMEM in the same way it does to check if accessing LDS. This avoids adding waitcnt for counters for address spaces that are not accessed. In addition, only generate the pessimistic waitcnt 0 if a flat memory operation appears to access both VMEM and LDS. This benefits flat memory operations that explicitly specify the address space as GLOBAL or LOCAL. Differential Revision: https://reviews.llvm.org/D89618	2020-10-20 22:55:12 +00:00
Sebastian Neubauer	a343b9b032	Revert "[AMDGPU] Insert waitcnt after returning from call" This reverts commit `ca907bfb57`. According to michel.daenzer, > This completely broke the Mesa radeonsi driver on Navi 14. Xorg + > xterm come up with major corruption & psychedelic colours.	2020-09-23 17:16:39 +02:00
Sebastian Neubauer	ca907bfb57	[AMDGPU] Insert waitcnt after returning from call When memory operations are outstanding on function calls, either the caller or the callee can insert a waitcnt to ensure that all reads are finished. Calls need some time to be executed, so if the callee inserts the waitcnt, filling the instruction buffer and waiting for memory will be interleaved, hiding some latency. This comes at the cost of having a waitcnt inside functions that may not be needed as no memory operations are outstanding. For function calls, this is already implemented. The same principle applies to returns: If the caller inserts a waitcnt after the call, the callee does not have to wait and the return and memory operation can be run in parallel. This commit implements waiting in the caller after returning from a function call. Differential Revision: https://reviews.llvm.org/D87674	2020-09-23 12:17:59 +02:00
Matt Arsenault	e15215e041	AMDGPU: Hoist check for VGPRs	2020-09-09 19:45:40 -04:00
Matt Arsenault	82cbc9330a	AMDGPU: Fix inserting waitcnts before kill uses	2020-09-09 19:45:40 -04:00
Stanislav Mekhanoshin	b7760c3e5d	[AMDGPU] Remove unsound dependency on ISA version in waitcnt Differential Revision: https://reviews.llvm.org/D86566	2020-08-25 14:01:42 -07:00
Stanislav Mekhanoshin	817c831f02	[AMDGPU] Switch to named simm16 in vscnt insertion Differential Revision: https://reviews.llvm.org/D86568	2020-08-25 13:05:27 -07:00
Matt Arsenault	068808d102	AMDGPU: Don't assume call targets are registers GlobalISel let through a call to null, which would then fold into the source operand like any other inline immediate. The SelectionDAG lowering deletes calls to null and undef as a workaround from before calls were supported. We should probably drop the special handling case in the DAG lowering now, since the middle end optimizers delete null calls anyway.	2020-07-28 20:46:06 -04:00
Scott Linder	691ff4682f	[AMDGPU] Skip CFIInstructions in SIInsertWaitcnts Summary: CFI emitted during PEI at the beginning of the prologue needs to apply to any inserted waitcnts on function entry. Reviewers: arsenm, t-tye, RamNalamothu Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D76881	2020-06-17 12:41:03 -04:00
vpykhtin	92f3828dc5	[AMDGPU] Fix wait counts in the presence of 16bit subregisters Differential Revision: https://reviews.llvm.org/D80033	2020-05-26 12:19:27 +03:00
Jay Foad	5f7ea85e78	[AMDGPU] Remove unnecessary s_waitcnt between VMEM loads VMEM loads of the same type (sampler vs no sampler) are guaranteed to write their result registers in order, so there is no need for an s_waitcnt even if they write to overlapping vgprs. Differential Revision: https://reviews.llvm.org/D79176	2020-05-01 10:10:23 +01:00
Jay Foad	1bf7ccb706	[AMDGPU] Use int and unsigned instead of other 32-bit integer types. NFC.	2020-04-30 15:21:36 +01:00
Jay Foad	462b960de8	Fix silly mistake in `31c09d03a1` [AMDGPU] Remove WaitcntBrackets::MixedPendingEvents[]. NFC.	2020-04-30 11:41:14 +01:00
Jay Foad	86545bf72d	[AMDGPU] Simplify loops in SIInsertWaitcnts::generateWaitcntInstBefore The loops over use operands and def operands were mostly identical. Combine them, and likewise for load memoperands and store memoperands. NFC.	2020-04-30 08:53:12 +01:00
Jay Foad	9f59d1931c	[AMDGPU] Remove Def argument from WaitcntBrackets::getRegInterval. NFC. It's cleaner to check this in the callers instead.	2020-04-30 08:53:12 +01:00
Jay Foad	31c09d03a1	[AMDGPU] Remove WaitcntBrackets::MixedPendingEvents[]. NFC. It's trivial to derive this information from other state.	2020-04-29 19:58:19 +01:00
Jay Foad	120572072e	[AMDGPU] Initialize gpr upper bounds to -1. NFC. These upper bounds are inclusive, so -1 (rather than 0) is the natural way to express an empty range.	2020-04-29 19:58:06 +01:00
Jay Foad	777f91f47e	[AMDGPU] Simplify MergeInfo calculations. NFC. This makes the definition and uses of NewUB more symmetrical, and makes it clear that ScoreLBs[T] does not change.	2020-04-29 19:58:06 +01:00
Jay Foad	4649da119a	[AMDGPU] Use a MapVector instead of a DenseMap and a std::vector. NFC.	2020-04-29 16:02:24 +01:00
Jay Foad	2a10957f62	[AMDGPU] Minor cleanups. NFC.	2020-04-29 16:02:24 +01:00
Jay Foad	3c1f21cdf6	[AMDGPU] Remove some redundant variables. NFC.	2020-04-29 09:24:41 +01:00
Jay Foad	498795829b	[AMDGPU] Remove odd blank line in debug output.	2020-04-27 17:10:36 +01:00
Jay Foad	4a331beadc	[AMDGPU] Fix vccz after v_readlane/v_readfirstlane to vcc_lo/hi Summary: Up to gfx9, writes to vcc_lo and vcc_hi by instructions like v_readlane and v_readfirstlane do not update vccz to reflect the new value of vcc. Fix it by reusing part of the existing vccz bug handling code, which inserts an "s_mov_b64 vcc, vcc" instruction to restore vccz just before an instruction that needs the correct value. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69661	2020-01-28 10:52:17 +00:00
alex-t	ca8b20ca3b	[AMDGPU] need to insert wait between the scalar load and vector store to the same address to avoid WAR conflict. Reviewers: rampitec, vpykhtin, nhaehnle Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D71934	2020-01-04 18:23:14 +03:00
Michael Liao	79d401905f	[amdgpu] Fix scoreboard updating on `s_waitcnt_vscnt`. Summary: - Other counters are accidentally cleared. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71866	2019-12-31 14:20:30 -05:00
Jay Foad	c5c935ab66	Make more use of MachineInstr::mayLoadOrStore.	2019-12-19 11:51:52 +00:00
Jay Foad	357bd914a1	[AMDGPU] Fix function name in debug output	2019-11-25 15:22:04 +00:00
Austin Kerbow	fef69706dc	AMDGPU: Handle waitcnt overflow Summary: The waitcnt pass can overflow the counters when the number of outstanding events for a type exceed the capacity of the counter. This can lead to inefficient insertion of waitcnts, or to waitcnt instructions with max values for each type. The last situation can cause an instruction which when disassembled appears to be an illegal waitcnt without an operand. In these cases we should add a wait for the 'counter maximum' - 1, and update the waitcnt brackets accordingly. Reviewers: rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70418	2019-11-23 09:34:23 -08:00
Jay Foad	e5972f2a04	[AMDGPU] Simplify VCCZ bug handling Summary: VCCZBugHandledSet was used to make sure we don't apply the same workaround more than once to a single cbranch instruction, but it's not necessary because the workaround involves inserting an s_waitcnt instruction, which is enough for subsequent iterations to detect that no further workaround is necessary. Also beef up the test case to check that the workaround was only applied once. I have also manually verified that the test still passes even if I hack the big do-while loop in runOnMachineFunction to run a minimum of five iterations. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69621	2019-10-30 17:09:07 +00:00
Jay Foad	b592253ec6	[AMDGPU] Consolidate one more getGeneration check This one should have been done in r363902 when hasReadVCCZBug was introduced.	2019-10-30 11:16:42 +00:00
Austin Kerbow	d11b93ec6a	AMDGPU: Avoid overwriting saved PC Summary: An outstanding load with same destination sgpr as call could cause PC to be updated with junk value on return. Reviewers: arsenm, rampitec Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69474	2019-10-28 10:02:22 -07:00
Jonas Devlieghere	0eaee545ee	[llvm] Migrate llvm::make_unique to std::make_unique Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo. llvm-svn: 369013	2019-08-15 15:54:37 +00:00
Stanislav Mekhanoshin	e67cc380a8	[AMDGPU] gfx908 mfma support Differential Revision: https://reviews.llvm.org/D64584 llvm-svn: 365824	2019-07-11 21:19:33 +00:00
Matt Arsenault	c04aab9c06	AMDGPU: Look through bundles for existing waitcnts These aren't produced now, but will be in a future patch. llvm-svn: 364983	2019-07-03 00:30:44 +00:00

117 Commits