llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	1912ace968	AMDGPU: Move handling of AGPR copies to a separate function This is in preparation for fixing multiple problems with the way AGPR copies are handled, but this change is NFC itself. First, it's relying on recursively calling copyPhysReg, which is losing information necessary to get correct super register handling. Second, it's constructing a new RegScavenger and doing a O(N^2) walk on every single sub-spill for every AGPR tuple copy. Third, it's using the forward form of the scavenger, and not using the preferred backwards scan.	2020-07-16 14:32:24 -04:00
Matt Arsenault	79f67cae91	AMDGPU: Rename add/sub with carry out instructions The hardware has created a real mess in the naming for add/sub, which have been renamed basically every generation. Switch the carry out pseudos to have the gfx9/gfx10 names. We were using the original SI/CI v_add_i32/v_sub_i32 names. Later targets reintroduced these names as carryless instructions with a saturating clamp bit, which we do not define. Do this rename so we can unambiguously add these missing instructions. The carry-in versions should also be renamed, but at least those had a consistent _u32 name to begin with. The 16-bit instructions were also renamed, but aren't ambiguous. This does regress assembler error message quality in some cases. In mismatched wave32/wave64 situations, this will switch from "unsupported instruction" to "invalid operand", with the error pointing at the wrong position. I couldn't quite follow how the assembler selects these, but the previous behavior seemed accidental to me. It looked like there was a partial attempt to handle this which was never completed (i.e. there is an AMDGPUOperand::isBoolReg but it isn't used for anything).	2020-07-16 13:16:30 -04:00
Matt Arsenault	d2e74fad20	AMDGPU: Set more mov flags on V_ACCVGPR_{READ\|WRITE}_B32 This fixes extra copies when materializing constants in AGPRs. This made it a lot harder to trigger the spilling in spill-agpr.ll	2020-07-01 18:58:59 -04:00
Matt Arsenault	14fe4607f1	AMDGPU: Support commuting register and global operand	2020-07-01 13:59:13 -04:00
Matt Arsenault	a21544ad11	AMDGPU: Fix handling of target flags when commuting instruction If the original register operand had a subregister, it wasn't getting cleared. This resulted in reinterpreting the subreg index as unrecognized target flags, which produced unparseable MIR.	2020-07-01 13:59:13 -04:00
James Y Knight	4b0aa5724f	Change the INLINEASM_BR MachineInstr to be a non-terminating instruction. Before this instruction supported output values, it fit fairly naturally as a terminator. However, being a terminator while also supporting outputs causes some trouble, as the physreg->vreg COPY operations cannot be in the same block. Modeling it as a non-terminator allows it to be handled the same way as invoke is handled already. Most of the changes here were created by auditing all the existing users of MachineBasicBlock::isEHPad() and MachineBasicBlock::hasEHPadSuccessor(), and adding calls to isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate. Reviewed By: nickdesaulniers, void Differential Revision: https://reviews.llvm.org/D79794	2020-07-01 12:51:50 -04:00
Adam Balogh	71c6a36018	[AMDGPU][NFC] Remove redundant condition Condition `LiteralCount` is checked both in an outer and in an inner `if` statement in `SIInstrInfo::verifyInstruction()`. This patch removes the redundant inner check. The issue was found using `clang-tidy` check under review `misc-redundant-condition`. See https://reviews.llvm.org/D81272. Differential Revision: https://reviews.llvm.org/D82555	2020-07-01 09:04:25 +02:00
Piotr Sobczak	0045786f14	[AMDGPU] Select s_cselect Summary: Add patterns to select s_cselect in the isel. Handle more cases of implicit SCC accesses in si-fix-sgpr-copies to allow new patterns to work. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits Tags: #llvm Re-commit D81925 with a bugfix D82370. Differential Revision: https://reviews.llvm.org/D81925 Differential Revision: https://reviews.llvm.org/D82370	2020-06-25 10:38:23 +02:00
dstuttar	e8775c8d81	[AMDGPU] Make sure to fix implicit operands on insertBranch Summary: Without fixImplicitOperands we may end up creating default implicit operands that are the wrong wave size Includes simple test that provokes insertBranch in the correct way to expose the issue being fixed. Change-Id: I92bdcdee9fcb7b4d91529b84e76a48ac8218483e Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82459	2020-06-24 16:50:48 +01:00
Matt Arsenault	778351df77	Revert "[AMDGPU] Enable compare operations to be selected by divergence" This reverts commit `521ac0b5ce`. Reported to break thousands of piglit tests.	2020-06-24 11:21:30 -04:00
alex-t	521ac0b5ce	[AMDGPU] Enable compare operations to be selected by divergence Summary: Details: This patch enables SETCC to be selected to S_CMP_* if uniform and V_CMP_* if divergent. Reviewers: rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82194	2020-06-24 11:50:40 +03:00
Your Name	cc9d693856	[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size Summary: Make use of both the - (1) clustered bytes and (2) cluster length, to decide on the max number of mem ops that can be clustered. On an average, when loads are dword or smaller, consider `5` as max threshold, otherwise `4`. This heuristic is purely based on different experimentation conducted, and there is no analytical logic here. Reviewers: foad, rampitec, arsenm, vpykhtin Reviewed By: rampitec Subscribers: llvm-commits, kerbowa, hiraditya, t-tye, Anastasia, tpr, dstuttard, yaxunl, nhaehnle, wdng, jvesely, kzhuravl, thakis Tags: #llvm Differential Revision: https://reviews.llvm.org/D82393	2020-06-24 00:39:41 +05:30
hsmahesha	5832950adb	[AMDGPU/MemOpsCluster] Compute `width` for `MIMG` instruction class. Summary: `width` computation is missing for newly added `MIMG` instruction class. Add it. Reviewers: foad, rampitec, arsenm Reviewed By: foad Subscribers: MatzeB, javed.absar, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81649	2020-06-23 17:32:17 +05:30
Carl Ritson	4a7de36afc	[AMDGPU] Avoid use of V_READLANE into EXEC in SGPR spills Always prefer to clobber input SGPRs and restore them after the spill. This applies to both spills to VGPRs and scratch. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D81914	2020-06-20 12:10:47 +09:00
Piotr Sobczak	6d9565d6d5	Revert "[AMDGPU] Select s_cselect" This caused some failures detected by the buildbot with expensive checks enabled. This reverts commit `4067de569f`.	2020-06-19 16:41:04 +02:00
Piotr Sobczak	4067de569f	[AMDGPU] Select s_cselect Summary: Add patterns to select s_cselect in the isel. Handle more cases of implicit SCC accesses in si-fix-sgpr-copies to allow new patterns to work. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81925	2020-06-19 16:17:46 +02:00
Matt Arsenault	5f5f566b26	AMDGPU: Don't use 16-bit FP inline constants in integer operands It seems to be a hardware defect that the half inline constants do not work as expected for the 16-bit integer operations (the inverse does work correctly). Experimentation seems to show these are really reading the 32-bit inline constants, which can be observed by writing inline asm using op_sel to see what's in the high half of the constant. Theoretically we could fold the high halves of the 32-bit constants using op_sel. The *_asm_all.s MC tests are broken, and I don't know where the script to autogenerate these are. I started manually fixing it, but there's just too many cases to fix. This also does break the assembler/disassembler support for these values, and I'm not sure what to do about it. These are still valid encodings, so it seems like you should be able to use them in some way. If you wrote assembly using them, you could have really meant it (perhaps to read the high bits with op_sel?). The disassembler will print the invalid literal constant which will fail to re-assemble. The behavior is also different depending on the use context. Consider this example, which was previously accepted and encoded using the inline constant: v_mad_i16 v5, v1, -4.0, v3 ; encoding: [0x05,0x00,0xec,0xd1,0x01,0xef,0x0d,0x04] In contexts where an inline immediate is required (such as on gfx8/9), this will now be rejected. For gfx10, this will produce the literal encoding and change the printed format: v_mad_i16 v5, v1, 0xc400, v3 ; encoding: [0x05,0x00,0x5e,0xd7,0x01,0xff,0x0d,0x04,0x00,0xc4,0x00,0x00] This is just another variation of the issue that we don't perfectly handle round trip assembly/disassembly due to not tracking how immediates were encoded. This doesn't matter much in practice, since compilers don't emit the suboptimal encoding. I doubt any users are relying on this behavior (although I did make use of the old behavior to figure out what was wrong). Fixes bug 46302.	2020-06-17 19:14:10 -04:00
Matt Arsenault	46579471fd	AMDGPU: Fix spill/restore of 192-bit registers I tried to use an IR inline asm test, but that doesn't work since the inline asm handling asserts without an MVT to use.	2020-06-14 13:12:01 -04:00
Sebastian Neubauer	29a6ad94fd	[AMDGPU] Add G16 support to image instructions Add G16 feature for GFX10 and support A16 and G16 in GlobalISel. Differential Revision: https://reviews.llvm.org/D76836	2020-06-12 11:26:31 +02:00
hsmahesha	7410571ce9	Revert "[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size" This reverts commit `40a632a335`.	2020-06-09 19:27:17 +05:30
hsmahesha	40a632a335	[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size Summary: Make use of both the - (1) clustered bytes and (2) cluster length, to decide on the max number of mem ops that can be clustered. On an average, when loads are dword or smaller, consider `5` as max threshold, otherwise `4`. This heuristic is purely based on different experimentation conducted, and there is no analytical logic here. Reviewers: foad, rampitec, arsenm, vpykhtin Reviewed By: foad, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, Anastasia, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81085	2020-06-09 14:09:14 +05:30
Jay Foad	275ecaae16	[AMDGPU] Cluster MIMG instructions Differential Revision: https://reviews.llvm.org/D74035	2020-06-08 14:01:53 +01:00
hsmahesha	29c17ed96e	[AMDGPU/MemOpsCluster] Code clean-up around accessing of memory operand width Summary: Clean-up the width computing logic given a memory operand, and re-arrange code to avoid code duplication. Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar Reviewed By: foad Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80946	2020-06-03 14:03:52 +05:30
Matt Arsenault	452e0d9023	AMDGPU: Don't run mode switches with exec 0 These are scalar instructions that change vector instructions, so they should not be executed without any active lanes. The implementation of -amdgpu-skip-threshold also seems to be backwards from expected, since decreasing it prevents removal.	2020-06-02 13:47:48 -04:00
hsmahesha	0ed2c04636	[AMDGPU/MemOpsCluster] Let mem ops clustering logic also consider number of clustered bytes Summary: While clustering mem ops, AMDGPU target needs to consider number of clustered bytes to decide on max number of mem ops that can be clustered. This patch adds support to pass number of clustered bytes to target mem ops clustering logic. Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar Reviewed By: foad Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80545	2020-06-01 22:52:34 +05:30
Matt Arsenault	f012c58abd	AMDGPU: Move MIMG MMO check to verifier	2020-05-29 20:58:23 -04:00
Matt Arsenault	1a9e0d7092	AMDGPU: Make S_DENORM_MODE not be a scheduling boundary Now that the mode register uses/defs should be properly modeled, we don't need to treat the FP mode switch as an arbitrary side effect.	2020-05-28 10:39:33 -04:00
Stanislav Mekhanoshin	7392bbc301	AMDGPU/GlobalISel: Fixed insert element for non-standard vectors Differential Revision: https://reviews.llvm.org/D80653	2020-05-27 16:26:22 -07:00
alex-t	eb1092ada3	[AMDGPU] Fix for the lost CarryOut/CarryIn register operands in S_ADD/SUB_CO_PSEUDO. Summary: This fixes the `5b898bddff` bug when the carry-in and carry-out registers became lost in lowering S_ADD/SUB_CO_PSEUDO. Reviewers: rampitec, arsenm Reviewed By: arsenm Subscribers: msearles, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80158	2020-05-27 22:41:04 +03:00
Matt Arsenault	d37ce53ad3	AMDGPU: Set StackPointerRegisterToSaveRestore This will enable selecting non-entry block allocas. Skip the SP write check in the base isSchedulingBoundary implementation to preserve the previous scheduling behavior and avoid test churn. It's apparently for compile time reasons, but if we were to use this more work would be needed since in some of the failing tests, we seem to incorrectly get hazard nops inserted.	2020-05-27 13:44:05 -04:00
Matt Arsenault	07cd19efa2	AMDGPU: Fix dropping MI flags when rewriting instructions All 3 passes that change instruction encodings were dropping MI flags. This avoids scheduling regressions caused by setting mayRaiseFPExceptions on FP instructions for non-strictfp functions.	2020-05-27 13:27:06 -04:00
Matt Arsenault	833996cef1	AMDGPU: Fix backwards s_cselect_* operands The vector equivalent has backwards operands, but the scalar version does not. The passes that use these hooks aren't enabled by default, so this doesn't really change anything.	2020-05-27 09:26:09 -04:00
David Blaikie	025cd300cd	Collapse variable into assert to remove non-assert unused variable	2020-05-05 11:04:43 -07:00
Stanislav Mekhanoshin	9ef166e657	[AMDGPU] Fix FoldImmediate for 16 bit operand Differential Revision: https://reviews.llvm.org/D79362	2020-05-05 10:19:14 -07:00
Stanislav Mekhanoshin	c85eda74b8	[AMDGPU] fix copies between 32 and 16 bit This is a hack to fix illegal 32 to 16 bit copies. The problem is when we make 16 bit subregs legal it creates a huge amount of failures which can only be resolved at once without a temporary hack like this. The next step is to change operands, instruction definitions and patterns until this hack is not needed. Differential Revision: https://reviews.llvm.org/D79119	2020-05-04 08:54:22 -07:00
alex-t	5b898bddff	[AMDGPU] Enable carry out ADD/SUB operations divergence driven instruction selection. Summary: This change enables all kind of carry out ISD opcodes to be selected according to the node divergence. Reviewers: rampitec, arsenm, vpykhtin Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78091	2020-05-04 16:42:25 +03:00
Stanislav Mekhanoshin	26777ad7a0	[AMDGPU] Adapt GCNRegBankReassign for 16 bit subregs It allows it not to crash and analyze 16 bit subregs if those appear in the instructions. At the same time it does not attempt to reassign these. It still can correctly identify register banks to let larger registers to be reassigned. More work will be needed here when real instructions will use these registers and more tests as well. Differential Revision: https://reviews.llvm.org/D78772	2020-04-28 16:16:04 -07:00
Stanislav Mekhanoshin	8a30460697	[AMDGPU] Define AGPR subregs These are only needed as VGPR counterpart. Differential Revision: https://reviews.llvm.org/D78597	2020-04-28 15:30:43 -07:00
Stanislav Mekhanoshin	46a75436f8	[AMDGPU] Define special SGPR subregs These are used in SReg_32 and when we start to use SGPR_LO16 there will be complaints that not all registers in RC support all subreg indexes. For now it is NFC. Unused regunits are reserved so that verifier does not complain about missing phys reg live-ins. Differential Revision: https://reviews.llvm.org/D78591	2020-04-28 14:57:46 -07:00
Stanislav Mekhanoshin	395d93358e	Revert "[AMDGPU] Define special SGPR subregs" This reverts commit `1baaa080e0`.	2020-04-28 13:53:15 -07:00
Stanislav Mekhanoshin	1baaa080e0	[AMDGPU] Define special SGPR subregs These are used in SReg_32 and when we start to use SGPR_LO16 there will be complaints that not all registers in RC support all subreg indexes. For now it is NFC. Unused regunits are reserved so that verifier does not complain about missing phys reg live-ins. Differential Revision: https://reviews.llvm.org/D78591	2020-04-28 13:34:24 -07:00
Stanislav Mekhanoshin	992fbce4e9	[AMDGPU] copyPhysReg() for 16 bit SGPR subregs Differential Revision: https://reviews.llvm.org/D78255	2020-04-17 11:59:39 -07:00
Stanislav Mekhanoshin	fde2aefa22	[AMDGPU] Use SDWA for 16 bit subreg copy This simplifies the logic and allows to use it on GFX8. Differential Revision: https://reviews.llvm.org/D78150	2020-04-17 11:45:44 -07:00
Michael Liao	b54b4ecac3	Fix `-Wextra` warning. NFC.	2020-04-10 03:22:02 -04:00
Stanislav Mekhanoshin	96e51ed005	[AMDGPU] Implement copyPhysReg for 16 bit subregs Differential Revision: https://reviews.llvm.org/D74937	2020-04-07 14:22:46 -07:00
Matt Arsenault	30ebafaa56	CodeGen: Convert some TII hooks to use Register	2020-04-03 14:52:54 -04:00
Matt Arsenault	178050c3ba	AMDGPU: Use Register in more places	2020-04-03 14:52:54 -04:00
Guillaume Chatelet	c9d5c19597	[Alignment][NFC] Transitioning more getMachineMemOperand call sites Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, Jim, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77121	2020-03-31 08:36:18 +00:00
David Stuttard	a74b33f612	AMDGPU: Fix SMRD test in trivially disjoint mem access code Summary: This seems like an obvious error - cut and paste issue? The change does make a change to one of the lit tests - it stops s_buffer_load re-ordering past an MUBUF instruction (which is not surprising). Change-Id: I80be99de5b62af4f42e91af2591b76a52ac9efa6 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75686	2020-03-05 17:14:01 +00:00
Sander de Smalen	8fbc925807	Add OffsetIsScalable to getMemOperandWithOffset Summary: Making `Scale` a `TypeSize` in AArch64InstrInfo::getMemOpInfo, has the effect that all places where this information is used (notably, TargetInstrInfo::getMemOperandWithOffset) will need to consider Scale - and derived, Offset - possibly being scalable. This patch adds a new operand `bool &OffsetIsScalable` to TargetInstrInfo::getMemOperandWithOffset and fixes up all the places where this function is used, to consider the offset possibly being scalable. In most cases, this means bailing out because the algorithm does not (or cannot) support scalable offsets in places where it does some form of alias checking for example. Reviewers: rovka, efriedma, kristof.beyls Reviewed By: efriedma Subscribers: wuzish, kerbowa, MatzeB, arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, javed.absar, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72758	2020-02-18 15:53:29 +00:00
Sebastian Neubauer	8756869170	[AMDGPU] Add a16 feature to gfx10 Based on D72931 This adds a new feature called A16 which is enabled for gfx10. gfx9 keeps the R128A16 feature so it can share all the instruction encodings with gfx7/8. Differential Revision: https://reviews.llvm.org/D73956	2020-02-10 09:04:23 +01:00
Stanislav Mekhanoshin	cacc3b7a55	[AMDGPU] Cleanup assumptions about generated subregs We are using countPopulation on a LaneBitmask to determine a number of registers it covers. This is the assumption which does not necessarily need to be true. It is not changed but factored into a single call SIRegisterInfo::getNumCoveredRegs(). Some other places are cleaned up with respect to assumptions about subreg indexes values and tablegen behavior. Differential Revision: https://reviews.llvm.org/D74177	2020-02-06 17:39:24 -08:00
Jay Foad	2252cac694	[AMDGPU] getMemOperandsWithOffset: support BUF non-stack-access instructions with resource but no vaddr Summary: This enables clustering for many more BUF instructions. Reviewers: rampitec, arsenm, nhaehnle Subscribers: jvesely, wdng, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73868	2020-02-03 22:49:30 +00:00
Jay Foad	05297b7cbe	[AMDGPU] getMemOperandsWithOffset: add resource operand for BUF instructions Summary: This prevents unwanted clustering of BUF instructions with the same vaddr but different resource descriptors. Reviewers: rampitec, arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73867	2020-02-03 17:06:09 +00:00
Simon Moll	5c8ba508b2	[NFC] unsigned->Register in storeRegTo/loadRegFromStack Summary: This patch makes progress on the 'unsigned -> Register' rewrite for `TargetInstrInfo::loadRegFromStack` and `TII::storeRegToStack`. Reviewers: arsenm, craig.topper, uweigand, jpienaar, atanasyan, venkatra, robertlytton, dylanmckay, t.p.northover, kparzysz, tstellar, k-ishizaka Reviewed By: arsenm Subscribers: wuzish, merge_guards_bot, jyknight, sdardis, nemanjai, jvesely, wdng, nhaehnle, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73870	2020-02-03 14:22:16 +01:00
Jay Foad	d07a789579	[AMDGPU] Cluster FLAT instructions with both vaddr and saddr Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73634	2020-01-29 17:01:35 +00:00
Stanislav Mekhanoshin	c2ad7ee1a9	[AMDGPU] override isHighLatencyDef SIMachineScheduler uses isHighLatencyInstruction with the same semantics, but TargetInstrInfo has virtual isHighLatencyDef method, so override it instead. Added FLAT to the list of high latency opcodes and a check for mayLoad since stores are not technically high latency in terms of data dependency. This change did not produce any visible impact on our tests. Differential Revision: https://reviews.llvm.org/D73582	2020-01-29 08:01:29 -08:00
Jay Foad	ad08c01d6c	[AMDGPU] Simplify DS and SM cases in getMemOperandsWithOffset Summary: This removes a couple of unnecessary isReg checks, now that memOpsHaveSameBasePtr can handle FI operands, but is otherwise NFC. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73485	2020-01-29 09:43:24 +00:00
Jay Foad	1bf00219fc	[AMDGPU] Handle multiple base operands in areMemAccessesTriviallyDisjoint Summary: This is in preparation for getMemOperandsWithOffset returning more base operands. Depends on D73455. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73456	2020-01-27 14:45:21 +00:00
Jay Foad	6461eadf8f	[AMDGPU] Handle multiple base operands in shouldClusterMemOps Summary: This is in preparation for getMemOperandsWithOffset returning more base operands. Depends on D73454. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73455	2020-01-27 14:45:21 +00:00
Jay Foad	fcf5254fa7	[AMDGPU] Handle frame index base operands in memOpsHaveSameBasePtr Summary: This is in preparation for getMemOperandsWithOffset returning more base operands. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, arphaman, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73454	2020-01-27 14:45:21 +00:00
Stanislav Mekhanoshin	be8e38cbd9	Correct NumLoads in clustering Scheduler sends NumLoads argument into shouldClusterMemOps() one less than the actual cluster length. So for 2 instructions it will pass just 1. Correct this number. This is NFC for in tree targets. Differential Revision: https://reviews.llvm.org/D73292	2020-01-24 12:45:28 -08:00
Stanislav Mekhanoshin	555d8f4ef5	[AMDGPU] Bundle loads before post-RA scheduler We are relying on artificial DAG edges inserted by the MemOpClusterMutation to keep loads and stores together in the post-RA scheduler. This does not work all the time since it allows to schedule a completely independent instruction in the middle of the cluster. Removed the DAG mutation and added pass to bundle already clustered instructions. These bundles are unpacked before the memory legalizer because it does not work with bundles but also because it allows to insert waitcounts in the middle of a store cluster. Removing artificial edges also allows a more relaxed scheduling. Differential Revision: https://reviews.llvm.org/D72737	2020-01-24 11:33:38 -08:00
Matt Arsenault	d1dbb5e471	AMDGPU/GlobalISel: Select G_INSERT_VECTOR_ELT	2020-01-22 11:00:49 -05:00
Jay Foad	e0f0d0e55c	[MachineScheduler] Allow clustering mem ops with complex addresses The generic BaseMemOpClusterMutation calls into TargetInstrInfo to analyze the address of each load/store instruction, and again to decide whether two instructions should be clustered. Previously this had to represent each address as a single base operand plus a constant byte offset. This patch extends it to support any number of base operands. The old target hook getMemOperandWithOffset is now a convenience function for callers that are only prepared to handle a single base operand. It calls the new more general target hook getMemOperandsWithOffset. The only requirements for the base operands returned by getMemOperandsWithOffset are: - they can be sorted by MemOpInfo::Compare, such that clusterable ops get sorted next to each other, and - shouldClusterMemOps knows what they mean. One simple follow-on is to enable clustering of AMDGPU FLAT instructions with both vaddr and saddr (base register + offset register). I've left a FIXME in the code for this case. Differential Revision: https://reviews.llvm.org/D71655	2020-01-22 14:28:24 +00:00
Amara Emerson	67a8775322	[AArch64] Don't generate gpr CSEL instructions in early-ifcvt if regclasses aren't compatible. In GlobalISel we may in some unfortunate circumstances generate PHIs with operands that are on separate banks. If-conversion doesn't currently check for that case and ends up generating a CSEL on AArch64 with incorrect register operands. Differential Revision: https://reviews.llvm.org/D72961	2020-01-21 16:51:31 -08:00
Matt Arsenault	8615eeb455	AMDGPU: Partially merge indirect register write handling `a785209bc2` switched to using pseudos instead of manually tying operands on the regular instruction. The VGPR indexing mode path should have the same problems that change attempted to avoid, so these should use the same strategy. Use a single pseudo for the VGPR indexing mode and movreld paths, and expand it based on the subtarget later. These have essentially the same constraints, reading the index from m0. Switch from using an offset to the subregister index directly, instead of computing an offset and re-adding it back. Also add missing pseudos for existing register class sizes.	2020-01-20 17:19:16 -05:00
Stanislav Mekhanoshin	eca4474587	[AMDGPU] Fix getInstrLatency() always returning 1 We do not have InstrItinerary so generic getInstLatency() was always defaulting to return 1 cycle. We need to use TargetSchedModel instead to compute an instruction's latency. Differential Revision: https://reviews.llvm.org/D72655	2020-01-14 01:08:30 -08:00
Stanislav Mekhanoshin	cd69e4c74c	[AMDGPU] Fix bundle scheduling Bundles coming to scheduler considered free, i.e. zero latency. Fixed. Differential Revision: https://reviews.llvm.org/D72487	2020-01-09 15:56:36 -08:00
Matt Arsenault	e29ae3799b	TII: Fix using Register for a subregister index argument	2019-12-27 16:53:29 -05:00
Matt Arsenault	a37e958558	AMDGPU: Use correct DebugLoc	2019-12-27 08:49:43 -05:00
Jay Foad	c5c935ab66	Make more use of MachineInstr::mayLoadOrStore.	2019-12-19 11:51:52 +00:00
Jay Foad	0412f518dc	[AMDGPU] Fix typo in SIInstrInfo::memOpsHaveSameBasePtr Summary: The typo has been present since memOpsHaveSameBasePtr was introduced in r313208. It caused SIInstrInfo::shouldClusterMemOps to cluster more mem ops than it was supposed to. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71616	2019-12-17 18:54:27 +00:00
Kristof Beyls	870f39d310	Fix assertion failure in getMemOperandWithOffsetWidth This fixes an assertion failure that triggers inside getMemOperandWithOffset when Machine Sinking calls it on a MachineInstr that is not a memory operation. Different backends implement getMemOperandWithOffset differently: some return false on non-memory MachineInstrs, others assert. The Machine Sinking pass in at least SinkingPreventsImplicitNullCheck relies on getMemOperandWithOffset to return false on non-memory MachineInstrs, instead of asserting. This patch updates the documentation on getMemOperandWithOffset that it should return false on any MachineInstr it cannot handle, instead of asserting. It also adapts the in-tree backends accordingly where necessary. Differential Revision: https://reviews.llvm.org/D71359	2019-12-17 10:56:09 +00:00
Austin Kerbow	256ad954a9	AMDGPU: Reuse carry out register during FI elimination Summary: Pre gfx9 we need to scavenge a 64-bit SGPR to use as the carry out for an Add. If only one SGPR was available this crashed when trying to scavenge another 32bit SGPR to materialize the offset. Instead, reuse a 32-bit SGPR from the carry out as the offset register. Also prefer to use vcc for the unused carry out when it is available. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70614	2019-11-28 10:13:48 -08:00
Dmitry Preobrazhensky	6778a62eb0	[AMDGPU][GFX10] Disabled v_movrel*[sdwa\|dpp] opcodes in codegen These opcodes use indirect register addressing so they need special handling by codegen (currently missing). Reviewers: vpykhtin, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D70400	2019-11-20 17:57:50 +03:00
Matt Arsenault	31479d868e	AMDGPU: Change boolean content type to 0 or 1 The usage of target boolean checks is overly inflexible, since sext and zext of a compare are equally cheap. The choice is arbitrary, but using 0/1 to some degree is the choice of lower resistance since that's what most targets use. This enables a few combines that don't bother to support ZeroOrNegativeOneBooleanContent.	2019-11-15 13:43:47 +05:30
Matt Arsenault	e6c9a9af39	Use MCRegister in copyPhysReg	2019-11-11 14:42:33 +05:30
Matt Arsenault	d9e0a2942a	AMDGPU: Disallow spill folding with m0 copies readlane and writelane instructions are not allowed to use m0 as the data operand, so spilling them is tricky and would require an intermediate SGPR to spill it. Constrain the virtual register class in this case to disallow the inline spiller from folding the m0 operand directly into the spill instruction. I copied this hack from AArch64 which has the same problem for $sp.	2019-10-30 14:56:33 -07:00
Stanislav Mekhanoshin	4c0251da14	[AMDGPU] Enable SGPR copy folding That used to fail in the last testcase function because after %0:sreg_64.sub0 was folded into %3:sreg_32_xm0_xexec COPY, it was further folded into S_STORE_DWORD_IMM. Its legal effective subreg class is SReg_32 while instruction expects more restricted SReg_32_XM0_EXEC. However, SIInstrInfo::isLegalRegOperand() passed the legality check and it was caught in the verifier. Borrowed code from the verifier to check for RC legality. Differential Revision: https://reviews.llvm.org/D69445	2019-10-25 15:08:30 -07:00
Changpeng Fang	1ce552f3ef	AMDGPU: Fix the broken dominator tree when creating waterfall loop for resource descriptor Summary: In loadSRsrcFromVGPR, if MBB is the same as Succ, Remainder is not the immediate dominator of Succ. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D69358	2019-10-25 13:08:04 -07:00
Matt Arsenault	dd6cf159ba	AMDGPU: Stop adding m0 implicit def to SGPR spills r375293 removed the SGPR spilling with scalar stores path, so this is no longer necessary. This also always had the defect of adding the def even when this path wasn't in use. llvm-svn: 375448	2019-10-21 19:42:29 +00:00
Stanislav Mekhanoshin	33092194f2	[AMDGPU] Select AGPR in PHI operand legalization If a PHI defines AGPR legalize its operands to AGPR. At the moment we can get an AGPR PHI with VGPR operands. I am not aware of any problems as it seems to be handled gracefully in RA, but this is not right anyway. It also slightly decreases VGPR pressure in some cases because we do not have to copy via VGPR. Differential Revision: https://reviews.llvm.org/D69206 llvm-svn: 375446	2019-10-21 19:25:27 +00:00
Matt Arsenault	7cd57dcd5b	AMDGPU: Split flat offsets that don't fit in DAG We handle it this way for some other address spaces. Since r349196, SILoadStoreOptimizer has been trying to do this. This is after SIFoldOperands runs, which can change the addressing patterns. It's simpler to just split this earlier. llvm-svn: 375366	2019-10-20 17:34:44 +00:00
Matt Arsenault	f9a42ed0a7	AMDGPU: Relax 32-bit SGPR register class Mostly use SReg_32 instead of SReg_32_XM0 for arbitrary values. This will allow the register coalescer to do a better job eliminating copies to m0. For GlobalISel, as a terrible hack, use SGPR_32 for things that should use SCC until booleans are solved. llvm-svn: 375267	2019-10-18 18:26:37 +00:00
David Stuttard	2d6a2303f8	[AMDGPU] Fix-up cases where writelane has 2 SGPR operands Summary: Even though writelane doesn't have the same constraints as other valu instructions it still can't violate the >1 SGPR operand constraint Due to later register propagation (e.g. fixing up vgpr operands via readfirstlane) changing writelane to only have a single SGPR is tricky. This implementation puts a new check after SIFixSGPRCopies that prevents multiple SGPRs being used in any writelane instructions. The algorithm used is to check for trivial copy prop of suitable constants into one of the SGPR operands and perform that if possible. If this isn't possible put an explicit copy of Src1 SGPR into M0 and use that instead (this is allowable for writelane as the constraint is for SGPR read-port and not constant-bus access). Reviewers: rampitec, tpr, arsenm, nhaehnle Reviewed By: rampitec, arsenm, nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, mgorny, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D51932 Change-Id: Ic7553fa57440f208d4dbc4794fc24345d7e0e9ea llvm-svn: 375004	2019-10-16 14:37:39 +00:00
Austin Kerbow	527e9f9a3f	AMDGPU: Fix infinite searches in SIFixSGPRCopies Summary: Two conditions could lead to infinite loops when processing PHI nodes in SIFixSGPRCopies. The first condition involves a REG_SEQUENCE that uses registers defined by both a PHI and a COPY. The second condition arises when a physical register is copied to a virtual register which is then used in a PHI node. If the same virtual register is copied to the same physical register, the result is an endless loop. %0:sgpr_64 = COPY $sgpr0_sgpr1 %2 = PHI %0, %bb.0, %1, %bb.1 $sgpr0_sgpr1 = COPY %0 Reviewers: alex-t, rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68970 llvm-svn: 374944	2019-10-15 19:59:45 +00:00
Stanislav Mekhanoshin	1184c27fa5	[AMDGPU] Support mov dpp with 64 bit operands We define mov/update dpp intrinsics as overloaded but do not support i64, which is a practically useful type. Fix the selection and lowering. Differential Revision: https://reviews.llvm.org/D68673 llvm-svn: 374910	2019-10-15 16:41:15 +00:00
Alexander Timofeev	c4d256a590	[AMDGPU] Come back patch for the 'Assign register class for cross block values according to the divergence.' Detailed description: After https://reviews.llvm.org/D59990 submit several issues were discovered. Changes in common code were preserved but AMDGPU specific part was reverted to keep the backend working correctly. Discovered issues were addressed in the following commits: https://reviews.llvm.org/D67662 https://reviews.llvm.org/D67101 https://reviews.llvm.org/D63953 https://reviews.llvm.org/D63731 This change brings back AMDGPU specific changes. Reviewed by: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D68635 llvm-svn: 374767	2019-10-14 12:01:10 +00:00
Matt Arsenault	12994a70cf	AMDGPU: Use SGPR_128 instead of SReg_128 for vregs SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good. llvm-svn: 374284	2019-10-10 07:11:33 +00:00
Matt Arsenault	27269054d2	GlobalISel: Add target pre-isel instructions Allows targets to introduce regbankselectable pseudo-instructions. Currently the closest feature to this is an intrinsic. However this requires creating a public intrinsic declaration. This litters the public intrinsic namespace with operations we don't necessarily want to expose to IR producers, and would rather leave as private to the backend. Use a new instruction bit. A previous attempt tried to keep using enum value ranges, but it turned into a mess. llvm-svn: 373937	2019-10-07 18:43:29 +00:00
Stanislav Mekhanoshin	1384c3a5b8	[AMDGPU] Fix illegal agpr use by VALU When SIFixSGPRCopies attempts to fix an illegal copy from vector to scalar register it calls moveToVALU(). A copy from an agpr to sgpr becomes a copy from agpr to agpr, which may result in the illegal register class at a use of this copy. Solution is to copy it always into a vgpr. This may result in a subsequent copy into an agpr if that is what really needed, however should not happen too often and likely will be folded later. The opposite situation may not happen because an sgpr is always illegal where agpr is legal, so such user instructions may not exist. Differential Revision: https://reviews.llvm.org/D68358 llvm-svn: 373544	2019-10-02 23:23:46 +00:00
Piotr Sobczak	265e94e657	[AMDGPU] Extend buffer intrinsics with swizzling Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this information to MIR. Intrinsics updated: int_amdgcn_raw_buffer_load int_amdgcn_raw_buffer_load_format int_amdgcn_raw_buffer_store int_amdgcn_raw_buffer_store_format int_amdgcn_raw_tbuffer_load int_amdgcn_raw_tbuffer_store int_amdgcn_struct_buffer_load int_amdgcn_struct_buffer_load_format int_amdgcn_struct_buffer_store int_amdgcn_struct_buffer_store_format int_amdgcn_struct_tbuffer_load int_amdgcn_struct_tbuffer_store Furthermore, disable merging of VMEM buffer instructions in SI Load/Store optimizer, if the "swizzled" bit on the instruction is on. The default value of the bit is 0, meaning that data in buffer is linear and buffer instructions can be merged. There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with correct "swizzled" bit set. Reviewers: arsenm, nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68200 llvm-svn: 373491	2019-10-02 17:22:36 +00:00
Changpeng Fang	f5524f0451	Remove the AliasAnalysis argument in function areMemAccessesTriviallyDisjoint Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D58360 llvm-svn: 373024	2019-09-26 22:53:44 +00:00
Simon Pilgrim	5f2d8b2618	[TargetInstrInfo] Let findCommutedOpIndices take const MachineInstr& Neither the base implementation of findCommutedOpIndices nor any in-tree target modifies the instruction passed in and there is no reason why they would in the future. Committed on behalf of @hvdijk (Harald van Dijk) Differential Revision: https://reviews.llvm.org/D66138 llvm-svn: 372882	2019-09-25 14:55:57 +00:00
Alexander Timofeev	6524a7a2b9	[AMDGPU]: PHI Elimination hooks added for custom COPY insertion. Fixed Differential Revision: https://reviews.llvm.org/D67101 Reviewers: rampitec, vpykhtin llvm-svn: 372086	2019-09-17 09:08:58 +00:00
Alexander Timofeev	9ff70132bf	Revert for: [AMDGPU]: PHI Elimination hooks added for custom COPY insertion. llvm-svn: 371873	2019-09-13 17:37:30 +00:00
Matt Arsenault	8382ce5f1b	AMDGPU: Inline constant when materializing FI with add on gfx9 This was relying on the SGPR usable for the carry out clobber to also be used for the input. There was no carry out on gfx9. With no carry out clobber to worry about, the literal can just be directly used with a VOP2 add. llvm-svn: 371791	2019-09-12 23:46:46 +00:00
Michael Liao	7957d4c015	[AMDGPU] Fix crash in phi-elimination hook. Summary: - Pre-check in case there's just a single PHI insn. Reviewers: alex-t, rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D67451 llvm-svn: 371649	2019-09-11 19:55:20 +00:00
Alexander Timofeev	c2d292f839	[AMDGPU]: PHI Elimination hooks added for custom COPY insertion. Reviewers: rampitec, vpykhtin Differential Revision: https://reviews.llvm.org/D67101 llvm-svn: 371508	2019-09-10 10:58:57 +00:00
Matt Arsenault	60c8b8bcf2	AMDGPU: Allow getMemOperandWithOffset to analyze stack accesses Report soffset as a base register if the scratch resource can be ignored. llvm-svn: 371149	2019-09-05 23:54:35 +00:00
Matt Arsenault	84489b34f6	AMDGPU: Handle frame index expansion with no free SGPRs pre gfx9 Since an add instruction must produce an unused carry out, this requires additional SGPRs. This can be avoided by keeping the entire offset computation in SGPRs. If one SGPR is still available, this only costs one extra mov. If none are available, the entire computation can be done in place and reversed. This does assume the use is a VGPR operand. This was already assumed, and we currently only select frame indexes to VALU instructions. This should probably be fixed at some point to handle more possible MIR. llvm-svn: 370929	2019-09-04 17:12:57 +00:00
Matt Arsenault	216d8ff60b	AMDGPU: Don't use frame virtual registers SGPR spills aren't really handled after SILowerSGPRSpills. In order to directly control what happens if the scavenger needs to spill, the scavenger needs to be used directly. There is an alternative to spilling in these contexts anyway since the frame register can be incremented and restored. This does present another possible issue if spilling is needed for the unused carry out if an add is needed. I think this can be avoided by using a scalar add (although that clobbers SCC, which happens anyway). llvm-svn: 370281	2019-08-29 01:13:47 +00:00
Stanislav Mekhanoshin	e6e1c4eac0	[AMDGPU] w/a for gfx908 mfma SrcC literal HW bug gfx908 ignores an mfma if SrcC is a literal. Differential Revision: https://reviews.llvm.org/D66670 llvm-svn: 369818	2019-08-23 22:22:29 +00:00
Alexander Timofeev	78347c979e	[AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions Differential Revision: https://reviews.llvm.org/D63731 Reviewers: qcolombet, rampitec llvm-svn: 369532	2019-08-21 15:15:04 +00:00
Matt Arsenault	4b7fc85c0b	Revert "AMDGPU: Fix iterator error when lowering SI_END_CF" This reverts r367500 and r369203. This is causing various test failures. llvm-svn: 369417	2019-08-20 17:45:25 +00:00
Daniel Sanders	0c47611131	Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible). Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. 
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned& Depends on D65919 Reviewers: arsenm, bogner, craig.topper, RKSimon Reviewed By: arsenm Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65962 llvm-svn: 369041	2019-08-15 19:22:08 +00:00
Austin Kerbow	a05c384132	Re-commit: [AMDGPU] Use S_DENORM_MODE for gfx10 Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10. Reviewers: arsenm, rampitec Reviewed By: arsenm, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65620 llvm-svn: 367969	2019-08-06 02:16:11 +00:00
Dmitri Gribenko	37aa8ad663	Revert "[AMDGPU] Use S_DENORM_MODE for gfx10" This reverts commit r367882. It broke the test MC/Disassembler/AMDGPU/gfx10_dasm_all.txt. llvm-svn: 367904	2019-08-05 18:36:43 +00:00
Austin Kerbow	8d229dbb47	[AMDGPU] Use S_DENORM_MODE for gfx10 Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10. Reviewers: arsenm, rampitec Reviewed By: arsenm, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65620 llvm-svn: 367882	2019-08-05 16:09:49 +00:00
Daniel Sanders	2bea69bf65	Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC llvm-svn: 367633	2019-08-01 23:27:28 +00:00
Matt Arsenault	d48324ff6f	Reapply "AMDGPU: Split block for si_end_cf" This reverts commit r359363, reapplying r357634 llvm-svn: 367500	2019-08-01 01:25:27 +00:00
Jay Foad	3bdcedbf3d	[AMDGPU] Fix typo in error message llvm-svn: 367235	2019-07-29 16:17:13 +00:00
Carl Ritson	00e89b428b	[AMDGPU] Add llvm.amdgcn.softwqm intrinsic Add llvm.amdgcn.softwqm intrinsic which behaves like llvm.amdgcn.wqm only if there is other WQM computation in the shader. Reviewers: nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64935 llvm-svn: 367097	2019-07-26 09:54:12 +00:00
Matt Arsenault	85f3890126	AMDGPU: Force s_waitcnt after GWS instructions This is apparently required to be the immediately following instruction, so force it into a bundle with a waitcnt. llvm-svn: 366607	2019-07-19 19:47:30 +00:00
Matt Arsenault	35c96598b1	AMDGPU/GlobalISel: Select flat loads Now that the patterns use the new PatFrag address space support, the only blocker to importing most load patterns is the addressing mode complex patterns. llvm-svn: 366237	2019-07-16 18:05:29 +00:00
Jay Foad	27ec195f39	[AMDGPU] Fix DPP combiner check for exec modification Summary: r363675 changed the exec modification helper function, now called execMayBeModifiedBeforeUse, so that if no UseMI is specified it checks all instructions in the basic block, even beyond the last use. That meant that the DPP combiner no longer worked in any basic block that ended with a control flow instruction, and in particular it didn't work on code sequences generated by the atomic optimizer. Fix it by reinstating the old behaviour but in a new helper function execMayBeModifiedBeforeAnyUse, and limiting the number of instructions scanned. Reviewers: arsenm, vpykhtin Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64393 llvm-svn: 365910	2019-07-12 15:59:40 +00:00
Fangrui Song	b251cc0d91	Delete dead stores llvm-svn: 365903	2019-07-12 14:58:15 +00:00
Stanislav Mekhanoshin	937ff6e701	[AMDGPU] gfx908 agpr spilling Differential Revision: https://reviews.llvm.org/D64594 llvm-svn: 365833	2019-07-11 21:54:13 +00:00
Stanislav Mekhanoshin	e67cc380a8	[AMDGPU] gfx908 mfma support Differential Revision: https://reviews.llvm.org/D64584 llvm-svn: 365824	2019-07-11 21:19:33 +00:00
Matt Arsenault	71dfb7ec5c	AMDGPU: Make s34 the FP register Make the FP register callee saved. This is tricky because now the FP needs to be spilled in the prolog relative to the incoming SP register, rather than the frame register used throughout the rest of the function. I don't like how this bypasses the standard mechanism for CSR spills just to get the correct insert point. I may look for a better solution, since all CSR VGPRs may also need to have all lanes activated. Another option might be to make getFrameIndexReference change the base register if the frame index is a CSR, and then try to figure out the right insertion point in emitProlog. If there is a free VGPR lane available for SGPR spilling, try to use it for the FP. If that would require introducing a new VGPR spill, try to use a free call clobbered SGPR. Only fallback to introducing a new VGPR spill as a last resort. This also doesn't attempt to handle SGPR spilling with scalar stores. llvm-svn: 365372	2019-07-08 19:03:38 +00:00
Nicolai Haehnle	7cfd99ab15	AMDGPU/GFX10: fix scratch resource descriptor Summary: The stride should depend on the wave size, not the hardware generation. Also, the 32_FLOAT format is 0x16, not 16; though that shouldn't be relevant. Change-Id: I088f93bf6708974d085d1c50967f119061da6dc6 Reviewers: arsenm, rampitec, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63808 llvm-svn: 364788	2019-07-01 15:43:00 +00:00
Nicolai Haehnle	2710171a15	AMDGPU: Write LDS objects out as global symbols in code generation Summary: The symbols use the processor-specific SHN_AMDGPU_LDS section index introduced with a previous change. The linker is then expected to resolve relocations, which are also emitted. Initially disabled for HSA and PAL environments until they have caught up in terms of linker and runtime loader. Some notes: - The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered to a constant at compile times, which means some tests can no longer be applied. The current "solution" is a terrible hack, but the intrinsic isn't used by Mesa, so we can keep it for now. - We no longer know the full LDS size per kernel at compile time, which means that we can no longer generate a relevant error message at compile time. It would be possible to add a check for the size of individual variables, but ultimately the linker will have to perform the final check. Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275 Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61494 llvm-svn: 364297	2019-06-25 11:52:30 +00:00
Matt Arsenault	c67c484f36	AMDGPU: Don't clobber VCC in MUBUF addr64 emulation Introducing VCC defs during SIFixSGPRCopies is generally problematic. Avoid it by starting with the VOP3 form with the general condition register. This is the easiest to fix instance, but doesn't solve any specific problems I'm looking at. llvm-svn: 363904	2019-06-20 00:51:28 +00:00
Matt Arsenault	e4c2e9b016	AMDGPU: Consolidate some getGeneration checks This is incomplete, and ideally these would all be removed, but it's better to localize them to the subtarget first with comments about what they're for. llvm-svn: 363902	2019-06-19 23:54:58 +00:00
Matt Arsenault	4d000d2488	AMDGPU: Fix folding immediate into readfirstlane through reg_sequence The def instruction for the vreg may not match, because it may be folding through a reg_sequence. The assert was overly conservative and not necessary. It's not actually important if DefMI really defined the register, because the fold that will be done cares about the def of the value that will be folded. For some reason copies aren't making it through the reg_sequence, although they should. llvm-svn: 363876	2019-06-19 20:44:15 +00:00
Matt Arsenault	4d55d024be	Reapply "AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics" This reapplies r363678, using the correct chain for the CopyToReg for v0. glueCopyToM0 counterintuitively changes the operands of the original node. llvm-svn: 363870	2019-06-19 19:55:27 +00:00
Simon Pilgrim	128ce93c60	Revert rL363678 : AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics There may or may not be additional work to handle this correctly on SI/CI. ........ Breaks EXPENSIVE_CHECKS buildbots - http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/78/ llvm-svn: 363797	2019-06-19 13:00:54 +00:00
Matt Arsenault	8d35dcd703	AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics There may or may not be additional work to handle this correctly on SI/CI. llvm-svn: 363678	2019-06-18 13:19:57 +00:00
Matt Arsenault	f39f3bd056	AMDGPU: Change API for checking for exec modification Invert the name and return value to better reflect the imprecise nature. Force passing in the DefMI, since it's known in the 2 users and could possibly fail for an arbitrary vreg. Allow specifying a specific user instruction. Scan through use instructions, instead of use operands. Add scan thresholds instead of searching infinitely. Stop using a set to track seen uses. I didn't understand this usage, or why it would not check the last use. I don't think the use list has any particular order. llvm-svn: 363675	2019-06-18 12:48:36 +00:00
Sander de Smalen	5d6ee76c16	Describe stack-id as an enum This patch changes MIR stack-id from an integer to an enum, and adds printing/parsing support for this in MIR files. The default stack-id '0' is now renamed to 'default'. This should make MIR tests that have stack objects with different stack-ids more descriptive. It also clarifies code operating on StackID. Reviewers: arsenm, thegameg, qcolombet Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D60137 llvm-svn: 363533	2019-06-17 09:13:29 +00:00
Nicolai Haehnle	41abf2766e	AMDGPU: Prepare for explicit absolute relocations in code generation Summary: We will use absolute relocations for LDS symbols. Change-Id: I9a32795ed0ea835e433a787129cfe3c57ee9a325 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61492 llvm-svn: 363517	2019-06-16 17:43:37 +00:00
Nicolai Haehnle	6d71be4e67	AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0 Summary: Instead of encoding a high-word of 0 using a fake TargetGlobalAddress, just use a literal target constant. This simplifies some subsequent changes. The generated assembly is now more explicit about the kind of relocation that is to be used. Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61491 llvm-svn: 363516	2019-06-16 17:32:01 +00:00
Stanislav Mekhanoshin	5250021672	[AMDGPU] gfx10 conditional registers handling This is cpp source part of wave32 support, excluding overridden getRegClass(). Differential Revision: https://reviews.llvm.org/D63351 llvm-svn: 363513	2019-06-16 17:13:09 +00:00
Matt Arsenault	642f39c93e	AMDGPU: Fix missing const llvm-svn: 363383	2019-06-14 13:26:23 +00:00
Stanislav Mekhanoshin	245b5ba344	[AMDGPU] gfx1010 dpp16 and dpp8 Differential Revision: https://reviews.llvm.org/D63203 llvm-svn: 363186	2019-06-12 18:02:41 +00:00
Stanislav Mekhanoshin	5f581c9f08	[AMDGPU] gfx1010 permlane instructions Differential Revision: https://reviews.llvm.org/D63202 llvm-svn: 363185	2019-06-12 17:52:51 +00:00
Matt Arsenault	ddd2c9ac86	AMDGPU: Force skips around traps llvm-svn: 362852	2019-06-07 23:02:52 +00:00
Matt Arsenault	b6cfa129cc	AMDGPU: Insert skip branches over return blocks SIInsertSkips really doesn't understand the control flow, and makes very stupid assumptions about the block layout. This was able to get away with not skipping return blocks, since usually after structurization there is only one placed at the end of the function. Tail duplication can break this assumption. llvm-svn: 362754	2019-06-06 22:51:51 +00:00
Alexander Timofeev	37bd9bd137	[AMDGPU] Partial revert for the `ba447bae74` "Divergence driven ISel. Assign register class for cross block values according to the divergence." that discovered the design flaw leading to several issues that required to be solved before. This change reverts AMDGPU specific changes and keeps common part unaffected. llvm-svn: 362749	2019-06-06 21:13:02 +00:00
Matt Arsenault	b812b7a45e	AMDGPU: Invert frame index offset interpretation Since the beginning, the offset of a frame index has been consistently interpreted backwards. It was treating it as an offset from the scratch wave offset register as a frame register. The correct interpretation is the offset from the SP on entry to the function, before the prolog. Frame index elimination then should select either SP or another register as an FP. Treat the scratch wave offset on kernel entry as the pre-incremented SP. Rely more heavily on the standard hasFP and frame pointer elimination logic, and clean up the private reservation code. This saves a copy in most callee functions. The kernel prolog emission code is still kind of a mess relying on checking the uses of physical registers, which I would prefer to eliminate. Currently selection directly emits MUBUF instructions, which require using a reference to some register. Use the register chosen for SP, and then ignore this later. This should probably be cleaned up to use pseudos that don't refer to any specific base register until frame index elimination. Add a workaround for shaders using large numbers of SGPRs. I'm not sure these cases were ever working correctly, since as far as I can tell the logic for figuring out which SGPR is the scratch wave offset doesn't match up with the shader input initialization in the shader programming guide. llvm-svn: 362661	2019-06-05 22:20:47 +00:00
Matt Arsenault	0f8a764e8f	AMDGPU: Fix using 2 different enums for same operand flags These enums are really for the same namespace of flags set on arbitrary MachineOperands, so merge them to avoid value collisions. llvm-svn: 362640	2019-06-05 20:32:25 +00:00
Dmitry Preobrazhensky	9111f35f02	[AMDGPU][MC] Added support of SCC, VCCZ and EXECZ operands See bug 39292: https://bugs.llvm.org/show_bug.cgi?id=39292 Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D62660 llvm-svn: 362400	2019-06-03 13:51:24 +00:00
Alexander Timofeev	ba447bae74	[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 This commit was reverted because of the build failure. The reason was a malformed patch. Build failure fixed. llvm-svn: 361741	2019-05-26 20:33:26 +00:00
Peter Collingbourne	3b93737446	Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence." Broke sanitizer bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio llvm-svn: 361688	2019-05-25 01:52:38 +00:00
Alexander Timofeev	dffedea014	[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 llvm-svn: 361644	2019-05-24 15:32:18 +00:00
Matt Arsenault	ca64ef2043	MC: Allow getMaxInstLength to depend on the subtarget Keep it optional in cases this is ever needed in some global context. Currently it's only used for getting an upper bound inline asm code size. For AMDGPU, gfx10 increases the maximum instruction size to 20-bytes. This avoids penalizing older subtargets when estimating code size, and making some annoying branch relaxation test adjustments. llvm-svn: 361405	2019-05-22 16:28:41 +00:00
Matt Arsenault	2cba91b8db	AMDGPU: Assume calls read exec llvm-svn: 361333	2019-05-21 23:23:16 +00:00
Matt Arsenault	6dd08e335f	AMDGPU: Force skip branches over calls Unfortunately the way SIInsertSkips works is backwards, and is required for correctness. r338235 added handling of some special cases where skipping is mandatory to avoid side effects if no lanes are active. It conservatively handled asm correctly, but the same logic needs to apply to calls. Usually the call sequence code is larger than the skip threshold, although the way the count is computed is really broken, so I'm not sure if anything was likely to really hit this. llvm-svn: 361202	2019-05-20 22:04:42 +00:00
Stanislav Mekhanoshin	05791d90c9	[AMDGPU] Fixed handling of immediate i1 literals This bug was exposed by the rL360395. Differential Revision: https://reviews.llvm.org/D61812 llvm-svn: 360689	2019-05-14 16:18:00 +00:00
Nicolai Haehnle	79ea85c6af	AMDGPU: Verify that SOP2/SOPC instructions have at most one immediate operand Summary: No test case because I don't know of a way to trigger this, but I accidentally caused this to fail while working on a different change. Change-Id: I8015aa447fe27163cc4e4902205a203bd44bf7e3 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61490 llvm-svn: 360123	2019-05-07 09:19:09 +00:00
Stanislav Mekhanoshin	491746a584	[AMDGPU] gfx1010 verifier changes Differential Revision: https://reviews.llvm.org/D61521 llvm-svn: 360095	2019-05-06 22:49:45 +00:00
Stanislav Mekhanoshin	971cb8b633	[AMDGPU] gfx1010: prefer V_MUL_LO_U32 over V_MUL_LO_I32 GFX10 deprecates v_mul_lo_i32 instruction, so choose u32 form for all targets. Differential Revision: https://reviews.llvm.org/D61525 llvm-svn: 360094	2019-05-06 22:27:05 +00:00
Stanislav Mekhanoshin	28a1936f6d	[AMDGPU] gfx1010: use fmac instructions Differential Revision: https://reviews.llvm.org/D61527 llvm-svn: 359959	2019-05-04 04:20:37 +00:00
Stanislav Mekhanoshin	5cf8167735	[AMDGPU] gfx1010 allows VOP3 to have a literal Differential Revision: https://reviews.llvm.org/D61413 llvm-svn: 359756	2019-05-02 04:01:39 +00:00
Stanislav Mekhanoshin	f2baae0abb	[AMDGPU] gfx1010 constant bus limit Constant bus limit has increased to 2 with GFX10. Differential Revision: https://reviews.llvm.org/D61404 llvm-svn: 359754	2019-05-02 03:47:23 +00:00
Stanislav Mekhanoshin	692560dc98	[AMDGPU] gfx1010 MIMG implementation Differential Revision: https://reviews.llvm.org/D61339 llvm-svn: 359698	2019-05-01 16:32:58 +00:00
Stanislav Mekhanoshin	a6322941ff	[AMDGPU] gfx1010 VMEM and SMEM implementation Differential Revision: https://reviews.llvm.org/D61330 llvm-svn: 359621	2019-04-30 22:08:23 +00:00
Mark Searles	76c5b62988	Revert "AMDGPU: Split block for si_end_cf" This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4. We discovered some internal test failures, so reverting for now. Differential Revision: https://reviews.llvm.org/D61213 llvm-svn: 359363	2019-04-27 00:51:18 +00:00
Stanislav Mekhanoshin	61beff020e	[AMDGPU] gfx1010 VOP3 and VOP3P implementation Differential Revision: https://reviews.llvm.org/D61202 llvm-svn: 359328	2019-04-26 17:56:03 +00:00
Stanislav Mekhanoshin	8f3da70eed	[AMDGPU] gfx1010 VOP2 changes Differential Revision: https://reviews.llvm.org/D61156 llvm-svn: 359316	2019-04-26 16:37:51 +00:00
Stanislav Mekhanoshin	cee607e414	[AMDGPU] Add gfx1010 target definitions Differential Revision: https://reviews.llvm.org/D61041 llvm-svn: 359113	2019-04-24 17:03:15 +00:00
Bjorn Pettersson	238c9d6308	[CodeGen] Add "const" to MachineInstr::mayAlias Summary: The basic idea here is to make it possible to use MachineInstr::mayAlias also when the MachineInstr is const (or the "Other" MachineInstr is const). The addition of const in MachineInstr::mayAlias then rippled down to the need for adding const in several other places, such as TargetTransformInfo::getMemOperandWithOffset. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60856 llvm-svn: 358744	2019-04-19 09:08:38 +00:00
Matt Arsenault	396653f8a1	AMDGPU: Split block for si_end_cf Relying on no spill or other code being inserted before this was precarious. It relied on code diligently checking isBasicBlockPrologue which is likely to be forgotten. Ideally this could be done earlier, but this doesn't work because of phis. Any other instruction can't be placed before them, so we have to accept the position being incorrect during SSA. This avoids regressions in the fast register allocator rewrite from inverting the direction. llvm-svn: 357634	2019-04-03 20:53:20 +00:00
Neil Henning	0a30f33ce2	[AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure. This change incorporates an effort by Connor Abbot to change how we deal with WWM operations potentially trashing valid values in inactive lanes. Previously, the SIFixWWMLiveness pass would work out which registers were being trashed within WWM regions, and ensure that the register allocator did not have any values it was depending on resident in those registers if the WWM section would trash them. This worked perfectly well, but would cause sometimes severe register pressure when the WWM section resided before divergent control flow (or at least that is where I mostly observed it). This fix instead runs through the WWM sections and pre allocates some registers for WWM. It then reserves these registers so that the register allocator cannot use them. This results in a significant register saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just this change!). Differential Revision: https://reviews.llvm.org/D59295 llvm-svn: 357400	2019-04-01 15:19:52 +00:00
Matt Arsenault	a353fd572a	AMDGPU: Make exec mask optimizations more resistant to block splits Also improve the check for SALU instructions to also ignore implicit_def and other fake instructions. llvm-svn: 357170	2019-03-28 14:01:39 +00:00
Matt Arsenault	28f97f1dbc	AMDGPU: Don't hardcode num defs for MUBUF instructions This shouldn't change anything since the no-ret atomics are selected later. llvm-svn: 357084	2019-03-27 16:12:29 +00:00
Matt Arsenault	bbc59d8d0d	AMDGPU: Fix areLoadsFromSameBasePtr for DS atomics The offset operand index is different for atomics. llvm-svn: 357073	2019-03-27 15:41:00 +00:00
Tim Renouf	033f99a2e5	[AMDGPU] Added v5i32 and v5f32 register classes They are not used by anything yet, but a subsequent commit will start using them for image ops that return 5 dwords. Differential Revision: https://reviews.llvm.org/D58903 Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271 llvm-svn: 356735	2019-03-22 10:11:21 +00:00
Tim Renouf	361b5b2193	[AMDGPU] Support for v3i32/v3f32 Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet. SI (gfx6) does not have dwordx3 instructions, so they are not enabled there. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58902 Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9 llvm-svn: 356659	2019-03-21 12:01:21 +00:00
Michael Liao	efb4f9e568	[AMDGPU] Enable code selection using `s_mul_hi_u32`/`s_mul_hi_i32`. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59501 llvm-svn: 356405	2019-03-18 20:40:09 +00:00
Tim Renouf	cfdfba996b	[AMDGPU] Asm/disasm clamp modifier on vop3 int arithmetic Allow the clamp modifier on vop3 int arithmetic instructions in assembly and disassembly. This involved adding a clamp operand to the affected instructions in MIR and MC, and thus having to fix up several places in codegen and MIR tests. Differential Revision: https://reviews.llvm.org/D59267 Change-Id: Ic7775105f02a985b668fa658a0cd7837846a534e llvm-svn: 356399	2019-03-18 19:35:44 +00:00
Tim Renouf	2e94f6e584	[AMDGPU] Asm/disasm v_cndmask_b32_e64 with abs/neg source modifiers This commit allows v_cndmask_b32_e64 with abs, neg source modifiers on src0, src1 to be assembled and disassembled. This does appear to be allowed, even though they are floating point modifiers and the operand type is b32. To do this, I added src0_modifiers and src1_modifiers to the MachineInstr, which involved fixing up several places in codegen and mir tests. Differential Revision: https://reviews.llvm.org/D59191 Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea llvm-svn: 356398	2019-03-18 19:25:39 +00:00
Michael Liao	6883d7e192	[AMDGPU] Fix SGPR fixing through SCC chaining Summary: - During the fixing of SGPR copying from VGPR, ensure users of SCC are properly propagated, i.e. * only propagate through live def of SCC, * skip the SCC-def inst itself, and * stop the propagation on the other SCC-def inst after checking its SCC-use first. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59362 llvm-svn: 356258	2019-03-15 12:42:21 +00:00
David Stuttard	20ea21c6ed	[AMDGPU] Add support for immediate operand for S_ENDPGM Summary: Add support for immediate operand in S_ENDPGM Change-Id: I0c56a076a10980f719fb2a8f16407e9c301013f6 Reviewers: alexshap Subscribers: qcolombet, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, eraman, arphaman, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59213 llvm-svn: 355902	2019-03-12 09:52:58 +00:00
Matt Arsenault	f587fd9ce1	AMDGPU: Don't bother checking the chain in areLoadsFromSameBasePtr This is only called in contexts that are verifying the chain itself, and the query itself is only asking about the address. llvm-svn: 355723	2019-03-08 20:30:51 +00:00
Matt Arsenault	07f904befb	AMDGPU: Correct DS implementation of areLoadsFromSameBasePtr This was checking the wrong operands for the base register and the offsets. The indexes are shifted by the number of output registers from the machine instruction definition, and the chain is moved to the end. llvm-svn: 355722	2019-03-08 20:30:50 +00:00
Changpeng Fang	4cabf6d3b5	AMDGPU: Use MachineInstr::mayAlias to replace areMemAccessesTriviallyDisjoint in LoadStoreOptimizer pass. Summary: This is to fix a memory dependence bug in LoadStoreOptimizer. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D58295 llvm-svn: 354295	2019-02-18 23:00:26 +00:00
Craig Topper	784929d045	Implementation of asm-goto support in LLVM This patch accompanies the RFC posted here: http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html This patch adds a new CallBr IR instruction to support asm-goto inline assembly like gcc as used by the linux kernel. This instruction is both a call instruction and a terminator instruction with multiple successors. Only inline assembly usage is supported today. This also adds a new INLINEASM_BR opcode to SelectionDAG and MachineIR to represent an INLINEASM block that is also considered a terminator instruction. There will likely be more bug fixes and optimizations to follow this, but we felt it had reached a point where we would like to switch to an incremental development model. Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii Differential Revision: https://reviews.llvm.org/D53765 llvm-svn: 353563	2019-02-08 20:48:56 +00:00
Valery Pykhtin	7fe97f8c7c	[AMDGPU] Fix DPP combiner Differential revision: https://reviews.llvm.org/D55444 dpp move with uses and old reg initializer should be in the same BB. bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Otherwise the old register value is checked for identity. Added add, subrev, and, or instructions to the old folding function. Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user. The pass is still disabled by default. llvm-svn: 353513	2019-02-08 11:59:48 +00:00
Matt Arsenault	cba0c6d0c9	AMDGPU: Don't rematerialize mov with implicit operands This was pulling the mov used for register indexing on gfx9 out of the loop. llvm-svn: 353101	2019-02-04 22:26:21 +00:00
Neil Henning	0799352026	[AMDGPU] Fix a weird WWM intrinsic issue. I found a really strange WWM issue through a very convoluted shader that essentially boils down to a bug in SIInstrInfo where canReadVGPR did not correctly identify that WWM is like a copy and can have a VGPR as its source. Differential Revision: https://reviews.llvm.org/D56002 llvm-svn: 352500	2019-01-29 14:28:17 +00:00
Matt Arsenault	cdd191d9db	AMDGPU: Add DS append/consume intrinsics Since these pass the pointer in m0 unlike other DS instructions, these need to worry about whether the address is uniform or not. This assumes the address is dynamically uniform, and just uses readfirstlane to get a copy into an SGPR. I don't know if these have the same 16-bit add for the addressing mode offset problem on SI or not, but I've just assumed they do. Also includes some misc. changes to avoid test differences between the LDS and GDS versions. llvm-svn: 352422	2019-01-28 20:14:49 +00:00
Stanislav Mekhanoshin	f92ed6966e	[AMDGPU] Fixed hazard recognizer to walk predecessors Fixes two problems with GCNHazardRecognizer: 1. It only scans up to 5 instructions emitted earlier. 2. It does not take control flow into account. An earlier instruction from the previous basic block is not necessarily a predecessor. At the same time a real predecessor block is not scanned. The patch provides a way to distinguish between scheduler and hazard recognizer mode. It is OK to work with emitted instructions in the scheduler because we do not really know what will be emitted later and its order. However, when pass works as a hazard recognizer the schedule is already finalized, and we have full access to the instructions for the whole function, so we can properly traverse predecessors and their instructions. Differential Revision: https://reviews.llvm.org/D56923 llvm-svn: 351759	2019-01-21 19:11:26 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Matt Arsenault	85af701e85	AMDGPU: Remove llvm.SI.load.const It's taken 3 years, but now all of the old AMDGPU and SI intrinsics are finally gone llvm-svn: 351586	2019-01-18 20:27:02 +00:00
Marek Olsak	c5cec5e1fa	AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52944 llvm-svn: 351351	2019-01-16 15:43:53 +00:00
David Stuttard	f77079f892	[AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accommodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. 
The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 This re-submit of the change also includes a slight modification in SIISelLowering.cpp to work-around a compiler bug for the powerpc_le platform that caused a buildbot failure on a previous submission. Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda Work around for ppcle compiler bug Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b llvm-svn: 351054	2019-01-14 11:55:24 +00:00
Valery Pykhtin	b7a459547d	Revert "[AMDGPU] Fix DPP combiner" This reverts commit e3e2923a39cbec3b3bc3a7d3f0e9a77a4115080e, svn revision rL350721 llvm-svn: 350730	2019-01-09 15:21:53 +00:00
Valery Pykhtin	1e0b5c719b	[AMDGPU] Fix DPP combiner Fixed issue with identity values and other cases, f32/f16 identity values to be added later. fma/mac instructions are disabled for now. Test is fully reworked, added comments. Other fixes: 1. dpp move with uses and old reg initializer should be in the same BB. 2. bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Otherwise the old register value is checked for identity. 3. Added add, subrev, and, or instructions to the old folding function. 4. Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user. Differential revision: https://reviews.llvm.org/D55444 llvm-svn: 350721	2019-01-09 13:43:32 +00:00
Ron Lieberman	16de4fd2eb	[AMDGPU] Add sdwa support for ADD\|SUB U64 decomposed Pseudos The introduction of S_{ADD\|SUB}_U64_PSEUDO instructions which are decomposed into VOP3 instruction pairs for S_ADD_U64_PSEUDO: V_ADD_I32_e64 V_ADDC_U32_e64 and for S_SUB_U64_PSEUDO V_SUB_I32_e64 V_SUBB_U32_e64 preclude the use of SDWA to encode a constant. SDWA: Sub-Dword addressing is supported on VOP1 and VOP2 instructions, but not on VOP3 instructions. We desire to fold the bit-and operand into the instruction encoding for the V_ADD_I32 instruction. This requires that we transform the VOP3 into a VOP2 form of the instruction (_e32). %19:vgpr_32 = V_AND_B32_e32 255, killed %16:vgpr_32, implicit $exec %47:vgpr_32, %49:sreg_64_xexec = V_ADD_I32_e64 %26.sub0:vreg_64, %19:vgpr_32, implicit $exec %48:vgpr_32, dead %50:sreg_64_xexec = V_ADDC_U32_e64 %26.sub1:vreg_64, %54:vgpr_32, killed %49:sreg_64_xexec, implicit $exec which then allows the SDWA encoding and becomes %47:vgpr_32 = V_ADD_I32_sdwa 0, %26.sub0:vreg_64, 0, killed %16:vgpr_32, 0, 6, 0, 6, 0, implicit-def $vcc, implicit $exec %48:vgpr_32 = V_ADDC_U32_e32 0, %26.sub1:vreg_64, implicit-def $vcc, implicit $vcc, implicit $exec Differential Revision: https://reviews.llvm.org/D54882 llvm-svn: 348132	2018-12-03 13:04:54 +00:00
Graham Sellers	ba559ac058	[AMDGPU] Split 64-Bit XNOR to 64-Bit NOT/XOR The identity ~(x ^ y) == (~x ^ y) == (x ^ ~y) allows XNOR (XOR/NOT) to turn into NOT/XOR. Handling this case with its own split means we can make the NOT remain in the scalar unit. Previously, we split 64-bit XNOR into two 32-bit XNOR, then lowered. Now, we get three instructions (s_not, v_xor, v_xor) rather than four in the case where either of the sources is a scalar 64-bit. Add test cases to xnor.ll to attempt XNOR Vx, Sy and XNOR Sx, Vy. Also adding test that uses the opposite identity such that (~x ^ y) on the scalar unit (or vector for gfx906) can generate XNOR. This already worked, but I didn't see a test for it. Differential: https://reviews.llvm.org/D55071 llvm-svn: 348075	2018-12-01 12:27:53 +00:00
Nicolai Haehnle	a7b00058e0	AMDGPU: Divergence-driven selection of scalar buffer load intrinsics Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads. Change-Id: I170e6816323beb1348677b358c9d380865cd1a19 Reviewers: arsenm, alex-t, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53283 llvm-svn: 348050	2018-11-30 22:55:38 +00:00
Valery Pykhtin	3d9afa273f	[AMDGPU] Combine DPP mov with use instructions (VOP1/2/3) Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses. Differential revision: https://reviews.llvm.org/D53762 llvm-svn: 347993	2018-11-30 14:21:56 +00:00
David Stuttard	c6603861d8	Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic" Also revert fix r347876 One of the buildbots was reporting a failure in some relevant tests that I can't repro or explain at present, so reverting until I can isolate. llvm-svn: 347911	2018-11-29 20:14:17 +00:00
Graham Sellers	04f7a4d2d2	[AMDGPU] Add and update scalar instructions This patch adds support for S_ANDN2, S_ORN2 32-bit and 64-bit instructions and adds splits to move them to the vector unit (for which there is no equivalent instruction). It modifies the way that the more complex scalar instructions are lowered to vector instructions by first breaking them down to sequences of simpler scalar instructions which are then lowered through the existing code paths. The pattern for S_XNOR has also been updated to apply inversion to one input rather than the output of the XOR as the result is equivalent and may allow leaving the NOT instruction on the scalar unit. New tests for NAND, NOR, ANDN2 and ORN2 have been added, and existing tests now hit the new instructions (and have been modified accordingly). Differential: https://reviews.llvm.org/D54714 llvm-svn: 347877	2018-11-29 16:05:38 +00:00
David Stuttard	de02e4b1cc	Add support for TFE/LWE in image intrinsics TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accommodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. 
The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda llvm-svn: 347871	2018-11-29 15:21:13 +00:00
Francis Visoiu Mistrih	d7eebd6d83	[CodeGen][NFC] Make `TII::getMemOpBaseImmOfs` return a base operand Currently, instructions doing memory accesses through a base operand that is not a register can not be analyzed using `TII::getMemOpBaseRegImmOfs`. This means that functions such as `TII::shouldClusterMemOps` will bail out on instructions using an FI as a base instead of a register. The goal of this patch is to refactor all this to return a base operand instead of a base register. Then in a separate patch, I will add FI support to the mem op clustering in the MachineScheduler. Differential Revision: https://reviews.llvm.org/D54846 llvm-svn: 347746	2018-11-28 12:00:20 +00:00
Matt Arsenault	88ce3dcbc8	AMDGPU: Record SGPR spills when restoring too It's possible in some cases to have a restore present without a corresponding spill. Due to an apparent bug in D54366 <https://reviews.llvm.org/D54366>, only the restore for a register was emitted. It's probably always a bug for this to happen, but due to how SGPR spilling is implemented, this makes the issue appear worse than it is. llvm-svn: 347595	2018-11-26 21:28:40 +00:00
Matt Arsenault	eabb8dd015	AMDGPU: Fix analyzeBranch failing with pseudoterminators If a block had one of the _term instructions used for gluing exec modifying instructions to the end of the block, analyzeBranch would fail, preventing the verifier from catching a broken successor list. llvm-svn: 347027	2018-11-16 05:03:02 +00:00

632 Commits