llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	adc59d7076	AMDGPU: Assign enum name to stack ID Also assert that it is correct for SGPRs. There is currently a bug where stack slot coloring replaces SGPR spill FIs with one with the default ID, which results in a more confusing assert later about a dead object. llvm-svn: 330607	2018-04-23 15:51:26 +00:00
Nicolai Haehnle	7a87977fb2	AMDGPU: Legalize the operand of SI_INIT_M0 Summary: This fixes a case where the argument to a sendmsg intrinsic ends up in a VGPR, for whatever reason. The underlying performance issue is that a multiplication that can be an s_mul_i32 is instead needlessly generated as v_mul_u32_u24, but this is not addressed by this patch. Change-Id: I61fd4034314d5acdf6074632c30b65364dfa7328 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45826 llvm-svn: 330393	2018-04-20 07:14:25 +00:00
Stanislav Mekhanoshin	160f85794d	[AMDGPU] Use packed literals with zero either lower or hi part Differential Revision: https://reviews.llvm.org/D45790 llvm-svn: 330365	2018-04-19 21:16:50 +00:00
David Blaikie	13e77db2df	Fix layering of MachineValueType.h by moving it from CodeGen to Support This is used by llvm tblgen as well as by LLVM Targets, so the only common place is Support for now. (maybe we need another target for these sorts of things - but for now I'm at least making them correct & we can make them better if/when people have strong feelings) llvm-svn: 328395	2018-03-23 23:58:25 +00:00
Matt Arsenault	69932e4d69	AMDGPU: Don't leave dead illegal VGPR->SGPR copies Normally DCE kills these, but at -O0 these get left behind leaving suspicious looking illegal copies. Replace with IMPLICIT_DEF to avoid iterator issues. llvm-svn: 327842	2018-03-19 14:07:15 +00:00
Tim Renouf	2a99fa2c08	[AMDGPU] added writelane intrinsic Summary: For use by LLPC SPV_AMD_shader_ballot extension. The v_writelane instruction was already implemented for use by SGPR spilling, but I had to add an extra dummy operand tied to the destination, to represent that all lanes except the selected one keep the old value of the destination register. .ll test changes were due to schedule changes caused by that new operand. Differential Revision: https://reviews.llvm.org/D42838 llvm-svn: 326353	2018-02-28 19:10:32 +00:00
Marek Olsak	7d92b7e23a	AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU Author: Bas Nieuwenhuizen https://reviews.llvm.org/D42881 llvm-svn: 324353	2018-02-06 15:17:55 +00:00
Marek Olsak	d4bb329d0e	AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9 Summary: This enables load merging into x2, x4, which is driven by inline offsets. 6500 shaders are affected: Code Size in affected shaders: -15.14 % Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D42078 llvm-svn: 323909	2018-01-31 20:18:11 +00:00
Matthias Braun	f1caa2833f	MachineFunction: Return reference from getFunction(); NFC The Function can never be nullptr so we can return a reference. llvm-svn: 320884	2017-12-15 22:22:58 +00:00
Sam Kolton	5f7f32c382	[AMDGPU] SDWA: add support for PRESERVE into SDWA peephole. Summary: Reviewers: arsenm, vpykhtin, rampitec Subscribers: kzhuravl, wdng, nhaehnle, mgorny, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D37817 llvm-svn: 319662	2017-12-04 16:22:32 +00:00
Matt Arsenault	686d5c728f	AMDGPU: Use carry-less adds in FI elimination llvm-svn: 319501	2017-11-30 23:42:30 +00:00
Matt Arsenault	84445dd13c	AMDGPU: Use gfx9 carry-less add/sub instructions llvm-svn: 319491	2017-11-30 22:51:26 +00:00
Nicolai Haehnle	39980dac0b	AMDGPU: Consistently check for immediates in SIInstrInfo::FoldImmediate Summary: The PeepholeOptimizer pass calls this function solely based on checking DefMI->isMoveImmediate(), which only checks the MoveImm bit of the instruction description. So it's up to FoldImmediate itself to properly check that DefMI actually moves from an immediate. I don't have a separate test case for this, but the next patch introduces a test case which happens to crash without this change. This error is caught by the assertion in MachineOperand::getImm(). Change-Id: I88e7cdbcf54d75e1a296822e6fe5f9a5f095bbf8 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D40342 llvm-svn: 319155	2017-11-28 08:41:50 +00:00
David Blaikie	b3bde2ea50	Fix a bunch more layering of CodeGen headers that are in Target All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490	2017-11-17 01:07:10 +00:00
Matt Arsenault	301162c4fe	AMDGPU: Replace i64 add/sub lowering Use VOP3 add/addc like usual. This has some tradeoffs. Inline immediates fold a little better, but other constants are worse off. SIShrinkInstructions could be made smarter to handle these cases. This allows us to avoid selecting scalar adds where we need to track the carry in scc and replace its users. This makes it easier to use the carryless VALU adds. llvm-svn: 318340	2017-11-15 21:51:43 +00:00
Marek Olsak	ffadcb744b	AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEM Summary: -5.3% code size in affected shaders. Changed stats only: 48486 shaders in 30489 tests Totals: SGPRS: 2086406 -> 2072430 (-0.67 %) VGPRS: 1626872 -> 1627960 (0.07 %) Spilled SGPRs: 7865 -> 7912 (0.60 %) Code Size: 60978060 -> 60188764 (-1.29 %) bytes Max Waves: 374530 -> 374342 (-0.05 %) Totals from affected shaders: SGPRS: 299664 -> 285688 (-4.66 %) VGPRS: 233844 -> 234932 (0.47 %) Spilled SGPRs: 3959 -> 4006 (1.19 %) Code Size: 14905272 -> 14115976 (-5.30 %) bytes Max Waves: 46202 -> 46014 (-0.41 %) Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38915 llvm-svn: 317750	2017-11-09 01:52:17 +00:00
Marek Olsak	5914ece6aa	AMDGPU: Select s_buffer_load_dword with a non-constant SGPR offset Summary: Apps that benefit: - alien isolation - bioshock infinite - civilization: beyond earth - company of heroes 2 - dirt showdown - dota 2 - F1 2015 - grid autosport - hitman - legend of grimrock - serious sam 3: bfe - shadow warrior - talos principle - total war: warhammer - UE4 demos: effects cave, elemental, sun temple Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38914 llvm-svn: 317038	2017-10-31 21:06:42 +00:00
Marek Olsak	ce76ea0394	AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1) Summary: Kill the thread if operand 0 == false. llvm.amdgcn.wqm.vote can be applied to the operand. Also allow kill in all shader stages. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D38544 llvm-svn: 316427	2017-10-24 10:27:13 +00:00
Matt Arsenault	9ab1fa6803	AMDGPU: Fix not accounting for instruction size in bundles These were counted as 0. Fixes branch limit exceeded errors in some large programs. llvm-svn: 314944	2017-10-04 22:59:12 +00:00
Nicolai Haehnle	ce4ddd06da	AMDGPU: VALU carry-in and v_cndmask condition cannot be EXEC The hardware will only forward EXEC_LO; the high 32 bits will be zero. Additionally, inline constants do not work. At least, v_addc_u32_e64 v0, vcc, v0, v1, -1 which could conceivably be used to combine (v0 + v1 + 1) into a single instruction, acts as if all carry-in bits are zero. The llvm.amdgcn.ps.live test is adjusted; it would be nice to combine s_mov_b64 s[0:1], exec v_cndmask_b32_e64 v0, v1, v2, s[0:1] into v_mov_b32 v0, v3 but it's not particularly high priority. Fixes dEQP-GLES31.functional.shaders.helper_invocation.value.* llvm-svn: 314522	2017-09-29 15:37:31 +00:00
Matt Arsenault	fdcdd88d57	AMDGPU: Fix crash on immediate operand We can have a v_mac with an immediate src0. We can still fold if it's an inline immediate, otherwise it already uses the constant bus. llvm-svn: 313852	2017-09-21 00:45:59 +00:00
Konstantin Zhuravlyov	ca8946a376	AMDGPU: Start selecting s_xnor_{b32, b64} Differential Revision: https://reviews.llvm.org/D37981 llvm-svn: 313565	2017-09-18 21:22:45 +00:00
Jan Sjodin	1f2f57a7ea	Fix warnings in r313297. llvm-svn: 313302	2017-09-14 21:49:52 +00:00
Matt Arsenault	c317287fde	AMDGPU: Fix violating constant bus restriction You can't use madmk/madmk if it already uses an SGPR input. llvm-svn: 313298	2017-09-14 20:54:29 +00:00
Jan Sjodin	312ccf761c	Add AddresSpace to PseudoSourceValue. Differential Revision: https://reviews.llvm.org/D35089 llvm-svn: 313297	2017-09-14 20:53:51 +00:00
Matt Arsenault	ecb43ef1bc	AMDGPU: Don't spill SP reg like a normal CSR llvm-svn: 313217	2017-09-13 23:47:01 +00:00
Stanislav Mekhanoshin	7fe9a5d9b4	Allow target to decide when to cluster loads/stores in misched MachineScheduler when clustering loads or stores checks if base pointers point to the same memory. This check is done through comparison of base registers of two memory instructions. This works fine when instructions have separate offset operand. If they require a full calculated pointer such instructions can never be clustered according to such logic. Changed shouldClusterMemOps to accept base registers as well and let it decide what to do about it. Differential Revision: https://reviews.llvm.org/D37698 llvm-svn: 313208	2017-09-13 22:20:47 +00:00
Stanislav Mekhanoshin	710da42b86	[AMDGPU] Produce madak and madmk from the two-address pass These two instructions are normally selected, but when the two address pass converts mac into mad we end up with the mad where we could have one of these. Differential Revision: https://reviews.llvm.org/D37389 llvm-svn: 312928	2017-09-11 17:13:57 +00:00
Stanislav Mekhanoshin	949fac9e40	[AMDGPU] Fix shouldClusterMemOps to process flat loads Flat loads do not have vdata operand but have vdst instead. Differential Revision: https://reviews.llvm.org/D37502 llvm-svn: 312640	2017-09-06 15:31:30 +00:00
Eugene Zelenko	59e128266c	[AMDGPU] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 310328	2017-08-08 00:47:13 +00:00
Connor Abbott	66b9bd6e50	[AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic Summary: This intrinsic lets us set inactive lanes to an identity value when implementing wavefront reductions. In combination with Whole Wavefront Mode, it lets inactive lanes be skipped over as required by GLSL/Vulkan. Lowering the intrinsic needs to happen post-RA so that RA knows that the destination isn't completely overwritten due to the EXEC shenanigans, so we need another pseudo-instruction to represent the un-lowered intrinsic. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34719 llvm-svn: 310088	2017-08-04 18:36:54 +00:00
Connor Abbott	92638ab625	[AMDGPU] Add support for Whole Wavefront Mode Summary: Whole Wavefront Wode (WWM) is similar to WQM, except that all of the lanes are always enabled, regardless of control flow. This is required for implementing wavefront reductions in non-uniform control flow, where we need to use the inactive lanes to propagate intermediate results, so they need to be enabled. We need to propagate WWM to uses (unless they're explicitly marked as exact) so that they also propagate intermediate results correctly. We do the analysis and exec mask munging during the WQM pass, since there are interactions with WQM for things that require both WQM and WWM. For simplicity, WWM is entirely block-local -- blocks are never WWM on entry or exit of a block, and WWM is not propagated to the block level. This means that computations involving WWM cannot involve control flow, but we only ever plan to use WWM for a few limited purposes (none of which involve control flow) anyways. Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There isn't yet a way to turn WWM off -- that will be added in a future change. Finally, it turns out that turning on inactive lanes causes a number of problems with register allocation. While the best long-term solution seems like teaching LLVM's register allocator about predication, for now we need to add some hacks to prevent ourselves from getting into trouble due to constraints that aren't currently expressed in LLVM. For the gory details, see the comments at the top of SIFixWWMLiveness.cpp. Reviewers: arsenm, nhaehnle, tpr Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D35524 llvm-svn: 310087	2017-08-04 18:36:52 +00:00
Connor Abbott	8c217d0a29	[AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM Summary: Previously, we assumed that certain types of instructions needed WQM in pixel shaders, particularly DS instructions and image sampling instructions. This was ok because with OpenGL, the assumption was correct. But we want to start using DPP instructions for derivatives as well as other things, so the assumption that we can infer whether to use WQM based on the instruction won't continue to hold. This intrinsic lets frontends like Mesa indicate what things need WQM based on their knowledge of the API, rather than second-guessing them in the backend. We need to keep around the old method of enabling WQM, but eventually we should remove it once Mesa catches up. For now, this will let us use DPP instructions for computing derivatives correctly. Reviewers: arsenm, tpr, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D35167 llvm-svn: 310085	2017-08-04 18:36:49 +00:00
Matt Arsenault	8623e8d864	AMDGPU: Pass special input registers to functions llvm-svn: 309998	2017-08-03 23:00:29 +00:00
Matt Arsenault	9608a2891d	AMDGPU: Make areMemAccessesTriviallyDisjoint more aware of segment flat Checking the encoding is insufficient since now there can be global or scratch instructions. llvm-svn: 309472	2017-07-29 01:26:21 +00:00
Matt Arsenault	37a58e03c7	AMDGPU: Fix getMemOpBaseRegImmOfs for flat with offsets llvm-svn: 308762	2017-07-21 18:06:36 +00:00
Matt Arsenault	db78273b6e	Add an ID field to StackObjects On AMDGPU SGPR spills are really spilled to another register. The spiller creates the spills to new frame index objects, which is used as a placeholder. This will eventually be replaced with a reference to a position in a VGPR to write to and the frame index deleted. It is most likely not a real stack location that can be shared with another stack object. This is a problem when StackSlotColoring decides it should combine a frame index used for a normal VGPR spill with a real stack location and a frame index used for an SGPR. Add an ID field so that StackSlotColoring has a way of knowing the different frame index types are incompatible. llvm-svn: 308673	2017-07-20 21:03:45 +00:00
Alfred Huang	5b27072f57	[AMDGPU] Do not insert an instruction into worklist twice in movetovalu In moveToVALU(), move to vector ALU is performed, all instrs in the use chain will be visited. We do not want the same node to be pushed to the visit worklist more than once. Differential Revision: https://reviews.llvm.org/D34726 llvm-svn: 308039	2017-07-14 17:56:55 +00:00
Simon Pilgrim	0f5b35059d	[AMDGPU] Fix -Wimplicit-fallthrough warnings. NFCI. llvm-svn: 307381	2017-07-07 10:18:57 +00:00
Matt Arsenault	3f031e75aa	AMDGPU: Add operand target flags serialization llvm-svn: 306995	2017-07-02 23:21:48 +00:00
Sam Kolton	a179d25b99	[AMDGPU] SDWA: several fixes for V_CVT and VOPC instructions Summary: 1. Instruction V_CVT_U32_F32 allow omod operand (see SIInstrInfo.td:1435). In fact this operand shouldn't be allowed here. This fix checks if SDWA pseudo instruction has OMod operand and then copy it. 2. There were several problems with support of VOPC instructions in SDWA peephole pass. Reviewers: tstellar, arsenm, vpykhtin, airlied, kzhuravl Subscribers: wdng, nhaehnle, yaxunl, dstuttard, tpr, sarnex, t-tye Differential Revision: https://reviews.llvm.org/D34626 llvm-svn: 306413	2017-06-27 15:02:23 +00:00
Nicolai Haehnle	43cc6c4e0f	AMDGPU: M0 operands to spill/restore opcodes are dead Summary: With scalar stores, M0 is clobbered and therefore marked as implicitly defined. However, it is also dead. This fixes an assertion when the Greedy Register Allocator decides to optimize a spill/restore pair away again (via tryHintsRecoloring). Reviewers: arsenm Subscribers: qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33319 llvm-svn: 306375	2017-06-27 08:04:13 +00:00
Sam Kolton	3c4933fcc6	[AMDGPU] SDWA: add support for GFX9 in peephole pass Summary: Added support based on merged SDWA pseudo instructions. Now peephole allow one scalar operand, omod and clamp modifiers. Added several subtarget features for GFX9 SDWA. This diff also contains changes from D34026. Depends D34026 Reviewers: vpykhtin, rampitec, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34241 llvm-svn: 305986	2017-06-22 06:26:41 +00:00
Sam Kolton	549c89d2c9	[AMDGPU] SDWA: merge VI and GFX9 pseudo instructions Summary: Previously there were two separate pseudo instruction for SDWA on VI and on GFX9. Created one pseudo instruction that is union of both of them. Added verifier to check that operands conform either VI or GFX9. Reviewers: dp, arsenm, vpykhtin Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, artem.tamazov Differential Revision: https://reviews.llvm.org/D34026 llvm-svn: 305886	2017-06-21 08:53:38 +00:00
Matt Arsenault	05c26472fa	AMDGPU: Don't add same implicit use multiple times For the last component, the same register use was added as an implicit use and another implicit kill use. llvm-svn: 305205	2017-06-12 17:19:20 +00:00
Matt Arsenault	89ad17ce4c	AMDGPU: Verify that flat offsets aren't used pre-GFX9 For convenience the operand is always present in the instruction, but it isn't valid to use except on GFX9. llvm-svn: 305200	2017-06-12 16:37:55 +00:00
Chandler Carruth	6bda14b313	Sort the remaining #include lines in include/... and lib/.... I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is entirely mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787	2017-06-06 11:49:48 +00:00
Tom Stellard	dde28a8c92	AMDGPU/GlobalISel: Mark 32-bit float constants as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33212 llvm-svn: 304003	2017-05-26 16:40:03 +00:00
Matt Arsenault	ea8a4ed588	AMDGPU: Use appropriate soffset for spilling This needs to be the frame offset register, and not the global scratch wave offset register. For kernels, these are the same. llvm-svn: 303287	2017-05-17 19:37:57 +00:00
NAKAMURA Takumi	994a43d27a	AMDGPUCodeGen: Fix warnings in r303111. [-Wunused-variable] llvm-svn: 303137	2017-05-16 04:01:23 +00:00
Jan Sjodin	a06bfe054e	Re-submit AMDGPUMachineCFGStructurizer. Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303111	2017-05-15 20:18:37 +00:00
Jan Sjodin	0e289822fa	Revert 303091. llvm-svn: 303098	2017-05-15 18:39:47 +00:00
Jan Sjodin	e9d2ddc9dd	Add AMDGPUMachineCFGStructurizer. Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303091	2017-05-15 18:13:56 +00:00
Krzysztof Parzyszek	44e25f37ae	Move size and alignment information of regclass to TargetRegisterInfo 1. RegisterClass::getSize() is split into two functions: - TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const; - TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const; 2. RegisterClass::getAlignment() is replaced by: - TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const; This will allow making those values depend on subtarget features in the future. Differential Revision: https://reviews.llvm.org/D31783 llvm-svn: 301221	2017-04-24 18:55:33 +00:00
Nicolai Haehnle	5dea645138	AMDGPU: Move v_readlane lane select from VGPR to SGPR Summary: Fix a compiler bug when the lane select happens to end up in a VGPR. Clarify the semantic of the corresponding intrinsic to be that of the corresponding GLSL: the lane select must be uniform across a wave front, otherwise results are undefined. Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D32343 llvm-svn: 301197	2017-04-24 17:17:36 +00:00
Nicolai Haehnle	ef449787d8	AMDGPU: Fix crash when scheduling non-memory SMRD instructions Summary: Fixes piglit spec/arb_shader_clock/execution/* Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D32345 llvm-svn: 301191	2017-04-24 16:53:52 +00:00
Konstantin Zhuravlyov	88938d4e67	AMDGPU: Fix S_PACK_HH_B32_B16 - We really ought to zero out lower 16 bits Differential Revision: https://reviews.llvm.org/D32356 llvm-svn: 301026	2017-04-21 19:35:05 +00:00
Stanislav Mekhanoshin	86b0a5465b	[AMDGPU] added SIInstrInfo::getAddNoCarry() helper Addressed rest of post submit comments from D31993. Differential Revision: https://reviews.llvm.org/D32057 llvm-svn: 300288	2017-04-14 00:33:44 +00:00
Konstantin Zhuravlyov	d24aeb20fc	AMDGPU/GFX9: Do not use v_pack_b32_f16 when packing Differential Revision: https://reviews.llvm.org/D31819 llvm-svn: 300275	2017-04-13 23:17:00 +00:00
Matt Arsenault	21a438255d	AMDGPU: Diagnose illegal SGPR to VGPR copies This is possible in ways that are not compiler bugs, so stop asserting on them. This emits an extra error when emitting objects when it can't encode the new pseudo, but I'm not sure that matters. llvm-svn: 299712	2017-04-06 21:09:53 +00:00
Sam Kolton	27e0f8bc72	[AMDGPU] SDWA Peephole: improve search for immediates in SDWA patterns Previously compiler often extracted common immediates into specific register, e.g.: ``` %vreg0 = S_MOV_B32 0xff; %vreg2 = V_AND_B32_e32 %vreg0, %vreg1 %vreg4 = V_AND_B32_e32 %vreg0, %vreg3 ``` Because of this SDWA peephole failed to find SDWA convertible pattern. E.g. in previous example this could be converted into 2 SDWA src operands: ``` SDWA src: %vreg2 src_sel:BYTE_0 SDWA src: %vreg4 src_sel:BYTE_0 ``` With this change peephole check if operand is either immediate or register that is copy of immediate. llvm-svn: 299202	2017-03-31 11:42:43 +00:00
Yaxun Liu	1a14bfa022	[AMDGPU] Get address space mapping by target triple environment As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple. The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which are not depend on target triple, use static const members, so that they don't occupy extra memory space and is equivalent to a compile time constant. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as member to a pass and created at the beginning of the run* function. Differential Revision: https://reviews.llvm.org/D31284 llvm-svn: 298846	2017-03-27 14:04:01 +00:00
Matt Arsenault	b8f8dbc227	AMDGPU: Unify divergent function exits. StructurizeCFG can't handle cases with multiple returns creating regions with multiple exits. Create a copy of UnifyFunctionExitNodes that only unifies exit nodes that skips exit nodes with uniform branch sources. llvm-svn: 298729	2017-03-24 19:52:05 +00:00
Marek Olsak	5c7a61d221	AMDGPU: Buffer descriptor changes for GFX9 Reviewers: arsenm Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr Differential Revision: https://reviews.llvm.org/D31158 llvm-svn: 298397	2017-03-21 17:00:39 +00:00
Matt Arsenault	3cb9ff8863	AMDGPU: Keep track of modifiers when converting v_mac to v_mad Since v_max_f32_e64/v_max_f16_e64 can be folded if the target instruction supports the clamp bit, we also need to maintain modifiers when converting v_mac to v_mad. This fixes a rendering issue with Dirt Rally because a v_mac instruction with the clamp bit set was converted to a v_mad but that bit was lost during the conversion. Fixes: e184e01dd79 ("AMDGPU: Fold FP clamp as modifier bit") Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com> llvm-svn: 297556	2017-03-11 05:40:40 +00:00
Matt Arsenault	eb522e68bc	AMDGPU: Support v2i16/v2f16 packed operations llvm-svn: 296396	2017-02-27 22:15:25 +00:00
Matt Arsenault	2ed2193218	AMDGPU: Don't fold immediate if clamp/omod are set Doesn't fix any practical problems because clamp/omod are currently folded after peephole optimizer. llvm-svn: 296375	2017-02-27 20:21:31 +00:00
Matt Arsenault	e0bf7d02f0	AMDGPU: Don't use stack space for SGPR->VGPR spills Before frame offsets are calculated, try to eliminate the frame indexes used by SGPR spills. Then we can delete them after. I think for now we can be sure that no other instruction will be re-using the same frame indexes. It should be easy to notice if this assumption ever breaks since everything asserts if it tries to use a dead frame index later. The unused emergency stack slot seems to still be left behind, so an additional 4 bytes is still wasted. llvm-svn: 295753	2017-02-21 19:12:08 +00:00
Matt Arsenault	9dba9bd4cf	AMDGPU: Use source modifiers with f16->f32 conversions The operand types were defined to fit the fp16_to_fp node, which has the half as an integer type. v_cvt_f32_f16 does support source modifiers, so change this to have an FP type and modifiers. For targets without legal f16, this requires recognizing the bit operations and trying to produce them. llvm-svn: 293857	2017-02-02 02:27:04 +00:00
Matt Arsenault	74f64833bc	AMDGPU: Allow clustering flat memory operations llvm-svn: 293809	2017-02-01 20:22:51 +00:00
Matt Arsenault	9f5e0ef0c5	AMDGPU: Implement early ifcvt target hooks. Leave early ifcvt disabled for now since there are some shader-db regressions. This causes some immediate improvements, but could be better. The cost checking that the pass does is based on critical path length for out of order CPUs which we do not want so it skips out on many cases we want. llvm-svn: 293016	2017-01-25 04:25:02 +00:00
Stanislav Mekhanoshin	6ec3e3a728	[AMDGPU] Prevent spills before exec mask is restored Inline spiller can decide to move a spill as early as possible in the basic block. It will skip phis and label, but we also need to make sure it skips instructions in the basic block prologue which restore exec mask. Added isPositionLike callback in TargetInstrInfo to detect instructions which shall be skipped in addition to common phis, labels etc. Differential Revision: https://reviews.llvm.org/D27997 llvm-svn: 292554	2017-01-20 00:44:31 +00:00
Diana Picus	116bbab4e4	[CodeGen] Rename MachineInstrBuilder::addOperand. NFC Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand. See https://reviews.llvm.org/D28057 for the whole discussion. Differential Revision: https://reviews.llvm.org/D28556 llvm-svn: 291891	2017-01-13 09:58:52 +00:00
Matt Arsenault	4c1e9ec008	AMDGPU: Don't add same instruction multiple times to worklist When the instruction is processed the first time, it may be deleted resulting in crashes. While the new test adds the same user to the worklist twice, this particular case doesn't crash but I'm not sure why. llvm-svn: 290191	2016-12-20 18:55:06 +00:00
Matt Arsenault	4bd7236193	AMDGPU: Fix handling of 16-bit immediates Since 32-bit instructions with 32-bit input immediate behavior are used to materialize 16-bit constants in 32-bit registers for 16-bit instructions, determining the legality based on the size is incorrect. Change operands to have the size specified in the type. Also adds a workaround for a disassembler bug that produces an immediate MCOperand for an operand that is supposed to be OPERAND_REGISTER. The assembler appears to accept out of bounds immediates and truncates them, but this seems to be an issue for 32-bit already. llvm-svn: 289306	2016-12-10 00:39:12 +00:00
Tom Stellard	8485fa096e	AMDGPU : Add S_SETREG instructions to fix fdiv precision issues. Patch By: Wei Ding Summary: This patch fixes the fdiv precision issues. Reviewers: b-sumner, cfang, wdng, arsenm Subscribers: kzhuravl, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D26424 llvm-svn: 288879	2016-12-07 02:42:15 +00:00
Matt Arsenault	26faed3960	AMDGPU: Consolidate inline immediate predicate functions llvm-svn: 288718	2016-12-05 22:26:17 +00:00
Matt Arsenault	97279a8ca3	AMDGPU: Rename flat operands to match mubuf Use vaddr/vdst for the same purposes. This also fixes a beg in SIInsertWaits for the operand check. The stored value operand is currently called data0 in the single offset case, not data. llvm-svn: 288188	2016-11-29 19:30:44 +00:00
Matt Arsenault	437fd71f5b	AMDGPU: Use else if llvm-svn: 288187	2016-11-29 19:30:41 +00:00
Marek Olsak	79c05871a2	AMDGPU/SI: Add back reverted SGPR spilling code, but disable it suggested as a better solution by Matt llvm-svn: 287942	2016-11-25 17:37:09 +00:00
Marek Olsak	e3895bfb47	Revert "AMDGPU: Implement SGPR spilling with scalar stores" This reverts commit 4404d0d6e354e80dd7f8f0a0e12d8ad809cf007e. llvm-svn: 287936	2016-11-25 16:03:34 +00:00
Marek Olsak	a45dae458d	Revert "AMDGPU: Make m0 unallocatable" This reverts commit 124ad83dae04514f943902446520c859adee0e96. llvm-svn: 287932	2016-11-25 16:03:15 +00:00
Matt Arsenault	9e5c7b1031	AMDGPU: Make m0 unallocatable m0 may need to be written for spill code, so we don't want general code uses relying on the value stored in it. This introduces a few code quality regressions where copies from m0 are not coalesced into copies of a copy of m0. llvm-svn: 287841	2016-11-24 00:26:40 +00:00
Nicolai Haehnle	ce2b589df5	AMDGPU: Fix legalization of MUBUF instructions in shaders Summary: The addr64-based legalization is incorrect for MUBUF instructions with idxen set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions. This affects e.g. shaders that access buffer textures. Since we never actually need the addr64-legalization in shaders, this patch takes the easy route and keys off the calling convention. If this ever affects (non-OpenGL) compute, the type of legalization needs to be chosen based on some TSFlag. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98664 Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D26747 llvm-svn: 287339	2016-11-18 11:55:52 +00:00
Tom Stellard	0d162b1c4f	AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies pass Summary: 1. Don't try to copy values to and from the same register class. 2. Replace copies with of registers with immediate values with v_mov/s_mov instructions. The main purpose of this change is to make MachineSink do a better job of determining when it is beneficial to split a critical edge, since the pass assumes that copies will become move instructions. This prevents a regression in uniform-cfg.ll if we enable critical edge splitting for AMDGPU. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23408 llvm-svn: 287131	2016-11-16 18:42:17 +00:00
Matt Arsenault	3666629837	AMDGPU: Analyze mubuf with immediate soffset Fixes giving up on clustering common addr64 accesses with constant 0 soffset. llvm-svn: 287018	2016-11-15 20:14:27 +00:00
Stanislav Mekhanoshin	ea91cca593	[AMDGPU] Add wave barrier builtin The wave barrier represents the discardable barrier. Its main purpose is to carry convergent attribute, thus preventing illegal CFG optimizations. All lanes in a wave come to convergence point simultaneously with SIMT, thus no special instruction is needed in the ISA. The barrier is discarded during code generation. Differential Revision: https://reviews.llvm.org/D26585 llvm-svn: 287007	2016-11-15 19:00:15 +00:00
Matt Arsenault	dc45274d54	AMDGPU: Implement SGPR spilling with scalar stores nThis avoids the nasty problems caused by using memory instructions that read the exec mask while spilling / restoring registers used for control flow masking, but only for VI when these were added. This always uses the scalar stores when enabled currently, but it may be better to still try to spill to a VGPR and use this on the fallback memory path. The cache also needs to be flushed before wave termination if a scalar store is used. llvm-svn: 286766	2016-11-13 18:20:54 +00:00
Konstantin Zhuravlyov	f86e4b7266	[AMDGPU] Add f16 support (VI+) Differential Revision: https://reviews.llvm.org/D25975 llvm-svn: 286753	2016-11-13 07:01:11 +00:00
Matt Arsenault	52f14ec596	AMDGPU: Preserve vcc undef flags when inverting branch If the branch was on a read-undef of vcc, passes that used analyzeBranch to invert the branch condition wouldn't preserve the undef flag resulting in a verifier error. Fixes verifier failures in a future commit. Also fix verifier error when inserting copy for vccz corruption bug. llvm-svn: 286133	2016-11-07 19:09:27 +00:00
Matt Arsenault	314cbf7a3b	AMDGPU: Refactor copyPhysReg Separate the subregister splitting logic to re-use later. llvm-svn: 286118	2016-11-07 16:39:22 +00:00
Nicolai Haehnle	368972c3b3	AMDGPU: Allow additional implicit operands on MOVRELS instructions Summary: The post-RA scheduler occasionally uses additional implicit operands when the vector implicit operand as a whole is killed, but some subregisters are still live because they are directly referenced later. Unfortunately, this seems incredibly subtle to reproduce. Fixes piglit spec/glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-wr.shader_test and others. Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25656 llvm-svn: 285835	2016-11-02 17:03:11 +00:00
Matt Arsenault	3d463193a9	AMDGPU: Default to using scalar mov to materialize immediate This is the conservatively correct way because it's easy to move or replace a scalar immediate. This was incorrect in the case when the register class wasn't known from the static instruction definition, but still needed to be an SGPR. The main example of this is inlineasm has an SGPR constraint. Also start verifying the register classes of inlineasm operands. llvm-svn: 285762	2016-11-01 22:55:07 +00:00
Matt Arsenault	2d8c289b4b	AMDGPU: Workaround for instruction size with literals Instructions with a 32-bit base encoding with an optional 32-bit literal encoded after them report their size as 4 for the disassembler. Consider these when computing the MachineInstr size. This fixes problems caused by size estimate consistency in BranchRelaxation. llvm-svn: 285743	2016-11-01 20:42:24 +00:00
Matt Arsenault	c88ba36eab	AMDGPU: Use 1/2pi inline imm on VI I'm guessing at how it is supposed to be printed llvm-svn: 285490	2016-10-29 04:05:06 +00:00
Tom Stellard	6695ba0440	AMDGPU/SI: Don't use non-0 waitcnt values when waiting on Flat instructions Summary: Flat instruction can return out of order, so we need always need to wait for all the outstanding flat operations. Reviewers: tony-tye, arsenm Subscribers: kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl Differential Revision: https://reviews.llvm.org/D25998 llvm-svn: 285479	2016-10-28 23:53:48 +00:00
Matt Arsenault	7b6475568d	AMDGPU: Add definitions for scalar store instructions Also add glc bit to the scalar loads since they exist on VI and change the caching behavior. This currently has an assembler bug where the glc bit is incorrectly accepted on SI/CI which do not have it. llvm-svn: 285463	2016-10-28 21:55:15 +00:00
Matt Arsenault	08906a3c62	AMDGPU: Fix using incorrect private resource with no allocation It's possible to have a use of the private resource descriptor or scratch wave offset registers even though there are no allocated stack objects. This would result in continuing to use the maximum number reserved registers. This could go over the number of SGPRs available on VI, or violate the SGPR limit requested by the function attributes. llvm-svn: 285435	2016-10-28 19:43:31 +00:00
Matt Arsenault	1110f14b42	AMDGPU: Fix counting si_mask_branch as 4 bytes llvm-svn: 285202	2016-10-26 14:53:54 +00:00
Nicolai Haehnle	a785209bc2	AMDGPU: Fix Two Address problems with v_movreld Summary: The v_movreld machine instruction is used with three operands that are in a sense tied to each other (the explicit VGPR_32 def and the implicit VGPR_NN def and use). There is no way to express that using the currently available operand bits, and indeed there are cases where the Two Address instructions pass does the wrong thing. This patch introduces a new set of pseudo instructions that are identical in intended semantics as v_movreld, but they only have two tied operands. Having to add a new set of pseudo instructions is admittedly annoying, but it's a fairly straightforward and solid approach. The only alternative I see is to try to teach the Two Address instructions pass about Three Address instructions, and I'm afraid that's trickier and is going to end up more fragile. Note that v_movrels does not suffer from this problem, and so this patch does not touch it. This fixes several GL45-CTS.shaders.indexing.* tests. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25633 llvm-svn: 284980	2016-10-24 14:56:02 +00:00

1 2 3 4 5 ...

302 Commits