llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	35b1902bce	AMDGPU: Move canShrink into TII llvm-svn: 340855	2018-08-28 18:22:34 +00:00
Ryan Taylor	1f334d0062	[AMDGPU] Add support for a16 modifiear for gfx9 Summary: Adding support for a16 for gfx9. A16 bit replaces r128 bit for gfx9. Change-Id: Ie8b881e4e6d2f023fb5e0150420893513e5f4841 Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D50575 llvm-svn: 340831	2018-08-28 15:07:30 +00:00
Tim Renouf	904343f879	[AMDGPU] Add support for multi-dword s.buffer.load intrinsic Summary: Patch by Marek Olsak and David Stuttard, both of AMD. This adds a new amdgcn intrinsic supporting s.buffer.load, in particular multiple dword variants. These are convenient to use from some front-end implementations. Also modified the existing llvm.SI.load.const intrinsic to common up the underlying implementation. This modification also requires that we can lower to non-uniform loads correctly by splitting larger dword variants into sizes supported by the non-uniform versions of the load. V2: Addressed minor review comments. V3: i1 glc is now i32 cachepolicy for consistency with buffer and tbuffer intrinsics, plus fixed formatting issue. V4: Added glc test. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51098 Change-Id: I83a6e00681158bb243591a94a51c7baa445f169b llvm-svn: 340684	2018-08-25 14:53:17 +00:00
Samuel Pitoiset	7bd9dcffcd	AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space 32-bit constant address space is declared as 6, so the maximum number of address spaces is 6, not 5. Fixes "LLVM ERROR: Pointer address space out of range". v5: rename MAX_COMMON_ADDRESS to MAX_AMDGPU_ADDRESS v4: - fix compilation issues - fix out of bounds access v3: use static_assert() v2: add a very simple test for 32-bit addr space Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630 llvm-svn: 340417	2018-08-22 16:08:48 +00:00
Samuel Pitoiset	d81d6f7d58	AMDGPU: fix existing alias rules for constant and global Constant and global may alias, also one rules table wasn't ordered correctly. Pinpointed by Matt. v2: add a test with swapped parameters llvm-svn: 340416	2018-08-22 16:08:43 +00:00
Matt Arsenault	bb8e64e7f5	AMDGPU: Fix not respecting byval alignment in call frame setup This was hackily adding in the 4-bytes reserved for the callee's emergency stack slot. Treat it like a normal stack allocation so we get the correct alignment padding behavior. This fixes an inconsistency between the caller and callee. llvm-svn: 340396	2018-08-22 11:09:45 +00:00
Alina Sbirlea	ab6f84f763	Update MemorySSA in BasicBlockUtils. Summary: Extend BasicBlocksUtils to update MemorySSA. Subscribers: sanjoy, arsenm, nhaehnle, jlebar, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D45300 llvm-svn: 340365	2018-08-21 23:32:03 +00:00
Scott Linder	72855e36c5	[AMDGPU] Consider loads from flat addrspace to be potentially divergent In general we can't assume flat loads are uniform, and cases where we can prove they are should be handled through infer-address-spaces. Differential Revision: https://reviews.llvm.org/D50991 llvm-svn: 340343	2018-08-21 21:24:31 +00:00
Farhana Aleen	3528c80378	[AMDGPU] Support idot2 pattern. Summary: Transform add (mul ((i32)S0.x, (i32)S1.x), add( mul ((i32)S0.y, (i32)S1.y), (i32)S3) => i/udot2((v2i16)S0, (v2i16)S1, (i32)S3) Author: FarhanaAleen Reviewed By: arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D50024 llvm-svn: 340295	2018-08-21 16:21:15 +00:00
Tim Renouf	bb5ee41ab4	[AMDGPU] Allow int types for MUBUF vdata Summary: Previously the new llvm.amdgcn.raw/struct.buffer.load/store intrinsics only allowed float types for the data to be loaded or stored, which sometimes meant the frontend needed to generate a bitcast. In this, the new intrinsics copied the old buffer intrinsics. This commit extends the new intrinsics to allow int types as well. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D50315 Change-Id: I8202af2d036455553681dcbb3d7d32ae273f8f85 llvm-svn: 340270	2018-08-21 11:08:12 +00:00
Tim Renouf	4f703f5e11	[AMDGPU] New buffer intrinsics Summary: This commit adds new intrinsics llvm.amdgcn.raw.buffer.load llvm.amdgcn.raw.buffer.load.format llvm.amdgcn.raw.buffer.load.format.d16 llvm.amdgcn.struct.buffer.load llvm.amdgcn.struct.buffer.load.format llvm.amdgcn.struct.buffer.load.format.d16 llvm.amdgcn.raw.buffer.store llvm.amdgcn.raw.buffer.store.format llvm.amdgcn.raw.buffer.store.format.d16 llvm.amdgcn.struct.buffer.store llvm.amdgcn.struct.buffer.store.format llvm.amdgcn.struct.buffer.store.format.d16 llvm.amdgcn.raw.buffer.atomic.* llvm.amdgcn.struct.buffer.atomic.* with the following changes from the llvm.amdgcn.buffer.* intrinsics: * there are separate raw and struct versions: raw does not have an index arg and sets idxen=0 in the instruction, and struct always sets idxen=1 in the instruction even if the index is 0, to allow for the fact that gfx9 does bounds checking differently depending on whether idxen is set; * there is a combined cachepolicy arg (glc+slc) * there are now only two offset args: one for the offset that is included in bounds checking and swizzling, to be split between the instruction's voffset and immoffset fields, and one for the offset that is excluded from bounds checking and swizzling, to go into the instruction's soffset field. The AMDISD::BUFFER_* SD nodes always have an index operand, all three offset operands, combined cachepolicy operand, and an extra idxen operand. The obsolescent llvm.amdgcn.buffer.* intrinsics continue to work. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D50306 Change-Id: If897ea7dc34fcbf4d5496e98cc99a934f62fc205 llvm-svn: 340269	2018-08-21 11:07:10 +00:00
Tim Renouf	35484c9d50	[AMDGPU] New tbuffer intrinsics Summary: This commit adds new intrinsics llvm.amdgcn.raw.tbuffer.load llvm.amdgcn.struct.tbuffer.load llvm.amdgcn.raw.tbuffer.store llvm.amdgcn.struct.tbuffer.store with the following changes from the llvm.amdgcn.tbuffer.* intrinsics: * there are separate raw and struct versions: raw does not have an index arg and sets idxen=0 in the instruction, and struct always sets idxen=1 in the instruction even if the index is 0, to allow for the fact that gfx9 does bounds checking differently depending on whether idxen is set; * there is a combined format arg (dfmt+nfmt) * there is a combined cachepolicy arg (glc+slc) * there are now only two offset args: one for the offset that is included in bounds checking and swizzling, to be split between the instruction's voffset and immoffset fields, and one for the offset that is excluded from bounds checking and swizzling, to go into the instruction's soffset field. The AMDISD::TBUFFER_* SD nodes always have an index operand, all three offset operands, combined format operand, combined cachepolicy operand, and an extra idxen operand. The tbuffer pseudo- and real instructions now also have a combined format operand. The obsolescent llvm.amdgcn.tbuffer.* and llvm.SI.tbuffer.store intrinsics continue to work. V2: Separate raw and struct intrinsics. V3: Moved extract_glc and extract_slc defs to a more sensible place. V4: Rebased on D49995. V5: Only two separate offset args instead of three. V6: Pseudo- and real instructions have joint format operand. V7: Restored optionality of dfmt and nfmt in assembler. V8: Addressed minor review comments. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49026 Change-Id: If22ad77e349fac3a5d2f72dda53c010377d470d4 llvm-svn: 340268	2018-08-21 11:06:05 +00:00
Vitaly Buka	30b5ed3eb7	Revert "AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space" As it introduces out of bound access. This reverts commit r340172 and r340171 llvm-svn: 340202	2018-08-20 19:31:03 +00:00
Marcello Maggioni	5ca4128b45	[PSV] Update API to be able to use TargetCustom without UB. getTargetCustom() requires values for "Kind" in the constructor that are not in the PSVKind enum. Passing a value that is not inside an enum as an argument to a constructor of the type of the enum is UB. Changing to the underlying type of the enum would solve the UB Differential Revision: https://reviews.llvm.org/D50909 llvm-svn: 340200	2018-08-20 19:23:45 +00:00
Samuel Pitoiset	216a2da577	AMDGPU: fix compilation errors since r340171 Some buildbot slaves reports compilation errors, but it compiled fine on my side, sorry for the breakage. llvm-svn: 340172	2018-08-20 13:31:41 +00:00
Samuel Pitoiset	c95ef77d37	AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space 32-bit constant address space is declared as 6, so the maximum number of address spaces is 6, not 5. Fixes "LLVM ERROR: Pointer address space out of range". v3: use static_assert() v2: add a very simple test for 32-bit addr space Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> llvm-svn: 340171	2018-08-20 13:18:59 +00:00
Chandler Carruth	c73c0307fe	[MI] Change the array of `MachineMemOperand` pointers to be a generically extensible collection of extra info attached to a `MachineInstr`. The primary change here is cleaning up the APIs used for setting and manipulating the `MachineMemOperand` pointer arrays so chat we can change how they are allocated. Then we introduce an extra info object that using the trailing object pattern to attach some number of MMOs but also other extra info. The design of this is specifically so that this extra info has a fixed necessary cost (the header tracking what extra info is included) and everything else can be tail allocated. This pattern works especially well with a `BumpPtrAllocator` which we use here. I've also added the basic scaffolding for putting interesting pointers into this, namely pre- and post-instruction symbols. These aren't used anywhere yet, they're just there to ensure I've actually gotten the data structure types correct. I'll flesh out support for these in a subsequent patch (MIR dumping, parsing, the works). Finally, I've included an optimization where we store any single pointer inline in the `MachineInstr` to avoid the allocation overhead. This is expected to be the overwhelmingly most common case and so should avoid any memory usage growth due to slightly less clever / dense allocation when dealing with >1 MMO. This did require several ergonomic improvements to the `PointerSumType` to reasonably support the various usage models. This also has a side effect of freeing up 8 bits within the `MachineInstr` which could be repurposed for something else. The suggested direction here came largely from Hal Finkel. I hope it was worth it. ;] It does hopefully clear a path for subsequent extensions w/o nearly as much leg work. Lots of thanks to Reid and Justin for careful reviews and ideas about how to do all of this. Differential Revision: https://reviews.llvm.org/D50701 llvm-svn: 339940	2018-08-16 21:30:05 +00:00
Matt Arsenault	7121bed210	AMDGPU: Custom lower fexp This will allow the library to just use __builtin_expf directly without expanding this itself. Note f64 still won't work because there is no exp instruction for it. llvm-svn: 339902	2018-08-16 17:07:52 +00:00
Matt Arsenault	f533e6b0ed	AMDGPU: Fold fneg into fmed3 llvm-svn: 339821	2018-08-15 21:46:27 +00:00
Matt Arsenault	a816073764	AMDGPU: Improve extract_vector_elt reduction combine Handle fmul, fsub and preserve flags. Also really test minnum/maxnum reductions. The existing tests were only checking from minnum/maxnum matched from a fast math compare and select which is not the same. llvm-svn: 339820	2018-08-15 21:34:06 +00:00
Matt Arsenault	b3a80e5397	AMDGPU: Implement llvm.amdgcn.icmp/fcmp for i16/f16 Also support these on targets without support for these, since it will allow us to freely create these in instcombine. llvm-svn: 339819	2018-08-15 21:25:20 +00:00
Matt Arsenault	6c7ba82900	AMDGPU: Address todo for handling 1/(2 pi) llvm-svn: 339814	2018-08-15 21:03:55 +00:00
Chandler Carruth	66654b72c9	[SDAG] Remove the reliance on MI's allocation strategy for `MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array. Prior to this change, the memory management was deeply confusing here -- The way the MI was built relied on the `SelectionDAG` allocating memory for these arrays of pointers using the `MachineFunction`'s allocator so that the raw pointer to the array could be blindly copied into an eventual `MachineInstr`. This creates a hard coupling between how `MachineInstr`s allocate their array of `MachineMemOperand` pointers and how the `MachineSDNode` does. This change is motivated in large part by a change I am making to how `MachineFunction` allocates these pointers, but it seems like a layering improvement as well. This would run the risk of increasing allocations overall, but I've implemented an optimization that should avoid that by storing a single `MachineMemOperand` pointer directly instead of allocating anything. This is expected to be a net win because the vast majority of uses of these only need a single pointer. As a side-effect, this makes the API for updating a `MachineSDNode` and a `MachineInstr` reasonably different which seems nice to avoid unexpected coupling of these two layers. We can map between them, but we shouldn't be surprised at where that occurs. =] Differential Revision: https://reviews.llvm.org/D50680 llvm-svn: 339740	2018-08-14 23:30:32 +00:00
Matt Arsenault	13b0db9285	AMDGPU: Check NSZ MI flag when folding omod I'm not sure the exact nsz flag combination that is OK. I think as long as it's on either, this is OK. For now just check it on the omod multiply. llvm-svn: 339513	2018-08-12 08:44:25 +00:00
Matt Arsenault	b5acec1f79	AMDGPU: Use splat vectors for undefs when folding canonicalize If one of the elements is undef, use the canonicalized constant from the other element instead of 0. Splat vectors are more useful for other optimizations, such as matching vector clamps. This was breaking on clamps of half3 from the undef 4th component. llvm-svn: 339512	2018-08-12 08:42:54 +00:00
Matt Arsenault	3ead7d7389	AMDGPU: Fix packing undef parts of build_vector llvm-svn: 339511	2018-08-12 08:42:46 +00:00
Tom Stellard	8adc86a7dc	AMDGPU/GlobalISel: Define instruction mapping for G_INSERT Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49625 llvm-svn: 339491	2018-08-11 00:51:54 +00:00
Matt Arsenault	940e6075e4	AMDGPU: More canonicalized operations llvm-svn: 339464	2018-08-10 19:20:17 +00:00
Matt Arsenault	3dcf4ce435	AMDGPU: Combine and of seto/setuo and fp_class Clear the nan (or non-nan) test bits from the mask. llvm-svn: 339462	2018-08-10 18:58:56 +00:00
Matt Arsenault	8ad00d30fa	AMDGPU: Match isfinite pattern to class instructions llvm-svn: 339460	2018-08-10 18:58:41 +00:00
Matt Arsenault	5bb9d798b4	AMDGPU: Add LLVM_FALLTHROUGH llvm-svn: 339458	2018-08-10 17:57:12 +00:00
Matt Arsenault	935f3b70fe	AMDGPU: Error more gracefully on libcalls I think this is the only situation where the callsite will have a null instruction. llvm-svn: 339271	2018-08-08 16:58:39 +00:00
Matt Arsenault	e719139b10	AMDGPU: Fix shifts for i128 llvm-svn: 339270	2018-08-08 16:58:33 +00:00
Jan Vesely	7b2c98ab59	AMDGPU: Remove broken i16 ternary patterns Fixup test to check for GCN prefix These patterns always zero extend the result even though it might need sign extension. This has been broken since the addition of i16 support. It has popped up in mad_sat(char) test since min(max()) combination is turned into v_med3, resulting in the following (incorrect) sequence: v_mad_i16 v2, v10, v9, v11 v_med3_i32 v2, v2, v8, v7 Fixes mad_sat(char) piglit on VI. Differential Revision: https://reviews.llvm.org/D49836 llvm-svn: 339190	2018-08-07 21:54:37 +00:00
Matt Arsenault	96b678427a	AMDGPU: Add feature vi-insts This is necessary to add a VI specific builtin, __builtin_amdgcn_s_dcache_wb. We already have an overly specific feature for one of these builtins, for s_memrealtime. I'm not sure whether it's better to add more of those, or to get rid of that and merge it with vi-insts. Alternatively, maybe this logically goes with scalar-stores? llvm-svn: 339104	2018-08-07 07:28:46 +00:00
Matt Arsenault	08f3fe4fae	AMDGPU: cvt_pk_rtz_f16 canonicalizes llvm-svn: 339078	2018-08-06 23:01:31 +00:00
Matt Arsenault	e94ee833f9	AMDGPU: Handle some vector operations in isCanonicalized llvm-svn: 339077	2018-08-06 22:45:51 +00:00
Matt Arsenault	a29e76244a	AMDGPU: Push fcanonicalize through partially constant build_vector This usually avoids some re-packing code, and may help find canonical sources. llvm-svn: 339072	2018-08-06 22:30:44 +00:00
Matt Arsenault	f2a167fb1d	AMDGPU: Refactor fcanonicalize combine This will make more complex combines easier. llvm-svn: 339070	2018-08-06 22:10:26 +00:00
Matt Arsenault	d49ab0b214	AMDGPU: Treat more custom operations as canonicalizing Everything should quiet, and I think everything should flush. I assume the min3/med3/max3 follow the same rules as regular min/max for flushing, which should at least be conservatively correct. There are still more operations that need to be handled. llvm-svn: 339065	2018-08-06 21:58:11 +00:00
Matt Arsenault	ce6d61fba8	AMDGPU: Conversions always produce canonical results Not sure why this was checking for denormals for f16. My interpretation of the IEEE standard is conversions should produce a canonical result, and the ISA manual says denormals are created when appropriate. llvm-svn: 339064	2018-08-06 21:51:52 +00:00
Matt Arsenault	f8768bfc84	AMDGPU: Fix implementation of isCanonicalized If denormals are enabled, denormals are canonical. Also fix a few other issues. minnum/maxnum are supposed to canonicalize. Temporarily improve workaround for the instruction behavior change in gfx9. Handle selects and fcopysign. The tests were also largely broken, since they were checking for a flush used on some targets after the store of the result. llvm-svn: 339061	2018-08-06 21:38:27 +00:00
Matt Arsenault	0d1b3934e2	AMDGPU: Fold v_lshl_or_b32 with 0 src0 Appears from expansion of some packed cases. llvm-svn: 339025	2018-08-06 15:40:20 +00:00
David Bolvansky	c0aa4b75a4	Enrich inline messages Summary: This patch improves Inliner to provide causes/reasons for negative inline decisions. 1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message. 2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision. 3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost. 4. Adjusted tests for changed printing. Patch by: yrouban (Yevgeny Rouban) Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00 Reviewed By: tejohnson, xbolva00 Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith Differential Revision: https://reviews.llvm.org/D49412 llvm-svn: 338969	2018-08-05 14:53:08 +00:00
Matt Arsenault	c3dc8e65e2	DAG: Enhance isKnownNeverNaN Add a parameter for testing specifically for sNaNs - at least one instruction pattern on AMDGPU needs to check specifically for this. Also handle more cases, and add a target hook for custom nodes, similar to the hooks for known bits. llvm-svn: 338910	2018-08-03 18:27:52 +00:00
Tim Renouf	366a49d986	[AMDGPU] Minor change to d16 buffer load implementation Summary: By not reconstructing the operand list of the SDNode, this change makes it easier to add the forthcoming new tbuffer and buffer intrinsics. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49995 Change-Id: I0cb79ef0801532645d7dd954a6d7355139db7b38 llvm-svn: 338784	2018-08-02 23:33:01 +00:00
Tim Renouf	abd85fb1f5	[AMDGPU] Reworked SIFixWWMLiveness Summary: I encountered some problems with SIFixWWMLiveness when WWM is in a loop: 1. It sometimes gave invalid MIR where there is some control flow path to the new implicit use of a register on EXIT_WWM that does not pass through any def. 2. There were lots of false positives of registers that needed to have an implicit use added to EXIT_WWM. 3. Adding an implicit use to EXIT_WWM (and adding an implicit def just before the WWM code, which I tried in order to fix (1)) caused lots of the values to be spilled and reloaded unnecessarily. This commit is a rework of SIFixWWMLiveness, with the following changes: 1. Instead of considering any register with a def that can reach the WWM code and a def that can be reached from the WWM code, it now considers three specific cases that need to be handled. 2. A register that needs liveness over WWM to be synthesized now has it done by adding itself as an implicit use to defs other than the dominant one. Also added the following fixmes: FIXME: We should detect whether a register in one of the above categories is already live at the WWM code before deciding to add the implicit uses to synthesize its liveness. FIXME: I believe this whole scheme may be flawed due to the possibility of the register allocator doing live interval splitting. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46756 Change-Id: Ie7fba0ede0378849181df3f1a9a7a39ed1a94a94 llvm-svn: 338783	2018-08-02 23:31:32 +00:00
Tim Renouf	f1c7b92a6a	[AMDGPU] Avoid using divergent value in mubuf addr64 descriptor Summary: This fixes a problem where a load from global+idx generated incorrect code on <=gfx7 when the index is divergent. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47383 Change-Id: Ib4d177d6254b1dd3f8ec0203fdddec94bd8bc5ed llvm-svn: 338779	2018-08-02 22:53:57 +00:00
Matt Arsenault	36cdcfadcf	AMDGPU: Fix scalarizing v4f16 fcanonicalize llvm-svn: 338714	2018-08-02 13:43:42 +00:00
Alexander Ivchenko	49168f6778	[GlobalISel] Rewrite CallLowering::lowerReturn to accept multiple VRegs per Value This is logical continuation of https://reviews.llvm.org/D46018 (r332449) Differential Revision: https://reviews.llvm.org/D49660 llvm-svn: 338685	2018-08-02 08:33:31 +00:00
Matt Arsenault	a7dfd48310	AMDGPU: Use SPseudoInst helper llvm-svn: 338631	2018-08-01 20:49:00 +00:00
Matt Arsenault	709374d186	AMDGPU: Improve hack for packing conversion ops Mutate the node type during selection when it doesn't matter. This avoids an intermediate bitcast node on targets with legal i16/f16. Also fixes missing output modifiers on v_cvt_pkrtz_f32_f16, which I assume are OK. llvm-svn: 338619	2018-08-01 20:13:58 +00:00
Matt Arsenault	55ab9213d3	AMDGPU: Partially fix handling of packed amdgpu_ps arguments Fixes annoying limitations when writing tests. Also remove more leftover code for manually scalarizing arguments and return values. llvm-svn: 338618	2018-08-01 19:57:34 +00:00
Jan Vesely	93b252799b	AMDGPU/R600: Convert kernel param loads to use PARAM_I_ADDRESS Non ext aligned i32 loads are still optimized to use CONSTANT_BUFFER (AS 8) llvm-svn: 338610	2018-08-01 18:36:07 +00:00
Jan Vesely	5ba1b4bdab	AMDGPU: Allow fp32-denormals feature for r600 targets This was accidentally removed in r335942. Differential Revision: https://reviews.llvm.org/D49934 llvm-svn: 338569	2018-08-01 15:04:36 +00:00
Ryan Taylor	894c8fd0e2	[AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero Summary: Add _L to _LZ image intrinsic table mapping to table gen. In ISelLowering check if image intrinsic has lod and if it's equal to zero, if so remove lod and change opcode to equivalent mapped _LZ. Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71 Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D49483 llvm-svn: 338523	2018-08-01 12:12:01 +00:00
David Bolvansky	fbbb83c782	Revert "Enrich inline messages", tests fail llvm-svn: 338496	2018-08-01 08:02:40 +00:00
David Bolvansky	7f36cd9d96	Enrich inline messages Summary: This patch improves Inliner to provide causes/reasons for negative inline decisions. 1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message. 2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision. 3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost. 4. Adjusted tests for changed printing. Patch by: yrouban (Yevgeny Rouban) Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00 Reviewed By: tejohnson, xbolva00 Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith Differential Revision: https://reviews.llvm.org/D49412 llvm-svn: 338494	2018-08-01 07:37:16 +00:00
Konstantin Zhuravlyov	bb30ef7af4	AMDGPU: Add clamp bit to dot intrinsics Differential Revision: https://reviews.llvm.org/D49874 llvm-svn: 338470	2018-08-01 01:31:30 +00:00
Matt Arsenault	feedabfde7	AMDGPU: Break 64-bit arguments into 32-bit pieces llvm-svn: 338421	2018-07-31 19:29:04 +00:00
Matt Arsenault	0395da7842	AMDGPU: Split wide vectors of i16/f16 into 32-bit regs on calls This improves code for the same reasons as scalarizing 32-bit element vectors. llvm-svn: 338418	2018-07-31 19:17:47 +00:00
Matt Arsenault	9ced1e0d80	AMDGPU: Scalarize vector argument types to calls When lowering calling conventions, prefer to decompose vectors into the constitute register types. This avoids artifical constraints to satisfy a wide super-register. This improves code quality because now optimizations don't need to deal with the super-register constraint. For example the immediate folding code doesn't deal with 4 component reg_sequences, so by breaking the register down earlier the existing immediate folding code is able to work. This also avoids the need for the shader input processing code to manually split vector types. llvm-svn: 338416	2018-07-31 19:05:14 +00:00
David Bolvansky	ab79414f7b	Revert Enrich inline messages llvm-svn: 338389	2018-07-31 14:47:22 +00:00
David Bolvansky	b562dbabda	Enrich inline messages Summary: This patch improves Inliner to provide causes/reasons for negative inline decisions. 1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message. 2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision. 3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost. 4. Adjusted tests for changed printing. Patch by: yrouban (Yevgeny Rouban) Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00 Reviewed By: tejohnson, xbolva00 Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith Differential Revision: https://reviews.llvm.org/D49412 llvm-svn: 338387	2018-07-31 14:25:24 +00:00
Matt Arsenault	638a202760	AMDGPU: Don't handle FP16_TO_FP in isCanonicalized This needs more special handling to do correctly. Fixes test in subsequent commit. llvm-svn: 338381	2018-07-31 14:15:16 +00:00
Matt Arsenault	4aec86d37a	AMDGPU: Fold undef fcanonicalize to qNaN We could choose a free 0 for this, but this matches the behavior for fmul undef, 1.0. Also, the NaN use is more useful for folding use operations although if it's not eliminated it is more expensive in terms of code size. llvm-svn: 338376	2018-07-31 13:34:31 +00:00
Matt Arsenault	de496c32a4	AMDGPU: Reduce code size with fcanonicalize (fneg x) When fcanonicalize is lowered to a mul, we can use -1.0 for free and avoid the cost of the bigger encoding for source modifers. llvm-svn: 338244	2018-07-30 12:16:58 +00:00
Matt Arsenault	f3c9a34def	AMDGPU: Make fneg combine handle fcanonicalize llvm-svn: 338243	2018-07-30 12:16:47 +00:00
Nicolai Haehnle	7f0d05d532	AMDGPU: Force skip over s_sendmsg and exp instructions Summary: These instructions interact with hardware blocks outside the shader core, and they can have "scalar" side effects even when EXEC = 0. We don't want these scalar side effects to occur when all lanes want to skip these instructions, so always add the execz skip branch instruction for basic blocks that contain them. Also ensure that we skip scalar stores / atomics, though we don't code-gen those yet. Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48431 Change-Id: Ieaeb58352e2789ffd64745603c14970c60819d44 llvm-svn: 338235	2018-07-30 09:23:59 +00:00
Matt Arsenault	8f9dde94b7	AMDGPU: Stop wasting argument registers with v3i32/v3f32 SelectionDAGBuilder widens v3i32/v3f32 arguments to to v4i32/v4f32 which consume an additional register. In addition to wasting argument space, this produces extra instructions since now it appears the 4th vector component has a meaningful value to most combines. llvm-svn: 338197	2018-07-28 14:11:34 +00:00
Matt Arsenault	81920b0a25	DAG: Add calling convention argument to calling convention funcs This seems like a pretty glaring omission, and AMDGPU wants to treat kernels differently from other calling conventions. llvm-svn: 338194	2018-07-28 13:25:19 +00:00
Matt Arsenault	72b0e38b26	AMDGPU: Stop trying to extend arguments for clover This was trying to replace i8/i16 arguments with i32, which was broken and no longer necessary. llvm-svn: 338193	2018-07-28 12:34:25 +00:00
Jan Vesely	6ff58ed5ca	AMDGPU/R600: Add MOV instructions to BFE patterns R600 can't handle immediates for BFE, these will be eliminated later. Fixes powr/pow regressions n r600 since r334817 Differential Revision: https://reviews.llvm.org/D49641 llvm-svn: 338127	2018-07-27 15:00:13 +00:00
Matt Arsenault	0183c56c11	AMDGPU: Fix code size for return_to_epilog pseudo llvm-svn: 338113	2018-07-27 09:15:03 +00:00
Tom Stellard	e9bdc5f1d8	AMDGPU/GlobalISel: Fix crash in regbankselect on non-power-of-2 types Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D49624 llvm-svn: 338102	2018-07-27 06:04:40 +00:00
Scott Linder	eb1f75d561	[AMDGPU] Fix VGPR spills where offset doesn't fit in 12 bits Scale the offset of VGPR spills by the wave size when it cannot fit in the 12-bit offset immediate field and so is added to the soffset SGPR. This accounts for hardware swizzling of scratch memory. Differential Revision: https://reviews.llvm.org/D49448 llvm-svn: 338060	2018-07-26 19:47:51 +00:00
Stanislav Mekhanoshin	7e7268ac1c	[AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion Differential Revision: https://reviews.llvm.org/D49761 llvm-svn: 337938	2018-07-25 17:02:11 +00:00
Tom Stellard	b7f19e6d1e	AMDGPU/GlobalISel: Legalize G_INSERT Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49601 llvm-svn: 337798	2018-07-24 02:19:20 +00:00
Tom Stellard	2d37929c10	AMDGPU/GlobalISel: Remove unnecessary legality constraint for G_EXTRACT Summary: We were marking G_EXTRACT operations unsupported if the output type was larger than the input type. I don't see how this could ever actually happen, so I dropped the constraint. Doing this makes it possible to reuse the same legality code for G_INSERT. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49600 llvm-svn: 337794	2018-07-24 01:43:49 +00:00
Matt Arsenault	5ae765e68c	AMDGPU: Use existing function to check for VGPRs llvm-svn: 337621	2018-07-20 21:20:36 +00:00
Matt Arsenault	4bec7d4261	Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering" Reverts r337079 with fix for msan error. llvm-svn: 337535	2018-07-20 09:05:08 +00:00
Farhana Aleen	c370d7b33d	[AMDGPU] [AMDGPU] Support a fdot2 pattern. Summary: Optimize fma((float)S0.x, (float)S1.x fma((float)S0.y, (float)S1.y, z)) -> fdot2((v2f16)S0, (v2f16)S1, (float)z) Author: FarhanaAleen Reviewed By: rampitec, b-sumner Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D49146 llvm-svn: 337198	2018-07-16 18:19:59 +00:00
Mark Searles	f4e7025299	[AMDGPU][Waitcnt] Re-apply fix "comparison of integers of different signs" build error" Re-apply "[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error"" ( fe0a456510131f268e388c4a18a92f575c0db183 ), which was inadvertantly reverted via 2b2ee080f0164485562593b1b87291a48cea4a9a . llvm-svn: 337156	2018-07-16 10:21:36 +00:00
Mark Searles	72da47df25	run post-RA hazard recognizer pass late Memory legalizer, waitcnt, and shrink passes can perturb the instructions, which means that the post-RA hazard recognizer pass should run after them. Otherwise, one of those passes may invalidate the work done by the hazard recognizer. Note that this has adverse side-effect that any consecutive S_NOP 0's, emitted by the hazard recognizer, will not be shrunk into a single S_NOP <N>. This should be addressed in a follow-on patch. Differential Revision: https://reviews.llvm.org/D49288 llvm-svn: 337154	2018-07-16 10:02:41 +00:00
Mark Searles	c2d5d9adb5	Revert "[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error" This reverts commit fe0a456510131f268e388c4a18a92f575c0db183. llvm-svn: 337153	2018-07-16 10:02:40 +00:00
Evgeniy Stepanov	1971ba097d	Revert "AMDGPU: Fix handling of alignment padding in DAG argument lowering" This reverts commit r337021. WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7 #1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3 #2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3 #3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18 #4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23 #5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5 #6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const, llvm::raw_ostream&, char const) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5 #7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3 #8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26 [...] Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv' #0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192 llvm-svn: 337079	2018-07-14 01:20:53 +00:00
Tom Stellard	ac68471326	AMDGPU/GlobalISel: Implement select() for 32-bit @llvm.minnun and @llvm.maxnum Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D46172 llvm-svn: 337056	2018-07-13 22:16:03 +00:00
Tom Stellard	390a5f4774	AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.exp Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45882 llvm-svn: 337046	2018-07-13 21:05:14 +00:00
Matt Arsenault	d362b6ad29	AMDGPU: Properly handle shader inputs with split arguments This needs to refer to arguments by their original argument index, not the argument split index which depends on what the type splitting decides to do. Also avoid increment PSInputNum for each split piece. llvm-svn: 337022	2018-07-13 16:40:37 +00:00
Matt Arsenault	de95077780	AMDGPU: Fix handling of alignment padding in DAG argument lowering This was completely broken if there was ever a struct argument, as this information is thrown away during the argument analysis. The offsets as passed in to LowerFormalArguments are not useful, as they partially depend on the legalized result register type, and they don't consider the alignment in the first place. Ignore the Ins array, and instead figure out from the raw IR type what we need to do. This seems to fix the padding computation if the DAG lowering is forced (and stops breaking arguments following padded arguments if the arguments were only partially lowered in the IR) llvm-svn: 337021	2018-07-13 16:40:25 +00:00
Matt Arsenault	4dca0a9904	AMDGPU: Fix assert in truncate combine with vectors The piece above probably has the same problem, but I need to try to come up with a test for it. llvm-svn: 336935	2018-07-12 19:40:16 +00:00
Tom Stellard	752ddbd068	AMDGPU/SI: Initialize InstrInfo before TargetLoweringInfo in GCNSubtarget SITargetLowering queries SIInstrInfo in its constructor, so SIInstrInfo must be initialized first. This fixes msan buildbot failures and was introduced by r336851. llvm-svn: 336861	2018-07-11 22:15:15 +00:00
Tom Stellard	1a8adce24f	AMDGPU: Remove duplicate call to initializeSubtargetDependencies() This was added in r336851. llvm-svn: 336853	2018-07-11 21:12:03 +00:00
Tom Stellard	5bfbae5cb1	AMDGPU: Refactor Subtarget classes Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation. Reviewers: arsenm, jvesely Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D49037 llvm-svn: 336851	2018-07-11 20:59:01 +00:00
Konstantin Zhuravlyov	bde59e9989	AMDGPU/NFC: Use already available explicit kernarg size instead of calculating it again when filling out the metadata. llvm-svn: 336825	2018-07-11 17:27:17 +00:00
Richard Trieu	d5e57ed9c2	Fix -Wmismatched-tags warning class -> struct in forward declaration. llvm-svn: 336733	2018-07-10 22:09:33 +00:00
Scott Linder	01ce144ddf	[AMDGPU] Fix layering issue with AMDGPUHSAMetadataStreamer (NFC) llvm-svn: 336722	2018-07-10 20:07:22 +00:00
Scott Linder	2ad2c18b82	[AMDGPU] Refactor HSAMetadataStream::emitKernel (NFC) Move all metadata construction into AMDGPUHSAMetadataStreamer. Differential Revision: https://reviews.llvm.org/D48176 llvm-svn: 336707	2018-07-10 17:31:32 +00:00
Konstantin Zhuravlyov	f0badd5ac1	AMDGPU: Make hidden argument metadata consistent with amdgpu-implicitarg-num-bytes attribute Differential Revision: https://reviews.llvm.org/D49096 llvm-svn: 336697	2018-07-10 16:12:51 +00:00
Matt Arsenault	a680199a96	Reapply "AMDGPU: Force inlining if LDS global address is used" This reverts commit r336623 llvm-svn: 336675	2018-07-10 14:03:41 +00:00
Vlad Tsyrklevich	688e752207	Revert "AMDGPU: Force inlining if LDS global address is used" This reverts commit r336587, it was causing test failures on the sanitizer bots. llvm-svn: 336623	2018-07-10 00:46:07 +00:00
Mark Searles	5bfd8d8991	[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error Build error on Android; reported by and fix provided by (thanks) by Mauro Rossi <issor.oruam@gmail.com> Fixes the following building error: external/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1903:61: error: comparison of integers of different signs: 'typename iterator_traits<__wrap_iter<MachineBasicBlock **> >::difference_type' (aka 'int') and 'unsigned int' [-Werror,-Wsign-compare] BlockWaitcntProcessedSet.end(), &MBB) < Count)) { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~ 1 error generated. Differential Revision: https://reviews.llvm.org/D49089 llvm-svn: 336588	2018-07-09 19:28:14 +00:00
Matt Arsenault	40cb6cab56	AMDGPU: Force inlining if LDS global address is used These won't work for the forseeable future. These aren't allowed from OpenCL, but IPO optimizations can make them appear. Also directly set the attributes on functions, regardless of the linkage rather than cloning functions like before. llvm-svn: 336587	2018-07-09 19:22:22 +00:00
Tom Stellard	ec4feae1b6	AMDGPU: Fix UBSan error caused by r335942 Summary: Fixes PR38071. Reviewers: arsenm, dstenb Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48979 llvm-svn: 336448	2018-07-06 17:16:17 +00:00
Matt Arsenault	29f303799b	AMDGPU/GlobalISel: Implement custom kernel arg lowering Avoid using allocateKernArg / AssignFn. We do not want any of the type splitting properties of normal calling convention lowering. For now at least this exists alongside the IR argument lowering pass. This is necessary to handle struct padding correctly while some arguments are still skipped by the IR argument lowering pass. llvm-svn: 336373	2018-07-05 17:01:20 +00:00
Ryan Taylor	5f04458a61	[AMDGPU] Add VALU to V_INTERP Instructions Wait states are not properly being inserted after buffer_store for v_interp instructions. Add VALU to V_INTERP instructions so that the GCNHazardRecognizer can check and insert the appropriate wait states when needed. Differential Revision: https://reviews.llvm.org/D48772 Change-Id: Id540c9b074fc69b5c1de6b182276aa089c74aa64 llvm-svn: 336339	2018-07-05 12:02:07 +00:00
Piotr Padlewski	5b3db45e8f	Implement strip.invariant.group Summary: This patch introduce new intrinsic - strip.invariant.group that was described in the RFC: Devirtualization v2 Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits Differential Revision: https://reviews.llvm.org/D47103 Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com> llvm-svn: 336073	2018-07-02 04:49:30 +00:00
Tom Stellard	eebbfc2809	AMDGPU/GlobalISel: Make IMPLICIT_DEF of all sizes < 512 legal. Summary: We could split sizes that are not power of two into smaller sized G_IMPLICIT_DEF instructions, but this ends up generating G_MERGE_VALUES instructions which we then have to handle in the instruction selector. Since G_IMPLICIT_DEF is really a no-op it's easier just to keep everything that can fit into a register legal. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48777 llvm-svn: 336041	2018-06-30 04:09:44 +00:00
Matt Arsenault	f5be3ad7f8	AMDGPU: Don't use struct type for argument layout This was introducing unnecessary padding after the explicit arguments, depending on the alignment of the total struct type. Also has the side effect of avoiding creating an extra GEP for the offset from the base kernel argument to the explicit kernel argument offset. llvm-svn: 335999	2018-06-29 17:31:42 +00:00
Stanislav Mekhanoshin	20d4795d93	[AMDGPU] Enable LICM in the BE pipeline This allows to hoist code portion to compute reciprocal of loop invariant denominator in integer division after codegen prepare expansion. Differential Revision: https://reviews.llvm.org/D48604 llvm-svn: 335988	2018-06-29 16:26:53 +00:00
Tom Stellard	c5a154db48	AMDGPU: Separate R600 and GCN TableGen files Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc. Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46365 llvm-svn: 335942	2018-06-28 23:47:12 +00:00
Stanislav Mekhanoshin	67aa18f165	[AMDGPU] Early expansion of 32 bit udiv/urem This allows hoisting of a common code, for instance if denominator is loop invariant. Current change is expansion only, adding licm to the target pass list going to be a separate patch. Given this patch changes to codegen are minor as the expansion is similar to that on DAG. DAG expansion still must remain for R600. Differential Revision: https://reviews.llvm.org/D48586 llvm-svn: 335868	2018-06-28 15:59:18 +00:00
Stanislav Mekhanoshin	298a61590a	[AMDGPU] Overload llvm.amdgcn.fmad.ftz to support f16 Differential Revision: https://reviews.llvm.org/D48677 llvm-svn: 335866	2018-06-28 15:24:46 +00:00
Matt Arsenault	75e7192ba3	AMDGPU: Remove MFI::ABIArgOffset We have too many mechanisms for tracking the various offsets used for kernel arguments, so remove one. There's still a lot of confusion with these because there are two different "implicit" argument areas located at the beginning and end of the kernarg segment. Additionally, the offset was determined based on the memory size of the split element types. This would break in a future commit where v3i32 is decomposed into separate i32 pieces. llvm-svn: 335830	2018-06-28 10:18:55 +00:00
Matt Arsenault	1fb9013368	AMDGPU: Error on calls from graphics shaders In principle nothing should stop these from working, but work is necessary to create an ABI for dealing with the stack related registers. llvm-svn: 335829	2018-06-28 10:18:36 +00:00
Matt Arsenault	12269dda5c	AMDGPU: Fix AMDGPUCodeGenPrepare using uninitialized AMDGPUAS struct Not sure how this wasn't noticed before. llvm-svn: 335828	2018-06-28 10:18:23 +00:00
Matt Arsenault	513e0c0ea4	AMDGPU: Fix assert on aggregate type kernel arguments Just fix the crash for now by not doing the optimization since figuring out how to properly convert the bits for an arbitrary struct is a pain. Also fix a crash when there is only an empty struct argument. llvm-svn: 335827	2018-06-28 10:18:11 +00:00
Stanislav Mekhanoshin	1a1687f1bb	[AMDGPU] Convert rcp to rcp_iflag If a source of rcp instruction is a result of any conversion from an integer convert it into rcp_iflag instruction. No FP exception can ever happen except division by zero if a single precision rcp argument is a representation of an integral number. Differential Revision: https://reviews.llvm.org/D48569 llvm-svn: 335742	2018-06-27 15:33:33 +00:00
Konstantin Zhuravlyov	30f03b3bc0	AMDGPU/NFC: Fix typo in comment llvm-svn: 335707	2018-06-27 05:36:03 +00:00
Konstantin Zhuravlyov	777477705a	AMDGPU: Silence unused warnings in waitcnt insertion pass in release build Differential Revision: https://reviews.llvm.org/D48607 llvm-svn: 335669	2018-06-26 21:33:38 +00:00
Stanislav Mekhanoshin	dacda79ee6	[AMDGPU] Add llvm.amdgcn.fmad.ftz intrinsic This intrinsic selects v_mad_f32 regardless of fp32 denorm support. Differential Revision: https://reviews.llvm.org/D48573 llvm-svn: 335654	2018-06-26 20:04:19 +00:00
Matt Arsenault	8c4a35237a	AMDGPU: Add pass to lower kernel arguments to loads This replaces most argument uses with loads, but for now not all. The code in SelectionDAG for calling convention lowering is actively harmful for amdgpu_kernel. It attempts to split the argument types into register legal types, which results in low quality code for arbitary types. Since all kernel arguments are passed in memory, we just want the raw types. I've tried a couple of methods of mitigating this in SelectionDAG, but it's easier to just bypass this problem alltogether. It's possible to hack around the problem in the initial lowering, but the real problem is the DAG then expects to be able to use CopyToReg/CopyFromReg for uses of the arguments outside the block. Exposing the argument loads in the IR also has the advantage that the LoadStoreVectorizer can merge them. I'm not sure the best approach to dealing with the IR argument list is. The patch as-is just leaves the IR arguments in place, so all the existing code will still compute the same kernarg size and pointlessly lowers the arguments. Arguably the frontend should emit kernels with an empty argument list in the first place. Alternatively a dummy array could be inserted as a single argument just to reserve space. This does have some disadvantages. Local pointer kernel arguments can no longer have AssertZext placed on them as the equivalent !range metadata is not valid on pointer typed loads. This is mostly bad for SI which needs to know about the known bits in order to use the DS instruction offset, so in this case this is not done. More importantly, this skips noalias arguments since this pass does not yet convert this to the equivalent !alias.scope and !noalias metadata. Producing this metadata correctly seems to be tricky, although this logically is the same as inlining into a function which doesn't exist. Additionally, exposing these loads to the vectorizer may result in degraded aliasing information if a pointer load is merged with another argument load. I'm also not entirely sure this is preserving the current clover ABI, although I would greatly prefer if it would stop widening arguments and match the HSA ABI. As-is I think it is extending < 4-byte arguments to 4-bytes but doesn't align them to 4-bytes. llvm-svn: 335650	2018-06-26 19:10:00 +00:00
Matt Arsenault	b1cc4f52ff	AMDGPU/GlobalISel: Add support for llvm.amdgcn.kernarg.segment.ptr Note a normal select test is not currently possible because this relies on input registers tracked in SIMachineFunctionInfo which are not currently serializable in MIR, but this does work end-to-end from the IR. llvm-svn: 335490	2018-06-25 16:17:48 +00:00
Matt Arsenault	2811a20f77	AMDGPU: Remove commented out code llvm-svn: 335486	2018-06-25 15:42:20 +00:00
Matt Arsenault	b3feccd7fa	AMDGPU/GlobalISel: Fix G_IMPLICIT_DEF for pointers llvm-svn: 335485	2018-06-25 15:42:12 +00:00
Matt Arsenault	73eeb42e50	AMDGPU: Respect align argument parameter This should avoid relying on the pointee type to get the alignment, particularly since pointee types are supposed to be removed at some point. Also fixes not getting the alignment for unsized types. llvm-svn: 335478	2018-06-25 14:29:04 +00:00
Reid Kleckner	fd7c9ab971	[AMDGPU] Update includes for intrinsic changes :( llvm-svn: 335409	2018-06-23 03:05:39 +00:00
Reid Kleckner	f5890e4e43	[IR] Split Intrinsics.inc into enums and implementations Implements PR34259 Intrinsics.h is a very popular header. Most LLVM TUs care about things like dbg_value, but they don't care how they are implemented. After I split these out, IntrinsicImpl.inc is 1.7 MB, so this saves each LLVM TU from scanning 1.7 MB of source that gets pre-processed away. It also means we can modify intrinsic properties without triggering a full rebuild, but that's probably less of a win. I think the next best thing to do would be to split out the target intrinsics into their own header. Very, very few TUs care about target-specific intrinsics. It's very hard to split up the target independent intrinsics like llvm.expect, assume, and dbg.value, though. llvm-svn: 335407	2018-06-23 02:02:38 +00:00
Matt Arsenault	3f8e7a3dbc	AMDGPU: Add patterns for i32/i64 local atomic load/store Not sure why the 32/64 split is needed in the atomic_load store hierarchies. The regular PatFrags do this, but we don't do it for the existing handling for global. llvm-svn: 335325	2018-06-22 08:39:52 +00:00
Tom Stellard	6af7307650	AMDGPU/GlobalISel: Default to using TableGen'd instruction selector Summary: We can select all instructions that are marked as legal in a full piglit run, so now is a good time to make the TableGen'd instruction selector default for all opcodes. This is NFC for a full piglit run, which is why there are no tests. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48198 llvm-svn: 335319	2018-06-22 03:04:35 +00:00
Tom Stellard	26fac0f8e1	AMDGPU/GlobalISel: legalize and select 32-bit G_ASHR Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D48196 llvm-svn: 335318	2018-06-22 02:54:57 +00:00
Tom Stellard	9a6535718e	AMDGPU/GlobalISel: legalize and select 32-bit G_SITOFP Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48195 llvm-svn: 335316	2018-06-22 02:34:29 +00:00
Tom Stellard	7712ee8891	AMDGPU/GlobalISel: Implement select() for COPY Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46151 llvm-svn: 335315	2018-06-22 00:44:29 +00:00
Tom Stellard	3f1c6fe156	AMDGPU/GlobalISel: Implement select() for G_IMPLICIT_DEF Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46150 llvm-svn: 335307	2018-06-21 23:38:20 +00:00
Konstantin Zhuravlyov	e004b3d97b	AMDGPU: Remove ability to reserve VGPRs for debugger Differential Revision: https://reviews.llvm.org/D48234 llvm-svn: 335288	2018-06-21 20:28:19 +00:00
Scott Linder	1e8c2c705d	[AMDGPU] Update assembler for HSA Code Object v3 Update AMDGPU assembler syntax behind the code-object-v3 feature: * Replace/rename most AMDGPU assembler directives/symbols and document them. * Provide more diagnostics (e.g. values out of range, missing values, repeated values). * Provide path for backwards compatibility, even with underlying descriptor changes. Differential Revision: https://reviews.llvm.org/D47736 llvm-svn: 335281	2018-06-21 19:38:56 +00:00
Scott Linder	5792dd0f39	[AMDGPU] Fix bug with tracking processed blocks in SIInsertWaitcnts BlockWaitcntProcessedSet was not being cleared between calls, so it was producing incorrect counts in cases where MBB addresses happened to coincide across multiple calls. Differential Revision: https://reviews.llvm.org/D48391 llvm-svn: 335268	2018-06-21 18:48:48 +00:00
Konstantin Zhuravlyov	766c77efd7	AMDGPU/AMDHSA: Remove GridWorkGroupCountX/Y/Z and everything that comes with it from implementation and v3 header files. Leave definition in v2 header files for backwards compatibility. Differential Revision: https://reviews.llvm.org/D48191 llvm-svn: 335267	2018-06-21 18:36:04 +00:00
Nicolai Haehnle	15745ba5c1	AMDGPU: Remove redundant MIMG instruction variants Summary: For sample and gather ops, we can accurately determine the set of vaddr-size instruction variants that are required. This reduces the size of instruction tables by ~5%. The number of machine instruction opcodes is reduced from 10002 to 9476. Change-Id: Ie7fc65d3657b762c7816017fe70b2e9bec644a8a Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D48168 llvm-svn: 335232	2018-06-21 13:37:55 +00:00
Nicolai Haehnle	db6911a6f9	AMDGPU: Remove old-style image intrinsics Summary: This also removes the need for atomic pseudo instructions, since we select the correct encoding directly in SITargetLowering::lowerImage for dimension-aware image intrinsics. Mesa uses dimension-aware image intrinsics since commit a9a7993441. Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a Reviewers: arsenm, rampitec, mareko, tpr, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48167 llvm-svn: 335231	2018-06-21 13:37:45 +00:00
Nicolai Haehnle	7a9c03f484	AMDGPU: Select MIMG instructions manually in SITargetLowering Summary: Having TableGen patterns for image intrinsics is hitting limitations: for D16 we already have to manually pre-lower the packing of data values, and we will have to do the same for A16 eventually. Since there is already some custom C++ code anyway, it is arguably easier to just do everything in C++, now that we can use the beefed-up generic tables backend of TableGen to provide all the required metadata and map intrinsics to corresponding opcodes. With this approach, all image intrinsic lowering happens in SITargetLowering::lowerImage. That code is dense due to all the cases that it handles, but it should still be easier to follow than what we had before, by virtue of it all being done in a single location, and by virtue of not relying on the TableGen pattern magic that very few people really understand. This means that we will have MachineSDNodes with MIMG instructions during DAG combining, but that seems alright: previously we had intrinsic nodes instead, but those are similarly opaque to the generic CodeGen infrastructure, and the final pattern matching just did a 1:1 translation to machine instructions anyway. If anything, the fact that we now merge the address words into a vector before DAG combine should be an advantage. Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6 Reviewers: arsenm, rampitec, rtaylor, tstellar Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48017 llvm-svn: 335228	2018-06-21 13:36:57 +00:00
Nicolai Haehnle	0ab200b6c9	AMDGPU: Refactor MIMG instruction TableGen using generic tables Summary: This allows us to access rich information about MIMG opcodes from C++ code. Simplifying the mapping between equivalent opcodes of different data size becomes quite natural. This also flattens the MIMG-related class and multiclass hierarchy a little, and collapses together some of the scaffolding for sample and gather4 opcodes. Change-Id: I1a2549fdc1e881ff100e5393d2d87e73729a0ccd Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48016 llvm-svn: 335227	2018-06-21 13:36:44 +00:00
Nicolai Haehnle	e741d7e0fd	AMDGPU: Use generic tables instead of SearchableTable Summary: Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48014 Change-Id: Ibb43f90d955275571aff17d0c3ecfb5e5b299641 llvm-svn: 335226	2018-06-21 13:36:33 +00:00
Nicolai Haehnle	2367f03565	AMDGPU: Pass AMDGPUSampleVariant to MIMG_{Sampler,Gather}(_WQM) Summary: This will allows us to provide rich metadata about the instructions in tables that are accessible by custom C++ code. Change-Id: Id9305a26304ab6a6cceb6c65c8cd49141cc0101d Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48011 llvm-svn: 335224	2018-06-21 13:36:13 +00:00
Nicolai Haehnle	b3a9b68513	AMDGPU: Add implicit def of SCC to kill and indirect pseudos Summary: Kill instructions sometimes do use SCC in unusual circumstances, when v_cmpx cannot be used due to the operands that are involved. Additionally, even if SCC was never defined by the expansion, kill pseudos could previously occur between an s_cmp and an s_cbranch_scc, which breaks the SCC liveness tracking when the pseudo is expanded to split the basic block. While it would be possible to explicitly mark the SCC as live-in for the successor basic block, it's simpler to just mark the pseudo as using SCC, so that such a sequence is never emitted by instruction selection in the first place. A similar issue affects indirect source/dest pseudos in principle, although I haven't been able to come up with a test case where it actually matters (this affects instruction selection, so a MIR test can't be used). Fixes: dEQP-GLES3.functional.shaders.discard.dynamic_loop_always Change-Id: Ica8d82ecff1a763b892a1112cf1b06c948863a4f Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47761 llvm-svn: 335223	2018-06-21 13:36:08 +00:00
Nicolai Haehnle	f267431901	AMDGPU: Turn D16 for MIMG instructions into a regular operand Summary: This allows us to reduce the number of different machine instruction opcodes, which reduces the table sizes and helps flatten the TableGen multiclass hierarchies. We can do this because for each hardware MIMG opcode, we have a full set of IMAGE_xxx_Vn_Vm machine instructions for all required sizes of vdata and vaddr registers. Instead of having separate D16 machine instructions, a packed D16 instructions loading e.g. 4 components can simply use the same V2 opcode variant that non-D16 instructions use. We still require a TSFlag for D16 buffer instructions, because the D16-ness of buffer instructions is part of the opcode. Renaming the flag should help avoid future confusion. The one non-obvious code change is that for gather4 instructions, the disassembler can no longer automatically decide whether to use a V2 or a V4 variant. The existing logic which choose the correct variant for other MIMG instruction is extended to cover gather4 as well. As a bonus, some of the assembler error messages are now more helpful (e.g., complaining about a wrong data size instead of a non-existing instruction). While we're at it, delete a whole bunch of dead legacy TableGen code. Change-Id: I89b02c2841c06f95e662541433e597f5d4553978 Reviewers: arsenm, rampitec, kzhuravl, artem.tamazov, dp, rtaylor Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47434 llvm-svn: 335222	2018-06-21 13:36:01 +00:00
Matt Arsenault	5a4ec8127f	AMDGPU: Fix scalar_to_vector for v4i16/v4f16 llvm-svn: 335161	2018-06-20 19:45:48 +00:00
Matt Arsenault	3d06668ad4	AMDGPU: Fix missing C++ mode comment llvm-svn: 335160	2018-06-20 19:45:40 +00:00
Stanislav Mekhanoshin	3b11794dbf	[AMDGPU] setcc (select cc, CT, CF), CF, eq \| ne -> xor cc, -1 \| cc This is the common case in the BE when we serialize condition and then rematerialize it. Use either original or inverted condition. Differential Revision: https://reviews.llvm.org/D48246 llvm-svn: 334882	2018-06-16 03:46:59 +00:00
Matt Arsenault	63bc0e3cb9	AMDGPU: Add combine for short vector extract_vector_elts Try to access pieces 4 bytes at a time. This helps various hasOneUse extract_vector_elt combines, such as load width reductions. Avoids test regressions in a future commit. llvm-svn: 334836	2018-06-15 15:31:36 +00:00
Matt Arsenault	02dc7e19e2	AMDGPU: Make v4i16/v4f16 legal Some image loads return these, and it's awkward working around them not being legal. llvm-svn: 334835	2018-06-15 15:15:46 +00:00
Roman Lebedev	dec562c849	[AMDGPU] Recognize x & ~(-1 << y) pattern. Summary: The same pattern as D48010, but this one is IR-canonical as of D47428. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48012 llvm-svn: 334817	2018-06-15 09:56:45 +00:00
Roman Lebedev	9c17dad8f2	[AMDGPU] Recognize x & ((1 << y) - 1) pattern. Summary: As a followup for D48007. Since we already handle `x << (bitwidth - y) >> (bitwidth - y)` pattern, which does not have ub for both the edge cases (`y == 0`, `y == bitwidth`), i think also handling a pattern that is ub for `y == bitwidth` should be fine. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48010 llvm-svn: 334816	2018-06-15 09:56:39 +00:00
Roman Lebedev	aa8587d1fc	[AMDGPU] Recognize x & (-1 >> (32 - y)) pattern. Summary: D47980 will canonicalize the `x << (32 - y) >> (32 - y)`, which is the pattern the AMDGPU expects to `x & (-1 >> (32 - y))`, which is not recognized by AMDGPU. Thus, it needs to be recognized, too. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48007 llvm-svn: 334815	2018-06-15 09:56:31 +00:00
Tom Stellard	a92847359a	AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.cvt.pkrtz Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45907 llvm-svn: 334757	2018-06-14 19:26:37 +00:00
Tom Stellard	46bbbc33c0	AMDGPU/GlobalISel: Implement select() for 32-bit G_FADD and G_FMUL Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46171 llvm-svn: 334665	2018-06-13 22:30:47 +00:00
Stanislav Mekhanoshin	7bec57300c	[AMDGPU] Corrected computeKnownBits for V_PERM_B32 Differential Revision: https://reviews.llvm.org/D48133 llvm-svn: 334640	2018-06-13 18:52:54 +00:00
Yaxun Liu	fb17bf60dd	[AMDGPU] Change enqueue kernel handle type Currently the handle type is a global pointer which holds 8 bytes. We need a larger type which hold 16 bytes, therefore change it to [i64 x 2]. Differential Revision: https://reviews.llvm.org/D48094 llvm-svn: 334625	2018-06-13 17:31:51 +00:00
Dmitry Preobrazhensky	32c6b5cb70	[AMDGPU][MC] Enabled parsing of relocations on VALU instructions See bug 37566: https://bugs.llvm.org/show_bug.cgi?id=37566 Reviewers: artem.tamazov, arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D47884 llvm-svn: 334622	2018-06-13 17:02:03 +00:00
Dmitry Preobrazhensky	ffbee7acdc	[AMDGPU][MC][GFX8][GFX9] Allow LDS direct reads for BUFFER_LOAD_DWORDX2/X3/X4 See bug 37653: https://bugs.llvm.org/show_bug.cgi?id=37653 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D47885 llvm-svn: 334609	2018-06-13 15:32:46 +00:00
Tom Stellard	264c171f36	AMDGPU: Move isSDNodeSourceOfDivergence() implementation to SITargetLowering Summary: The code that handles ISD:Register and ISD::CopyFromReg assumes the target is amdgcn, so this is broken on r600. We don't need this analysis on r600 anyway so we can safely move it to SITargetLowering. Reviewers: alex-t, arsenm, nhaehnle Reviewed By: arsenm Subscribers: msearles, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46298 llvm-svn: 334607	2018-06-13 15:06:37 +00:00
Stanislav Mekhanoshin	8fd3c4e431	[AMDGPU] DAG combine to produce V_PERM_B32 Differential Revision: https://reviews.llvm.org/D48099 llvm-svn: 334559	2018-06-12 23:50:37 +00:00
Konstantin Zhuravlyov	ce25bc3e82	AMDHSA/NFC: Code object v3 updates (additional): - Move section selection and alignment to AMDGPUAsmPrinter llvm-svn: 334521	2018-06-12 18:33:51 +00:00
Konstantin Zhuravlyov	00f2cb1116	AMDHSA: Code object v3 updates - Do not emit following assembler directives: - .hsa_code_object_version - .hsa_code_object_isa - .amd_amdgpu_isa - .amd_amdgpu_hsa_metadata - .amd_amdgpu_pal_metadata - Do not emit .note entries - Cleanup and bring in sync kernel descriptor header file - Emit kernel descriptor into .rodata with appropriate relocations and alignments llvm-svn: 334519	2018-06-12 18:02:46 +00:00
Mark Searles	987f292c56	[AMDGPU] prevent hitting Assertion `isReg() && "Wrong MachineOperand accessor"' The use iterator, used within findMaskOperands(), can return anything which is not a def. isUse() requires a register, so check isReg() before calling isUse(). Differential Revision: https://reviews.llvm.org/D48047 llvm-svn: 334459	2018-06-12 00:41:26 +00:00
George Burgess IV	c72204d5b5	Simplify; NFC Not shown in the diff: AQ is a `vector<SUnit >`, and SU is a `SUnit ` llvm-svn: 334451	2018-06-11 22:58:32 +00:00
Konstantin Zhuravlyov	3e5d66ac66	AMDGPU: Add 64-bit relative variant kind Differential Revision: https://reviews.llvm.org/D47601 llvm-svn: 334443	2018-06-11 21:37:57 +00:00
Stanislav Mekhanoshin	7ba3fc730c	[AMDGPU] Do not consider indirect acces through phi for wave limiter Rational: if there is indirect access that is usually an issue because load is not ready by the use. However, if use is inside a loop and load is outside that is potentially an issue for a first iteration only. Differential Revision: https://reviews.llvm.org/D47740 llvm-svn: 334420	2018-06-11 16:50:49 +00:00
Daniil Fukalov	c9a098b314	[AMDGPU] Inline asm - added i16, half and i128 types support AMDGPU inline assembler support i16, half and i128 typed variables in constraints, but they were reported as error. Needed to fix https://github.com/RadeonOpenCompute/ROCm/issues/341, e.g. to be able to load with global_load_dwordx4 to a 128bit integer variable Differential Revision: https://reviews.llvm.org/D44920 llvm-svn: 334301	2018-06-08 16:29:04 +00:00
Matt Arsenault	6fc3759811	AMDGPU: Error on LDS global address in functions These won't work as expected now, so error on them to avoid wasting time debugging this in the future. llvm-svn: 334269	2018-06-08 08:05:54 +00:00
Tony Tye	6db1f5da4f	[AMDGPU] Simplify memory legalizer (add missing virtual descructor) Differential Revision: https://reviews.llvm.org/D47504 llvm-svn: 334257	2018-06-08 01:00:11 +00:00
Tony Tye	a5a7c331e7	[AMDGPU] Simplify memory legalizer - Make code easier to maintain. - Avoid generating waitcnts for VMEM if the address sppace does not involve VMEM. - Add support to generate waitcnts for LDS and GDS memory. Differential Revision: https://reviews.llvm.org/D47504 llvm-svn: 334241	2018-06-07 22:28:32 +00:00
Matt Arsenault	f1c868ef08	AMDGPU: Fix not including v2f64 in SReg_128 Fixes assertion with calls returning v2f64. llvm-svn: 334189	2018-06-07 12:16:31 +00:00
Matt Arsenault	697300bd4f	AMDGPU: Use scalar operations for f16 fabs/fneg patterns Fixes unnecessary differences between subtargets. llvm-svn: 334184	2018-06-07 10:15:20 +00:00
Matt Arsenault	90083d3088	AMDGPU: Try a lot harder to emit scalar loads This has two main components. First, widen widen short constant loads in DAG when they have the correct alignment. This is already done a bit in AMDGPUCodeGenPrepare, since that has access to DivergenceAnalysis. This can't help kernarg loads created in the DAG. Start to use DAG divergence analysis to help this case. The second part is to avoid kernel argument lowering breaking the alignment of short vector elements because calling convention lowering wants to split everything into legal register types. When loading a split type, load the nearest 4-byte aligned segment and shift to get the desired bits. This extra load of the earlier argument piece ends up merging, and the bit extract hopefully folds out. There are a number of improvements and regressions with this, but I think as-is this is a better compromise between several of the worst parts of SelectionDAG. Particularly when i16 is legal, this produces worse code for i8 and i16 element vector kernel arguments. This is partially due to the very weak load merging the DAG does. It only looks for fairly specific combines between pairs of loads which no longer appear. In particular this causes v4i16 loads to be split into 2 components when previously the two halves were merged. Worse, because of the newly introduced shifts, there is a lot more unnecessary vector packing and unpacking code emitted. At least some of this is due to reporting false for isTypeDesirableForOp for i16 as a workaround for the lack of divergence information in the DAG. The cases where this happens it doesn't actually matter, but the relevant code in SimplifyDemandedBits doens't have the context to know to ignore this. The use of the scalar cache is probably more important than the mess of mostly scalar instructions doing this packing and unpacking. Future work can fix this, possibly by making better use of the new DAG divergence information for controlling promotion decisions, or adding another version of shift + trunc + shift combines that doesn't only know about the used types. llvm-svn: 334180	2018-06-07 09:54:49 +00:00
Stanislav Mekhanoshin	df61be70b2	[AMDGPU] Improve reciprocal handling When denormals are supported we are producing a full division for 1.0f / x. That still can be replaced by the faster version: bool c = fabs(x) > 0x1.0p+96f; float s = c ? 0x1.0p-32f : 1.0f; x = s; return s v_rcp_f32(x) in case if requested accuracy is 2.5ulp or less. The same version is used if denormals are not supported for non 1.0 numerators, where just v_rcp_f32 is then used for 1.0 numerator. The optimization of 1/x is extended to the case -1/x, which is the same except for the resulting sign bit. OpenCL conformance passed with both enabled and disabled denorms. Differential Revision: https://reviews.llvm.org/D47805 llvm-svn: 334142	2018-06-06 22:22:32 +00:00
Matt Arsenault	e9524f1fb3	AMDGPU: Custom lower v2f16 fneg/fabs with illegal f16 Fixes terrible code on targets without f16 support. The legalization creates a mess that is difficult to recover from. Also should avoid randomly breaking these tests multiple times in sequence in future commits. Some regressions in cases where it happens to be better to pull the source modifier after the conversion. llvm-svn: 334132	2018-06-06 21:28:11 +00:00
Peter Smith	57f661bd7d	[MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixup On targets like Arm some relaxations may only be performed when certain architectural features are available. As functions can be compiled with differing levels of architectural support we must make a judgement on whether we can relax based on the MCSubtargetInfo for the function. This change passes through the MCSubtargetInfo for the function to fixupNeedsRelaxation so that the decision on whether to relax can be made per function. In this patch, only the ARM backend makes use of this information. We must also pass the MCSubtargetInfo to applyFixup because some fixups skip error checking on the assumption that relaxation has occurred, to prevent code-generation errors applyFixup must see the same MCSubtargetInfo as fixupNeedsRelaxation. Differential Revision: https://reviews.llvm.org/D44928 llvm-svn: 334078	2018-06-06 09:40:06 +00:00
Matt Arsenault	57e541e87e	AMDGPU: Preserve metadata when widening loads Preserves the low bound of the !range. I don't think it's legal to do anything with the top half since it's theoretically reading garbage. llvm-svn: 334045	2018-06-05 19:52:56 +00:00
Matt Arsenault	9224c00d2b	AMDGPU: Use more custom insert/extract_vector_elt lowering Apply to i8 vectors. llvm-svn: 334044	2018-06-05 19:52:46 +00:00
David Blaikie	31b98d2e99	Move Analysis/Utils/Local.h back to Transforms Review feedback from r328165. Split out just the one function from the file that's used by Analysis. (As chandlerc pointed out, the original change only moved the header and not the implementation anyway - which was fine for the one function that was used (since it's a template/inlined in the header) but not in general) llvm-svn: 333954	2018-06-04 21:23:21 +00:00
Stanislav Mekhanoshin	838c07c531	[AMDGPU] Small refactoring in the scheduler After last changes some code can be simplified. Differential Revision: https://reviews.llvm.org/D47661 llvm-svn: 333934	2018-06-04 17:57:40 +00:00
Stanislav Mekhanoshin	28624f94d5	[AMDGPU] Factored out common part of GCNRPTracker::reset() Differential Revision: https://reviews.llvm.org/D47664 llvm-svn: 333931	2018-06-04 17:21:54 +00:00
Mark Searles	f0b93f1e9e	[AMDGPU][Waitcnt] Fix handling of flat instrs On GFX9 and earlier, flat memory ops may decrement VMCNT out-of-order as well as LGKMCNT out-of-order. Differential Revision: https://reviews.llvm.org/D46616 llvm-svn: 333926	2018-06-04 16:51:59 +00:00
Nicolai Haehnle	59198ed040	AMDGPU: Make various NamedOperands upper case Summary: Avoid name clashes with the corresponding bit fields in the instruction encoding. Change-Id: Id1644e703e976e78f7af93788d9f44cb48c3251f Reviewers: arsenm, rampitec, kzhuravl Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47433 llvm-svn: 333905	2018-06-04 14:45:20 +00:00
Nicolai Haehnle	01d261f18d	TableGen: Streamline the semantics of NAME Summary: The new rules are straightforward. The main rules to keep in mind are: 1. NAME is an implicit template argument of class and multiclass, and will be substituted by the name of the instantiating def/defm. 2. The name of a def/defm in a multiclass must contain a reference to NAME. If such a reference is not present, it is automatically prepended. And for some additional subtleties, consider these: 3. defm with no name generates a unique name but has no special behavior otherwise. 4. def with no name generates an anonymous record, whose name is unique but undefined. In particular, the name won't contain a reference to NAME. Keeping rules 1&2 in mind should allow a predictable behavior of name resolution that is simple to follow. The old "rules" were rather surprising: sometimes (but not always), NAME would correspond to the name of the toplevel defm. They were also plain bonkers when you pushed them to their limits, as the old version of the TableGen test case shows. Having NAME correspond to the name of the toplevel defm introduces "spooky action at a distance" and breaks composability: refactoring the upper layers of a hierarchy of nested multiclass instantiations can cause unexpected breakage by changing the value of NAME at a lower level of the hierarchy. The new rules don't suffer from this problem. Some existing .td files have to be adjusted because they ended up depending on the details of the old implementation. Change-Id: I694095231565b30f563e6fd0417b41ee01a12589 Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm, javed.absar Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D47430 llvm-svn: 333900	2018-06-04 14:26:05 +00:00
Amaury Sechet	8467411dad	Set ADDE/ADDC/SUBE/SUBC to expand by default Summary: They've been deprecated in favor of UADDO/ADDCARRY or USUBO/SUBCARRY for a while. Target that uses these opcodes are changed in order to ensure their behavior doesn't change. Reviewers: efriedma, craig.topper, dblaikie, bkramer Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D47422 llvm-svn: 333748	2018-06-01 13:21:33 +00:00
Tom Stellard	e43778895c	AMDGPU/R600: Move intrinsics to IntrinsicsAMDGPU.td Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47487 llvm-svn: 333720	2018-06-01 02:19:46 +00:00
Stanislav Mekhanoshin	739174c4be	[AMDGPU] Construct memory clauses before RA Memory clauses are formed into bundles in presence of xnack. Their source operands are marked as early-clobber. This allows to allocate distinct source and destination registers within a clause and prevent breaking the clause with s_nop in the hazard recognizer. Clauses are undone before post-RA scheduler to allow some rescheduling, which will not break the clause since artificial edges are created in the dag to keep memory operations together. Yet this allows a better ILP in some cases. Differential Revision: https://reviews.llvm.org/D47511 llvm-svn: 333691	2018-05-31 20:13:51 +00:00
Roman Tereshin	76c29c68dc	[GlobalISel][AMDGPU] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call for AMDGPU Reviewers: aemerson, qcolombet Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D46339 llvm-svn: 333664	2018-05-31 16:16:48 +00:00
Stanislav Mekhanoshin	d4b500cb08	[AMDGPU] Track occupancy in MFI Keep track of achieved occupancy in SIMachineFunctionInfo. At the moment we have a lot of duplicated or even missed code to query and maintain occupancy info. Record it in the MFI and query in a single call. Interfaces: - getOccupancy() - returns current recorded achieved occupancy. - getMinAllowedOccupancy() - returns lesser of the achieved occupancy and the lowest occupancy we are ready to tolerate. For example if a kernel is memory bound we are ready to tolerate 4 waves. - limitOccupancy() - record occupancy level if we have to lower it. - increaseOccupancy() - record occupancy if scheduler managed to increase the occupancy. MFI takes care of integrating different checks affecting occupancy, including LDS use and waves-per-eu attribute. Note that scheduler starts with not yet known register pressure, so has to record either limit or increase in occupancy after it is done. Later passes can just query a resulting value. New interface is used in the active scheduler and NFC wrt its work. Changes are also made to experimental schedulers to use it and record an occupancy after they are done. Before the change waves-per-eu was ignored by experimental schedulers and tolerance window for memory bound kernels was not used. Differential Revision: https://reviews.llvm.org/D47509 llvm-svn: 333629	2018-05-31 05:36:04 +00:00
Jan Vesely	f5016b79a6	AMDGPU/R600: Make sure functions are cacheline aligned v2: use "ensureAlignment" make functions cache line aligned Fixes GPU hangs since r333219: "AMDGPU: Split R600 AsmPrinter code into its own class" Differential Revision: https://reviews.llvm.org/D47516 llvm-svn: 333622	2018-05-31 04:08:08 +00:00
Tom Stellard	c7624317d7	AMDGPU: Split AMDGPUTTI into GCNTTI and R600TTI Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47359 llvm-svn: 333605	2018-05-30 22:55:35 +00:00
Mark Searles	ed54ff1d51	[AMDGPU][Waitcnt] Fix build error: unused variable 'SWaitInst' https://reviews.llvm.org/rL333556 caused a buildbot failure. See http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/21876/steps/build_Lld/logs/stdio /Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:2007:10: error: unused variable 'SWaitInst' [-Werror,-Wunused-variable] auto SWaitInst = BuildMI(EntryBB, EntryBB.getFirstNonPHI(), The unused variable was for debugging purposes; removing that piece of code to fix the build. llvm-svn: 333559	2018-05-30 16:27:57 +00:00
Matt Arsenault	7b4826e6ce	AMDGPU: Use better alignment for kernarg lowering This was just emitting loads with the ABI alignment for the raw type. The true alignment is often better, especially when an illegal vector type was scalarized. The better alignment allows using a scalar load more often. llvm-svn: 333558	2018-05-30 16:17:51 +00:00
Mark Searles	1054541490	[AMDGPU][Waitcnt] Fix handling of loops with many bottom blocks In terms of waitcnt insertion/if necessary, the waitcnt pass forces convergence for a loop. Previously, that kicked if greater than 2 passes over a loop, which doesn't account for loop with many bottom blocks. So, increase the threshold to (n+1), where n is the number of bottom blocks. This gives the pass an opportunity to consider the contribution of each bottom block, to the overall loop, before the forced convergence potentially kicks in. Differential Revision: https://reviews.llvm.org/D47488 llvm-svn: 333556	2018-05-30 15:47:45 +00:00
Matt Arsenault	2e4d338d16	AMDGPU: Fix typo in option description llvm-svn: 333457	2018-05-29 19:35:46 +00:00
Matt Arsenault	1ea0402e82	AMDGPU: Round up kernel argument allocation size AFAIK the driver's allocation will actually have to round this up anyway. It is useful to track the rounded up size, so that the end of the kernel segment is known to be dereferencable so a wider s_load_dword can be used for a short argument at the end of the segment. llvm-svn: 333456	2018-05-29 19:35:00 +00:00
Konstantin Zhuravlyov	2ca6b1f2ba	AMDGPU: Always set COMPUTE_PGM_RSRC2.ENABLE_TRAP_HANDLER to zero for AMDHSA as it is set by CP Differential Revision: https://reviews.llvm.org/D47392 llvm-svn: 333451	2018-05-29 19:09:13 +00:00
Matt Arsenault	ceafc55e5a	AMDGPU: Pass function directly instead of MachineFunction These functions just query the underlying IR function, so pass it directly. llvm-svn: 333442	2018-05-29 17:42:50 +00:00
Matt Arsenault	2fb9ccf770	AMDGPU: Add nuw to add off of kernarg ptr llvm-svn: 333441	2018-05-29 17:42:38 +00:00
Tom Stellard	57b9342c80	AMDGPU: Split R600 MCInst lowering into its own class Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47307 llvm-svn: 333439	2018-05-29 17:41:59 +00:00
Tim Renouf	fa213f797b	[AMDGPU] Fixed build warning Summary: V2: Use cast instead of extra if. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47426 Change-Id: I6ac31da0306f79706960284a7ebd7b9c6237a83a llvm-svn: 333397	2018-05-29 08:15:37 +00:00
Farhana Aleen	eacb1020aa	[AMDGPU] Re-enabled 128bit wide-vector generation for local addr space by default. Summary: Bug reported here https://bugs.freedesktop.org/show_bug.cgi?id=105464 found to be resolved by some other fixes. Author: FarhanaAleen llvm-svn: 333380	2018-05-28 18:15:11 +00:00
Tim Renouf	364edcd2e5	[AMDGPU] Fixed WWM bug in block otherwise entirely in WQM Summary: For a block with WQM on entry and exit and containing no exact mode code, but containing some WWM code, the WQM pass forgot to process the block at all and so did not insert code to enter and leave WWM. This commit fixes that. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47027 Change-Id: I044792eead1293bed4203fb26ce75f47878afeb6 llvm-svn: 333362	2018-05-27 17:26:11 +00:00
Mark Searles	32efedcff3	[AMDGPU][Waitcnt] Remove obsolete waitcnt option With the removal of the old waitcnt pass, the '-enable-si-insert-waitcnts' option is obsolete. Remove it. Differential Revision: https://reviews.llvm.org/D47378 llvm-svn: 333303	2018-05-25 20:24:08 +00:00
Stanislav Mekhanoshin	7fc1cee051	[AMDGPU] Fixed test failure with AMDGPUPerfHint We shall not keep iterator to a map while map is modified, this leads to a broken map. llvm-svn: 333298	2018-05-25 18:46:58 +00:00
Reid Kleckner	cb48efd585	Fix -Winconsistent-missing-overrides in AMDGPU code llvm-svn: 333291	2018-05-25 17:46:24 +00:00
Stanislav Mekhanoshin	1c538423dc	[AMDGPU] Add perf hints to functions This is adoption of HSAIL perfhint pass. Two types of hints are produced: 1. Function is memory bound. 2. Kernel can use wave limiter. Currently these hints are used in the scheduler. If a function is suspected to be memory bound we allow occupancy to decrease to 4 waves in the course of scheduling. Differential Revision: https://reviews.llvm.org/D46992 llvm-svn: 333289	2018-05-25 17:25:12 +00:00
Tim Renouf	ad8b7c1190	[AMDGPU] Fixed incorrect break from loop Summary: Lower control flow did not correctly handle the case that a loop break in if/else was on a condition that was not guaranteed to be masked by exec. The first test kernel shows an example of this going wrong; after exiting the loop, exec is all ones, even if it was not before the loop. The fix is for lowering of if-break and else-break to insert an S_AND_B64 to mask the break condition with exec. This commit also includes the optimization of not inserting that S_AND_B64 if it is obviously not needed because the break condition is the result of a V_CMP in the same basic block. V2: Addressed some review comments. V3: Test fixes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D44046 Change-Id: I0fc56a01209a9e99d1d5c9b0ffd16f111caf200c llvm-svn: 333258	2018-05-25 07:55:04 +00:00
Tom Stellard	79fffe3515	AMDGPU: Remove AMDGPUMCInstLower.h Summary: The AMDGPUMCInstLower class is not used outside AMDGPUMCInstLower.cpp, so we don't need a header file. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47264 llvm-svn: 333254	2018-05-25 04:57:02 +00:00
Tom Stellard	c501501055	AMDGPU: Split R600 AsmPrinter code into its own class Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47245 llvm-svn: 333219	2018-05-24 20:02:01 +00:00
Tom Stellard	1b95fed6f7	AMDGPU/R600: Remove code for handling AMDGPUISD::CLAMP Summary: We don't generate AMDGPUISD::CLAMP for R600 now that llvm.AMDGPU.clamp is gone. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47181 llvm-svn: 333153	2018-05-24 05:28:34 +00:00
Matt Arsenault	606bc315d6	AMDGPU: Fix v2f16 fneg/fabs pattern The integer operation convertion for some reason only happens if the source is a bitcast from an integer, which happens to always be the situation when the result is loaded. Add an additional pattern for when the source operation is really an FP operation. llvm-svn: 333019	2018-05-22 20:13:34 +00:00
Tom Stellard	b12f4dec08	AMDGPU: Move AMDGPUTargetLowering::isFPExtFoldable() into SITargetLowering Summary: This is always false for R600. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47180 llvm-svn: 333016	2018-05-22 19:37:55 +00:00
Matt Arsenault	1349a04ef5	AMDGPU: Make v2i16/v2f16 legal on VI This usually results in better code. Fixes using inline asm with short2, and also fixes having a different ABI for function parameters between VI and gfx9. Partially cleans up the mess used for lowering of the d16 operations. Making v4f16 legal will help clean this up more, but this requires additional work. llvm-svn: 332953	2018-05-22 06:32:10 +00:00
Tom Stellard	44b30b4537	AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction and register defintions, which are huge so we only want to include them where needed. This will also make it easier if we want to split the R600 and GCN definitions into separate tablegenerated files. I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46272 llvm-svn: 332930	2018-05-22 02:03:23 +00:00
Peter Collingbourne	dcd7d6c331	MC: Separate creating a generic object writer from creating a target object writer. NFCI. With this we gain a little flexibility in how the generic object writer is created. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47045 llvm-svn: 332868	2018-05-21 19:20:29 +00:00
Stanislav Mekhanoshin	9badad2051	[AMDGPU] Add divergence analysis as a dependency for ISel AMDGPUDAGToDAGISel adds DivergenceAnalysis in getAnalysisUsage but does not list it in pass dependencies which may lead to crash. Differential Revision: https://reviews.llvm.org/D47151 llvm-svn: 332862	2018-05-21 18:18:52 +00:00
Peter Collingbourne	571a3301ae	MC: Change MCAsmBackend::writeNopData() to take a raw_ostream instead of an MCObjectWriter. NFCI. To make this work I needed to add an endianness field to MCAsmBackend so that writeNopData() implementations know which endianness to use. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47035 llvm-svn: 332857	2018-05-21 17:57:19 +00:00
Tom Stellard	a91ce17b5f	AMDGPU/GlobalISel: Address post-commit review comments for r332379 MCRegisterInfo::getPhysRegSize() will be deprecated. llvm-svn: 332856	2018-05-21 17:49:31 +00:00
Simon Pilgrim	ede0e4073e	Fix MSVC unused variable warning. NFCI. AMDGPURegisterInfo::getSubRegFromChannel is a static method - we don't need to get the AMDGPURegisterInfo instance. llvm-svn: 332807	2018-05-19 12:46:02 +00:00
Matt Arsenault	372d796ab1	AMDGPU: Add pass to optimize reqd_work_group_size Eliminate loads from the dispatch packet when they will have a known value. Also pattern match the code used by the library to handle partial workgroup dispatches, which isn't necessary if reqd_work_group_size is used. llvm-svn: 332771	2018-05-18 21:35:00 +00:00
Peter Collingbourne	e3f652973e	Support: Simplify endian stream interface. NFCI. Provide some free functions to reduce verbosity of endian-writing a single value, and replace the endianness template parameter with a field. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47032 llvm-svn: 332757	2018-05-18 19:46:24 +00:00
Konstantin Zhuravlyov	caa8251971	AMDGPU/NFC: Set symbol's type that is coming from an argument in EmitAMDGPUSymbolType, instead of hard-coding it to STT_AMDGPU_HSA_KERNEL. llvm-svn: 332753	2018-05-18 18:41:37 +00:00
Peter Collingbourne	f7b81db715	MC: Change the streamer ctors to take an object writer instead of a stream. NFCI. The idea is that a client that wants split dwarf would create a specific kind of object writer that creates two files, and use it to create the streamer. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47050 llvm-svn: 332749	2018-05-18 18:26:45 +00:00
Changpeng Fang	860d460063	AMDGPU/SI: Don't promote alloca to vector for atomic load/store Summary: Don't promote alloca to vector for atomic load/store Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D46085 llvm-svn: 332673	2018-05-17 21:49:44 +00:00
Changpeng Fang	391bcf8893	AMDGPU/SI: Handle infinite loop for the structurizer to work with CFG with infinite loops. Summary: The current StructurizeCFG pass only works for CFG with one exit. AMDGPUUnifyDivergentExitNodes combines multiple "return" blocks and/or "unreachable" blocks to one exit block for the Structurizer to work. However, infinite loop is another kind of special "exit", and if we don't handle it, the case of multiple exits will prevent the structurizer from working. In this work, for each infinite loop, we add a dummy edge to the "return" block, and thus the AMDGPUUnifyDivergentExitNodes pass will work with infinite loops. This will make CFG with infinite loops be structurized. Reviewer: nhaehnle Differential Revision: https://reviews.llvm.org/D46340 llvm-svn: 332625	2018-05-17 16:45:01 +00:00
Konstantin Zhuravlyov	c72ece6c2c	AMDGPU : Recalculate SGPRs when trap handler is supported Differential Revision: https://reviews.llvm.org/D29911 llvm-svn: 332523	2018-05-16 20:47:48 +00:00
Tony Tye	43259df44a	[AMDGPU] Change llvm.debugtrap to be a debug breakpoint that can resume execution. No longer require the queue pointer to be passed in in fixed SGPRs. Differential Revision: https://reviews.llvm.org/D46769 llvm-svn: 332485	2018-05-16 16:19:34 +00:00
Matt Arsenault	67a9815a5c	AMDGPU: Custom lower v4i16/v4f16 vector operations Avoids stack access. Also handle extract hi elt pattern from truncate + shift to avoid a couple test regressions. llvm-svn: 332453	2018-05-16 11:47:30 +00:00
Stanislav Mekhanoshin	57d341c27a	[AMDGPU] Fix handling of void types in isLegalAddressingMode It is legal for the type passed to isLegalAddressingMode to be unsized or, more specifically, VoidTy. In this case, we must check the legality of load / stores for all legal types. Directly trying to call getTypeStoreSize is incorrect, and leads to breakage in e.g. Loop Strength Reduction. This change guards against that behaviour. Differential Revision: https://reviews.llvm.org/D40405 llvm-svn: 332409	2018-05-15 22:07:51 +00:00
Konstantin Zhuravlyov	f13c9969fc	AMDGPU: Fix v_dot{4, 8}* instruction encoding Differential Revision: https://reviews.llvm.org/D46848 llvm-svn: 332387	2018-05-15 19:32:47 +00:00
Tom Stellard	e182b28ae4	AMDGPU/GlobalISel: Implement select() for G_FCONSTANT Summary: Also clean up G_CONSTANT selection. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46170 llvm-svn: 332379	2018-05-15 17:57:09 +00:00
Konstantin Zhuravlyov	603a43fcd5	AMDGPU: Add disasm tests for deep learning instructions + fix v_fmac_f32 disasm Differential Revision: https://reviews.llvm.org/D46853 llvm-svn: 332377	2018-05-15 17:39:13 +00:00
Nicola Zaghen	d34e60ca85	Rename DEBUG macro to LLVM_DEBUG. The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows: - git grep -l 'DEBUG' \| xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g' - git diff -U0 master \| ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM - Manual change to APInt - Manually chage DOCS as regex doesn't match it. In the transition period the DEBUG() macro is still present and aliased to the LLVM_DEBUG() one. Differential Revision: https://reviews.llvm.org/D43624 llvm-svn: 332240	2018-05-14 12:53:11 +00:00
Matt Arsenault	432aaea63f	AMDGPU: Rename OpenCL lowering pass to be R600 specific. This pass is a) broken. b) r600 specific. Fixing (a) is a bit more non-trivial, but fixing (b) is easy. Move this pass to being R600 only for now. This pass does pass all the unit tests, however clang no longer generates code that looks like the unit test input, so fixing the pass requires fixing the tests and the pass as one, and checking it works with clang still. Patch by Dave Airlie llvm-svn: 332196	2018-05-13 10:04:48 +00:00
Matt Arsenault	dfb88dfe30	AMDGPU: Make undef legal for v2i16/v2f16 This is apparently necessary to stop undef from being turned into a build_vector of 0s. llvm-svn: 332195	2018-05-13 10:04:38 +00:00
Stanislav Mekhanoshin	7012c246c1	[AMDGPU] Fix amdgpu-waves-per-eu accounting in scheduler We cannot query this attribute from a subtarget given a machine function. At this point attribute itself is already unavailable and can only be obtained through MFI. Differential Revision: https://reviews.llvm.org/D46781 llvm-svn: 332166	2018-05-12 01:41:56 +00:00
Tom Stellard	655fdd3f82	AMDGPU/GlobalISel: Implement select() for >32-bit G_STORE Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D46153 llvm-svn: 332154	2018-05-11 23:12:49 +00:00
Changpeng Fang	f094885a9e	AMDGPU/SI: Don't promote alloca to vector for AddrSpaceCast instruction. Summary: We have no logic to promote alloca to vector for an AddrSpaceCast instruction. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D45993 llvm-svn: 332147	2018-05-11 22:17:57 +00:00
Yaxun Liu	deba150c27	[AMDGPU] Fix compilation failure when IR contains comdat Remove a useless SwitchSection which also causes compilation failure when IR contains comdat. The SwitchSection is useless because the current section is already correct text section for the function therefore no need to switch. It causes compilation failure for comdat because functions with comdat has specific text section, not the default .text section. Since HIP uses comdat, this bug caused failures for HIP. Differential Revision: https://reviews.llvm.org/D46770 llvm-svn: 332137	2018-05-11 20:40:14 +00:00
Tom Stellard	dcc95e9385	AMDGPU/GlobalISel: Implement select() for 32-bit G_FPTOUI Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45883 llvm-svn: 332082	2018-05-11 05:44:16 +00:00
Tom Stellard	1e0edad4bb	AMDGPU/GlobalISel: Implement select() for G_BITCAST s32 <--> <2 x s16> Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45881 llvm-svn: 332042	2018-05-10 21:20:10 +00:00
Tom Stellard	1dc90204bf	AMDGPU/GlobalISel: Enable TableGen'd instruction selector Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, mgorny, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45994 llvm-svn: 332039	2018-05-10 20:53:06 +00:00
Farhana Aleen	e24f3ff8de	[AMDGPU] Support horizontal vectorization of min/max. Author: FarhanaAleen Reviewed By: rampitec Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D46604 llvm-svn: 331920	2018-05-09 21:18:34 +00:00
Matt Arsenault	eac81b2448	AMDGPU: Ignore any_extend in mul24 combine If a multiply is truncated, SimplifyDemandedBits sometimes turns a zero_extend of the inputs into an any_extend, which makes the known bits computation unhelpful. Ignore these and compute known bits for the underlying value, since we insert the correct extend type after. llvm-svn: 331919	2018-05-09 21:11:35 +00:00
Matt Arsenault	74fd7600d2	AMDGPU: Handle partial shift reduction for variable shifts If the variable shift amount has known bits, we can still reduce the shift. llvm-svn: 331917	2018-05-09 20:52:54 +00:00
Matt Arsenault	b143d9a5ea	AMDGPU: Partially shrink 64-bit shifts if reduced to 16-bit This is an extension of an existing combine to reduce wider shls if the result fits in the final result type. This introduces the same combine, but reduces the shift to a middle sized type to avoid the slow 64-bit shift. llvm-svn: 331916	2018-05-09 20:52:43 +00:00
Matt Arsenault	762d498808	AMDGPU: Add combine for trunc of bitcast from build_vector If the truncate is only accessing the first element of the vector, we can use the original source value. This helps with some combine ordering issues after operations are lowered to integer operations between bitcasts of build_vector. In particular it stops unnecessarily materializing the unused top half of a vector in some cases. llvm-svn: 331909	2018-05-09 18:37:39 +00:00
Matt Arsenault	378f86998c	AMDGPU: Stop special casing constant indexes of extract_vector_elt The same result folds out of the dynamic expansion logic if the index is constant. llvm-svn: 331906	2018-05-09 18:29:26 +00:00
Shiva Chen	801bf7ebbe	[DebugInfo] Examine all uses of isDebugValue() for debug instructions. Because we create a new kind of debug instruction, DBG_LABEL, we need to check all passes which use isDebugValue() to check MachineInstr is debug instruction or not. When expelling debug instructions, we should expel both DBG_VALUE and DBG_LABEL. So, I create a new function, isDebugInstr(), in MachineInstr to check whether the MachineInstr is debug instruction or not. This patch has no new test case. I have run regression test and there is no difference in regression test. Differential Revision: https://reviews.llvm.org/D45342 Patch by Hsiangkai Wang. llvm-svn: 331844	2018-05-09 02:42:00 +00:00
Tim Renouf	64afc2d7f0	[AMDGPU] Provide machine -> name mapping Summary: AMDGPU stores a numerical code for the particular GPU variant in EFlags in the ELF file. This commit provides a mapping from that number into the machine name for use by objdump-type tools. Change-Id: Id37fc0bebad443bd89c0080985ce298c4e7e9319 Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46587 llvm-svn: 331798	2018-05-08 18:53:04 +00:00
Matt Arsenault	869cbedc81	AMDGPU: Fix broken dynamic vector indexing for packed types The intention of this was to multiply by 16, not shift by 16. llvm-svn: 331793	2018-05-08 18:43:25 +00:00
Changpeng Fang	d049da3740	AMDGPU: Use eraseFromParent to delete am instruction when it is no longer needed. Reviewer: Nicolai Differential Revision: https://reviews.llvm.org/D46438 llvm-svn: 331788	2018-05-08 18:32:35 +00:00
Stanislav Mekhanoshin	432936161e	[AMDGPU] Added checks for dpp_ctrl value - Report error for invalid dpp_ctrl values. - Changed the way it is reported, now the error will be emitted into asm and will work with release build as well. - Added dpp_ctrl value verifier for codegen. - Added symbolic constants for dpp_ctrl. Differential Revision: https://reviews.llvm.org/D46565 llvm-svn: 331775	2018-05-08 16:53:02 +00:00
Tom Stellard	37444285f1	AMDGPU/GlobalISel: Don't try to lower hull shaders Summary: The AMDGPU_HS calling convention is not supported yet. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46149 llvm-svn: 331691	2018-05-07 22:17:54 +00:00
Mark Searles	4a0f2c5047	[AMDGPU][Waitcnt] Remove the old waitcnt pass Remove the old waitcnt pass ( si-insert-waits ), which is no longer maintained and getting crufty Differential Revision: https://reviews.llvm.org/D46448 llvm-svn: 331641	2018-05-07 14:43:28 +00:00
Tim Renouf	18a1e9d03a	[AMDGPU] Don't force WQM for DS op Summary: Previously, all DS ops forced WQM in a pixel shader. That was a hack to allow for graphics frontends using ds_swizzle to implement explicit derivatives, on SI/CI at least where DPP is not available. But it forced WQM for _any_ DS op. With this commit, DS ops no longer force WQM. Both graphics frontends (Mesa and LLPC) need to change to issue an explicit llvm.amdgcn.wqm intrinsic call when calculating explicit derivatives. The required Mesa change is: "amd/common: use llvm.amdgcn.wqm for explicit derivatives". Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46051 Change-Id: I9b745b626fa91bbd66456e6cf41ee07eeea42f81 llvm-svn: 331633	2018-05-07 13:21:26 +00:00
Konstantin Zhuravlyov	91a74f53db	AMDGPU/NFC: Update D16PreservesUnusedBits description based Tony Tye's comments llvm-svn: 331564	2018-05-04 22:53:55 +00:00
Konstantin Zhuravlyov	3fc4067ac4	AMDGPU/NFC: Fix formatting for 900, 902 ISA Version features llvm-svn: 331553	2018-05-04 20:21:31 +00:00
Konstantin Zhuravlyov	c2c2eb7d01	AMDGPU: Add D16 instructions preserve unused bits feature - Predicate D16 patterns on this new feature - Added this new feature to gfx900/2/4 Differential Revision: https://reviews.llvm.org/D46366 llvm-svn: 331551	2018-05-04 20:06:57 +00:00
Michael Berg	7acc81b744	Fast Math Flag mapping into SDNode Summary: Adding support for Fast flags in the SDNode to leverage fast math sub flag usage. Reviewers: spatel, arsenm, jbhateja, hfinkel, escha, qcolombet, echristo, wristow, javed.absar Reviewed By: spatel Subscribers: llvm-commits, rampitec, nhaehnle, tstellar, FarhanaAleen, nemanjai, javed.absar, jbhateja, hfinkel, wdng Differential Revision: https://reviews.llvm.org/D45710 llvm-svn: 331547	2018-05-04 18:48:20 +00:00
Tom Stellard	b03c98d1a3	AMDGPU: Make getSubRegFromChannel a static member of AMDGPURegisterInfo Summary: This makes is possible to have R600RegisterInfo and SIRegisterInfo not inherit from AMDGPURegisterInfo. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D46280 llvm-svn: 331490	2018-05-03 22:38:06 +00:00
Piotr Padlewski	5dde809404	Rename invariant.group.barrier to launder.invariant.group Summary: This is one of the initial commit of "RFC: Devirtualization v2" proposal: https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8amICNBtvjWUod2SujZVEo/edit?usp=sharing Reviewers: rsmith, amharc, kuhar, sanjoy Subscribers: arsenm, nhaehnle, javed.absar, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D45111 llvm-svn: 331448	2018-05-03 11:03:01 +00:00
Farhana Aleen	07e612340f	[AMDGPU] A trivial fix for a buildbot failure caused by "commit 224a839fcbbead221f872cd32a1dd0c308d37299". Author: FarhanaAleen llvm-svn: 331383	2018-05-02 18:16:39 +00:00
Farhana Aleen	150cb6d91a	Revert "[AMDGPU] performAddCombine should run after DAG is legalized." This reverts commit 6b97d2995566b4dddd6bf0d75579ff44501d4494. llvm-svn: 331371	2018-05-02 16:48:52 +00:00
Farhana Aleen	2f4100f56e	[AMDGPU] performAddCombine should run after DAG is legalized. Summary: performAddCombine should run after DAG is legalized; Otherwise generic optimization in the DAGCombiner can optimize an addcarry+trunc into an addcarry instruction with illegal types. Author: FarhanaAleen Reviewed By: rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D46337 llvm-svn: 331368	2018-05-02 16:24:10 +00:00
Farhana Aleen	e2dfe8a853	[AMDGPU] Support horizontal vectorization. Author: FarhanaAleen Reviewed By: rampitec, arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D46213 llvm-svn: 331313	2018-05-01 21:41:12 +00:00
Konstantin Zhuravlyov	1501af4846	AMDGPU: Remove remnants of gfx901 (it was deprecated some time ago) llvm-svn: 331298	2018-05-01 18:47:48 +00:00
Adrian Prantl	4dfcc4a788	Remove @brief commands from doxygen comments, too. This is a follow-up to r331272. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done https://reviews.llvm.org/D46290 llvm-svn: 331275	2018-05-01 16:10:38 +00:00
Adrian Prantl	5f8f34e459	Remove \brief commands from doxygen comments. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272	2018-05-01 15:54:18 +00:00
Matt Arsenault	0084adc516	AMDGPU: Add Vega12 and Vega20 Changes by Matt Arsenault Konstantin Zhuravlyov llvm-svn: 331215	2018-04-30 19:08:16 +00:00
Tom Stellard	add59c052d	AMDGPU: Remove some dead code llvm-svn: 331196	2018-04-30 16:28:02 +00:00
Tom Stellard	6c81418a63	AMDGPU/GlobalISel: Don't try to lower geometry shaders Summary: The AMDGPU_GS calling convention is not supported yet. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46041 llvm-svn: 331186	2018-04-30 15:15:23 +00:00
Nico Weber	432a38838d	IWYU for llvm-config.h in llvm, additions. See r331124 for how I made a list of files missing the include. I then ran this Python script: for f in open('filelist.txt'): f = f.strip() fl = open(f).readlines() found = False for i in xrange(len(fl)): p = '#include "llvm/' if not fl[i].startswith(p): continue if fl[i][len(p):] > 'Config': fl.insert(i, '#include "llvm/Config/llvm-config.h"\n') found = True break if not found: print 'not found', f else: open(f, 'w').write(''.join(fl)) and then looked through everything with `svn diff \| diffstat -l \| xargs -n 1000 gvim -p` and tried to fix include ordering and whatnot. No intended behavior change. llvm-svn: 331184	2018-04-30 14:59:11 +00:00
Matt Arsenault	540512c297	DAG: Fix not legalizing vector fcanonicalizes If an fcanoncialize was done on a vector type that was legal, llvm-svn: 330981	2018-04-26 19:21:37 +00:00
Matt Arsenault	fcc5ba46b7	AMDGPU: Extend extract_vector_elt fneg combine to fabs Fixes a regression in a future commit. llvm-svn: 330980	2018-04-26 19:21:32 +00:00
Matt Arsenault	8474803c7c	AMDGPU: Consolidate SubtargetPredicate definitions llvm-svn: 330979	2018-04-26 19:21:26 +00:00
Mark Searles	2a19af6e17	[AMDGPU][Waitcnt] As of gfx7, VMEM operations do not increment the export counter and the input registers are available in the next instruction; update the waitcnt pass to take this into account. Differential Revision: https://reviews.llvm.org/D46067 llvm-svn: 330954	2018-04-26 16:11:19 +00:00
Tom Stellard	dce46fa1cf	AMDGPU/R600: Move int_r600_store_stream_output to the public intrinsic file Summary: The TableGen'd GlobalISel instruction selector assumes all intrinsics are in the public Intrinsic:: namespace. Reviewers: jvesely, nhaehnle Reviewed By: jvesely, nhaehnle Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45989 llvm-svn: 330866	2018-04-25 20:02:53 +00:00
Mark Searles	ec58183e1b	[AMDGPU] Waitcnt pass: add debug options - Add "amdgpu-waitcnt-forcezero" to force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) - Add debug counters to control force emit of s_waitcnt instrs; debug counters: si-insert-waitcnts-forceexp: force emit s_waitcnt expcnt(0) instrs si-insert-waitcnts-forcevm: force emit s_waitcnt lgkmcnt(0) instrs si-insert-waitcnts-forcelgkm: force emit s_waitcnt vmcnt(0) instrs - Add some debug statements Note that a variant of this patch was previously committed/reverted. Differential Revision: https://reviews.llvm.org/D45888 llvm-svn: 330862	2018-04-25 19:21:26 +00:00
Alexander Timofeev	b934728cd2	[AMDGPU] Revert b0efc4fd6 (https://reviews.llvm.org/D40556 ) llvm-svn: 330818	2018-04-25 12:32:46 +00:00
Tom Stellard	a2be8f4c35	AMDGPU: Remove deprecated llvm.AMDGPU.kilp intrinsic Summary: This is no longer used by mesa since its 18.0.0 release. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D45988 llvm-svn: 330775	2018-04-24 21:37:57 +00:00
Tom Stellard	257882ff72	AMDGPU/GlobalISel: Fall-back to SelectionDAG for non-void functions Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45843 llvm-svn: 330774	2018-04-24 21:29:36 +00:00
Tom Stellard	c7709e1c29	AMDGPU/GlobalISel: Add support for amdgpu_ps calling convention Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45837 llvm-svn: 330767	2018-04-24 20:51:28 +00:00
Stanislav Mekhanoshin	a4bfb3c446	[AMDGPU] Truncate packed inline constant If a packed inline constant is sign extended it must be truncated after the shift. I.e. a constant (0xH0000, 0xHBC00), will be represented as 0xFFFFFFFFBC000000 in the IR because the immediate is sign extended to 64 bit. After the value shifted right by 16 to use it in a low part with op_sel_hi it becomes 0xFFFFFFFFBC00 and does not qualify as inline constant any longer. Fixed the error and added verification code. Without the fix and with the verification bug is causing pk_max_f16_literal.ll to fail. Differential Revision: https://reviews.llvm.org/D45987 llvm-svn: 330752	2018-04-24 18:17:55 +00:00
Mark Searles	70901b9047	[AMDGPU][Waitcnt] NFC. Cleanup some code/naming consistency: - s/SWaitcnt/Waitcnt s/WaitCnt/Waitcnt llvm-svn: 330730	2018-04-24 15:59:59 +00:00
Matt Arsenault	b21f9592be	AMDGPU: Move a flawed assert when spilling SGPRs It's possible to validly spill the frame offset register in a call sequence to a VGPR. There are definitely issues with SGPR spilling to memory, so move the assert later. llvm-svn: 330612	2018-04-23 16:13:30 +00:00
Matt Arsenault	adc59d7076	AMDGPU: Assign enum name to stack ID Also assert that it is correct for SGPRs. There is currently a bug where stack slot coloring replaces SGPR spill FIs with one with the default ID, which results in a more confusing assert later about a dead object. llvm-svn: 330607	2018-04-23 15:51:26 +00:00
Nicolai Haehnle	cbebba4917	AMDGPU: Fix SDWA peephole for V_AND_B32 Summary: Found by inspection. We care about the operand that doesn't contain the immediate. I believe this is currently not hit because we fold 0xff / 0xffff immediates only later. Change-Id: Ic3cf8538bc7da5eff3200d96eccf9d339e6345a7 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45886 llvm-svn: 330586	2018-04-23 13:06:03 +00:00
Nicolai Haehnle	5a995664f0	AMDGPU: Fix a corner case crash in SIOptimizeExecMasking Summary: See the new test case; this is really unlikely to happen with real code, but I ran into this while attempting to bugpoint-reduce a different issue. Change-Id: I9ade1dc1aa8fd9c4d9fc83661d7b80e310b5c4a6 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45885 llvm-svn: 330585	2018-04-23 13:05:50 +00:00
Nico Weber	5d53aed419	Consistently sort add_subdirectory calls in lib/Target/*/CMakeLists.txt llvm-svn: 330584	2018-04-23 12:49:34 +00:00
Nicolai Haehnle	7a87977fb2	AMDGPU: Legalize the operand of SI_INIT_M0 Summary: This fixes a case where the argument to a sendmsg intrinsic ends up in a VGPR, for whatever reason. The underlying performance issue is that a multiplication that can be an s_mul_i32 is instead needlessly generated as v_mul_u32_u24, but this is not addressed by this patch. Change-Id: I61fd4034314d5acdf6074632c30b65364dfa7328 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45826 llvm-svn: 330393	2018-04-20 07:14:25 +00:00
Stanislav Mekhanoshin	160f85794d	[AMDGPU] Use packed literals with zero either lower or hi part Differential Revision: https://reviews.llvm.org/D45790 llvm-svn: 330365	2018-04-19 21:16:50 +00:00
Mark Searles	1bc6e71f32	[AMDGPU] Do not only rely on BB number when finding bottom loop We should also check that the "bottom" basic block of a loopis a successor of the "header" basic block, otherwise we don't propagate the information correctly when the CFG is complex. This fixes an important rendering problem with Wolfsentein 2, because of one vector-memory wait was missing. Differential Revision: https://reviews.llvm.org/D43831 llvm-svn: 330337	2018-04-19 15:42:30 +00:00
David Stuttard	31f482c26b	[AMDGPU] Fix issues for backend divergence tracking Summary: A change to use divergence analysis in the AMDGPU backend was getting formal arguments incorrect (not tagged as divergent) unless they were VGPR0, VGPR1 or VGPR2 For graphics shaders it is possible to have more than these passed in as VGPR Modified the checking code to check for any VGPR registers passed in as formal arguments. Also, some intrinsics that are sources of divergence may have been lowered during instruction selection and are missed on subsequent calls to isSDNodeSourceOfDivergence - added the relevant AMDGPUISD checks as well. Finally, the FunctionLoweringInfo tracks virtual registers that are live across basic block boundaries. This is used to check for divergence of CopyFromRegister registers using the DivergenceAnalysis analysis. For multiple blocks the lazily evaluated inverted map VirtReg2Value was not cleared when the ValueMap map was. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45372 Change-Id: I112f3bd6dfe0f62e63ce9b43b893982778e4bee3 llvm-svn: 330257	2018-04-18 13:53:31 +00:00
Stanislav Mekhanoshin	8b20b7dc2b	[AMDGPU] Enabled v2.16 literals for VOP3P Literal encoding needs op_sel_hi to select low 16 bit in this case. Differential Revision: https://reviews.llvm.org/D45745 llvm-svn: 330230	2018-04-17 23:09:05 +00:00
Dmitry Preobrazhensky	4c45e6ff0e	[AMDGPU][MC][VI][GFX9] Added support of SDWA/DPP for v_cndmask_b32 See bug 36356: https://bugs.llvm.org/show_bug.cgi?id=36356 Differential Revision: https://reviews.llvm.org/D45446 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 330123	2018-04-16 12:41:38 +00:00
Hiroshi Inoue	ae17900997	[NFC] fix trivial typos in document and comments "not not" -> "not" etc llvm-svn: 330083	2018-04-14 08:59:00 +00:00
Hiroshi Inoue	372ffa15cb	[NFC] fix trivial typos in comments "the the" -> "the", "we we" -> "we", etc llvm-svn: 330006	2018-04-13 11:37:06 +00:00
Jonas Paulsson	e8f1ac7063	[MachineScheduler] NFC refactoring This patch makes tryCandidate() virtual and some utility functions like tryLess(), tryGreater(), ... externally available (used to be static). This makes it possible for a target to derive a new MachineSchedStrategy from GenericScheduler and reuse most parts. It was necessary to wrap functions with the same names in AMDGPU/SIMachineScheduler in a local namespace. Review: Andy Trick, Florian Hahn https://reviews.llvm.org/D43329 llvm-svn: 329884	2018-04-12 07:21:39 +00:00
Tim Renouf	fd8d4af3bc	[AMDGPU] Ensure there are enough registers for wave dispatch Summary: This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to allow for registers set up in wave dispatch, even if those registers are not used in the shader. Re-landed after noticing that the buildbot failure from 329808 seemed to be unrelated. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45503 Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771 llvm-svn: 329826	2018-04-11 17:18:36 +00:00
Yaxun Liu	9381ae9791	[AMDGPU] Fix lowering enqueue_kernel Two issues were fixed: runtime has difficulty to allocate memory for an external symbol of a kernel and set the address of the external symbol, therefore make the runtime handle of an enqueued kernel an ordinary global variable. Runtime only needs to store the address of the loaded kernel to the handle and has verified that this approach works. handle the situation where __enqueue_kernel* gets inlined therefore the enqueued kernel may be used through a constant expr instead of an instruction. Differential Revision: https://reviews.llvm.org/D45187 llvm-svn: 329815	2018-04-11 14:46:15 +00:00
Tim Renouf	8ca33bfcf3	Revert "[AMDGPU] Ensure there are enough registers for wave dispatch" This reverts 329808. That change caused a report of a failure in test/CodeGen/MIR/AMDGPU/mir-canon-multi.mir that I didn't see. I suspect it is an expensive-check-only error. Change-Id: I8133f26f15e7d5ec2b09c687c12cd70e918461b0 llvm-svn: 329811	2018-04-11 14:27:41 +00:00
Tim Renouf	f26b723491	[AMDGPU] Ensure there are enough registers for wave dispatch Summary: This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to allow for registers set up in wave dispatch, even if those registers are not used in the shader. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45503 Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771 llvm-svn: 329808	2018-04-11 14:02:41 +00:00
Dmitry Preobrazhensky	fc715551a3	[AMDGPU][MC][GFX9] Added v_screen_partition_4se_b32 See bug 36845: https://bugs.llvm.org/show_bug.cgi?id=36845 Differential Revision: https://reviews.llvm.org/D45443 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329801	2018-04-11 13:13:30 +00:00
Marek Olsak	a9a58fa236	AMDGPU: enable 128-bit for local addr space under an option Author: Samuel Pitoiset ds_read_b128 and ds_write_b128 have been recently enabled under the amdgpu-ds128 option because the performance benefit is unclear. Though, using 128-bit loads/stores for the local address space appears to introduce regressions in tessellation shaders. Not sure what is broken, but as ds_read_b128/ds_write_b128 are not enabled by default, just introduce a global option and enable 128-bit only if requested (until it's fixed/used correctly). v2: - fix regressions in merge-stores.ll and multiple_tails.ll Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464 llvm-svn: 329764	2018-04-10 22:48:23 +00:00
Nicolai Haehnle	b1c3b22b4c	AMDGPU/MC: Allow disassembling without symbol info Summary: We would like the UMR debugging tool[0] to be able to provide disassembly for currently live waves based on plain memory dumps, and we want to leverage the LLVM disassembler for this. This mostly works, except that UMR clearly can't provide real symbol info, so it wants to set DisInfo == nullptr. [0] https://cgit.freedesktop.org/amd/umr/ Reviewers: arsenm, rampitec, artem.tamazov, dp Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45477 Change-Id: Ibb2c5af2e66f2e100b4702fd81308e1932bc4ee6 llvm-svn: 329715	2018-04-10 15:46:43 +00:00
Tim Renouf	7190a4692a	[AMDGPU] For OS type AMDPAL, fixed scratch on compute shader Summary: For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders). This commit fixes that to use offset 0x10 instead of offset 0 for a compute shader, per the PAL ABI spec. V2: Ensure s0 (s8 for gfx9 merged shader) is marked live-in when loading scratch descriptor from GIT. Reviewers: kzhuravl, nhaehnle, timcorringham Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm Differential Revision: https://reviews.llvm.org/D44468 Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f llvm-svn: 329690	2018-04-10 11:25:15 +00:00
Konstantin Zhuravlyov	6183065b97	AMDGPU: Remove max_scratch_backing_memory_byte_size from kernel header 1. Remove max_scratch_backing_memory_byte_size from kernel header 2. Make it a reserved field 3. Ignore it while parsing assembly for backwards compatibility 4. Bump up minor version of kernel header Differential Revision: https://reviews.llvm.org/D45452 llvm-svn: 329620	2018-04-09 20:47:22 +00:00
Alex Shlyapnikov	79f2c720b5	Revert "AMDGPU: enable 128-bit for local addr space under an option" This reverts commit r329591. It breaks various bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/16516 http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/17374 http://lab.llvm.org:8011/builders/clang-ppc64le-linux/builds/15992 http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/11251 ... llvm-svn: 329610	2018-04-09 19:47:38 +00:00
Marek Olsak	52b033b827	AMDGPU: enable 128-bit for local addr space under an option Author: Samuel Pitoiset ds_read_b128 and ds_write_b128 have been recently enabled under the amdgpu-ds128 option because the performance benefit is unclear. Though, using 128-bit loads/stores for the local address space appears to introduce regressions in tessellation shaders. Not sure what is broken, but as ds_read_b128/ds_write_b128 are not enabled by default, just introduce a global option and enable 128-bit only if requested (until it's fixed/used correctly). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464 llvm-svn: 329591	2018-04-09 16:56:32 +00:00
Tom Stellard	e753c52227	AMDGPU: Initialize GlobalISel passes Summary: This fixes AMDGPU GlobalISel test failures when enabling the AMDGPU target without any other targets that use GlobalISel. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D45353 llvm-svn: 329588	2018-04-09 16:09:13 +00:00
Dmitry Preobrazhensky	2f8e146ad3	[AMDGPU][MC][GFX9] Added instructions s_mul_hi_32, s_lshl_add_u32 See bugs 36841: https://bugs.llvm.org/show_bug.cgi?id=36841 36842: https://bugs.llvm.org/show_bug.cgi?id=36842 Differential Revision: https://reviews.llvm.org/D45251 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329562	2018-04-09 13:10:33 +00:00
Dmitry Preobrazhensky	ae31223ba7	[AMDGPU][MC][GFX9] Added s_call_b64 See bug 36843: https://bugs.llvm.org/show_bug.cgi?id=36843 Differential Revision: https://reviews.llvm.org/D45268 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329440	2018-04-06 18:24:49 +00:00
Dmitry Preobrazhensky	306b1a0119	[AMDGPU][MC][GFX9] Added instruction s_endpgm_ordered_ps_done See bug 36844: https://bugs.llvm.org/show_bug.cgi?id=36844 Differential Revision: https://reviews.llvm.org/D45313 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329430	2018-04-06 17:25:00 +00:00
Dmitry Preobrazhensky	f20aff565d	[AMDGPU][MC][GFX9] Added instructions saveexec, wrexec and bitreplicate See bug 36840: https://bugs.llvm.org/show_bug.cgi?id=36840 Differential Revision: https://reviews.llvm.org/D45250 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329419	2018-04-06 16:35:11 +00:00
Dmitry Preobrazhensky	59399ae4cc	[AMDGPU][MC][VI][GFX9] Added s_atc_probe* instructions See bug 36839: https://bugs.llvm.org/show_bug.cgi?id=36839 Differential Revision: https://reviews.llvm.org/D45249 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329408	2018-04-06 15:48:39 +00:00
Dmitry Preobrazhensky	4732d876ee	[AMDGPU][MC][GFX9] Added s_dcache_discard* instructions See bug 36838: https://bugs.llvm.org/show_bug.cgi?id=36838 Differential Revision: https://reviews.llvm.org/D45247 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329397	2018-04-06 15:08:42 +00:00
Konstantin Zhuravlyov	c233ae8004	AMDGPU/Metadata: Always report a fixed number of hidden arguments Currently it is 6. If the "feature" was not used, report dummy hidden argument. Otherwise it does not match the kernarg size reported in the kernel header. Differential Revision: https://reviews.llvm.org/D45129 llvm-svn: 329341	2018-04-05 20:46:04 +00:00
Simon Pilgrim	1d793b8ac5	[SchedModel] Complete models shouldn't match against itineraries when they don't use them (PR35639) For schedule models that don't use itineraries, checkCompleteness still checks that an instruction has a matching itinerary instead of skipping and going straight to matching the InstRWs. That doesn't seem to match what happens in TargetSchedule.cpp This patch causes problems for a number of models that had been incorrectly flagged as complete. Differential Revision: https://reviews.llvm.org/D43235 llvm-svn: 329280	2018-04-05 13:11:36 +00:00
Dmitry Preobrazhensky	523872ea59	[AMDGPU][MC] Enabled instruction TBUFFER_LOAD_FORMAT_XYZ for SI/CI See bug 36958: https://bugs.llvm.org/show_bug.cgi?id=36958 Differential Revision: https://reviews.llvm.org/D45099 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329197	2018-04-04 13:54:55 +00:00
Dmitry Preobrazhensky	a0b8cd038c	[AMDGPU][MC] Added support of 3-element addresses for MIMG instructions See bug 35999: https://bugs.llvm.org/show_bug.cgi?id=35999 Differential Revision: https://reviews.llvm.org/D45084 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329187	2018-04-04 13:01:17 +00:00
Nico Weber	1cbd096914	Sort targetgen calls in lib/Target/*/CMakeLists. Makes it easier to see mistakes such as the one fixed in r329178 and makes the different target CMakeLists more consistent. Also remove some stale-looking comments from the Nios2 target cmakefile. No intended behavior change. llvm-svn: 329181	2018-04-04 12:37:44 +00:00
Nicolai Haehnle	2f5a73820c	AMDGPU: Dimension-aware image intrinsics Summary: These new image intrinsics contain the texture type as part of their name and have each component of the address/coordinate as individual parameters. This is a preparatory step for implementing the A16 feature, where coordinates are passed as half-floats or -ints, but the Z compare value and texel offsets are still full dwords, making it difficult or impossible to distinguish between A16 on or off in the old-style intrinsics. Additionally, these intrinsics pass the 'texfailpolicy' and 'cachectrl' as i32 bit fields to reduce operand clutter and allow for future extensibility. v2: - gather4 supports 2darray images - fix a bug with 1D images on SI Change-Id: I099f309e0a394082a5901ea196c3967afb867f04 Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44939 llvm-svn: 329166	2018-04-04 10:58:54 +00:00
Nicolai Haehnle	3ffd383a15	AMDGPU: Fix copying i1 value out of loop with non-uniform exit Summary: When an i1-value is defined inside of a loop and used outside of it, we cannot simply use the SGPR bitmask from the loop's last iteration. There are also useful and correct cases of an i1-value being copied between basic blocks, e.g. when a condition is computed outside of a loop and used inside it. The concept of dominators is not sufficient to capture what is going on, so I propose the notion of "lane-dominators". Fixes a bug encountered in Nier: Automata. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103743 Change-Id: If37b969ddc71d823ab3004aeafb9ea050e45bd9a Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D40547 llvm-svn: 329164	2018-04-04 10:57:58 +00:00
Farhana Aleen	e80aeac0f2	[AMDGPU] performMinMaxCombine should not optimize patterns of vectors to min3/max3. Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3. Author: FarhanaAleen Reviewed By: arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D45219 llvm-svn: 329131	2018-04-03 23:00:30 +00:00
Farhana Aleen	936947349a	Revert "MSG" This reverts commit 9a0ce889d1c39c74d69ecad5ce9c875155ae55de. This was committed by mistake. llvm-svn: 329119	2018-04-03 21:51:45 +00:00
Farhana Aleen	3ab409dc86	MSG llvm-svn: 329114	2018-04-03 21:20:39 +00:00
Dmitry Preobrazhensky	b181c7312e	[AMDGPU][MC][GFX9] Added instructions v_cvt_norm_*16_f16, v_sat_pk_u8_i16 See bug 36847: https://bugs.llvm.org/show_bug.cgi?id=36847 Differential Revision: https://reviews.llvm.org/D45097 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328988	2018-04-02 17:09:20 +00:00
Dmitry Preobrazhensky	6bad04ecf5	[AMDGPU][MC][GFX9] Added s_atomic_* and s_buffer_atomic_* instructions Fixed a bug which caused Tablegen crash. See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837 Differential Revision: https://reviews.llvm.org/D45085 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328983	2018-04-02 16:10:25 +00:00
Nico Weber	f492f58182	Revert r328975, it makes TableGen assert on the bots. llvm-svn: 328978	2018-04-02 14:20:23 +00:00
Dmitry Preobrazhensky	32c450ae6a	[AMDGPU][MC][GFX9] Added s_atomic_* and s_buffer_atomic_* instructions See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837 Differential Revision: https://reviews.llvm.org/D45085 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328975	2018-04-02 13:52:23 +00:00
Nicolai Haehnle	4254d45a79	AMDGPU: Make isIntrinsicSourceOfDivergence table-driven Summary: This is in preparation for the new dimension-aware image intrinsics, which I'd rather not have to list here by hand. Change-Id: Iaa16e3a635a11283918ce0d9e1e618591b0bf6fa Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44938 llvm-svn: 328939	2018-04-01 17:09:14 +00:00
Nicolai Haehnle	5d0d30304c	AMDGPU: Make getTgtMemIntrinsic table-driven for resource-based intrinsics Summary: Avoids having to list all intrinsics manually. This is in preparation for the new dimension-aware image intrinsics, which I'd rather not have to list here by hand. Change-Id: If7ced04998397ef68c4cb8f7de66b5050fb767e5 Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44937 llvm-svn: 328938	2018-04-01 17:09:07 +00:00
Stanislav Mekhanoshin	74e2974ac6	[AMDGPU] Fixed some instructions latencies Differential Revision: https://reviews.llvm.org/D45073 llvm-svn: 328874	2018-03-30 16:19:13 +00:00
Michael Bedy	59e5ef793c	[AMDGPU] Fix the SDWA Peephole phase to handle src for dst:UNUSED_PRESERVE. Summary: The phase attempts to transform operations that extract a portion of a value into an SDWA src operand in cases where that value is used only once. It was not prepared for this use to be the preserved portion of a value for dst:UNUSED_PRESERVE, resulting in a crash or assert. This change either rejects the illegal SDWA attempt, or in the case where dst:WORD_1 and the src_sel would be WORD_0, removes the unneeded extract instruction. Reviewers: arsenm, #amdgpu Reviewed By: arsenm, #amdgpu Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D44364 llvm-svn: 328856	2018-03-30 05:03:36 +00:00
Matt Arsenault	efd1b30436	AMDGPU: Fix build warning in release llvm-svn: 328832	2018-03-29 21:44:44 +00:00
Matt Arsenault	03ae399d50	AMDGPU: Support realigning stack While the stack access instructions don't care about alignment > 4, some transformations on the pointer calculation do make assumptions based on knowing the low bits of a pointer are 0. If a stack object ends up being accessed through its absolute address (relative to the kernel scratch wave offset), the addressing expression may depend on the stack frame being properly aligned. This was breaking in a testcase due to the add->or combine. I think some of the SP/FP handling logic is still backwards, and overly simplistic to support all of the stack features. Code which tries to modify the SP with inline asm for example or variable sized objects will probably require redoing this. llvm-svn: 328831	2018-03-29 21:30:06 +00:00
Matt Arsenault	ffb132e74b	AMDGPU: Increase default stack alignment 8 and 16-byte values are common, so increase the default alignment to avoid realigning the stack in most functions. llvm-svn: 328821	2018-03-29 20:22:04 +00:00
Matt Arsenault	6c041a3cab	AMDGPU: Fix selection error on constant loads with < 4 byte alignment llvm-svn: 328818	2018-03-29 19:59:28 +00:00
Craig Topper	2fa1436206	[IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer. Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it. The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly. Differential Revision: https://reviews.llvm.org/D45017 llvm-svn: 328806	2018-03-29 17:21:10 +00:00
David Blaikie	a373d18eb7	Transforms: Introduce Transforms/Utils.h rather than spreading the declarations amongst Scalar.h and IPO.h Fixes layering - Transforms/Utils shouldn't depend on including a Scalar or IPO header, because Scalar and IPO depend on Utils. llvm-svn: 328717	2018-03-28 17:44:36 +00:00
Dmitry Preobrazhensky	622bde8bc7	[AMDGPU][MC] Added ds_add_src2_f32 See bug 36833: https://bugs.llvm.org/show_bug.cgi?id=36833 Differential Revision: https://reviews.llvm.org/D44779 Reviewers: arsenm, artem.tamazov, timcorringham llvm-svn: 328713	2018-03-28 16:21:56 +00:00
Dmitry Preobrazhensky	2456ac696a	[AMDGPU][MC] Added PCK variants of image load/store instructions See bug 36834: https://bugs.llvm.org/show_bug.cgi?id=36834 Differential Revision: https://reviews.llvm.org/D44795 Reviewers: artem.tamazov, arsenm, timcorringham, nhaehnle llvm-svn: 328710	2018-03-28 15:44:16 +00:00
Dmitry Preobrazhensky	a917e88585	[AMDGPU][MC][GFX9] Added buffer_*_format_d16_hi_x See bug 36835: https://bugs.llvm.org/show_bug.cgi?id=36835 Differential Revision: https://reviews.llvm.org/D44825 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328707	2018-03-28 14:53:13 +00:00
Dmitry Preobrazhensky	dd2b929ffb	[AMDGPU][MC][GFX9] Added s_scratch* instructions See bug 36836: https://bugs.llvm.org/show_bug.cgi?id=36836 Differential Revision: https://reviews.llvm.org/D44832 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328704	2018-03-28 14:08:03 +00:00
Tim Renouf	cdac172e2a	Revert "[AMDGPU] For OS type AMDPAL, fixed scratch on compute shader" This reverts commit 0daf86291d3aa04d3cc280cd0ef24abdb0174981. It was causing an assert in test/CodeGen/AMDGPU/amdpal.ll only on a release-with-asserts build. I will resubmit the change when I have fixed that. Change-Id: If270594eba27a7dc4076bdeab3fa8e6bfda3288a llvm-svn: 328695	2018-03-28 11:21:07 +00:00
Matt Arsenault	bd49eccca1	AMDGPU: Really implement getFrameRegister Currently this seems to only really be used for debug info. llvm-svn: 328677	2018-03-27 23:26:59 +00:00
Tim Renouf	e4208bfa5b	[AMDGPU] For OS type AMDPAL, fixed scratch on compute shader Summary: For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders). This commit fixes that to use offset 0x10 instead of offset 0 for a compute shader, per the PAL ABI spec. Reviewers: kzhuravl, nhaehnle, timcorringham Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm Differential Revision: https://reviews.llvm.org/D44468 Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f llvm-svn: 328673	2018-03-27 21:35:00 +00:00
Matt Arsenault	17f3338015	AMDGPU: Fix not preserving CSR VGPR if used for SGPR spills Before this was not done if the function had no calls in it. This is still a possible issue with any callable function, regardless of calls present. llvm-svn: 328659	2018-03-27 19:42:55 +00:00
Matt Arsenault	95329f8c53	AMDGPU: Set natural stack alignment in DataLayout Only 4 byte alignment is ever useful, so increasing anything beyond this may require realigning the stack. llvm-svn: 328656	2018-03-27 19:26:40 +00:00
Matt Arsenault	0a0c871f60	AMDGPU: Fix crash when MachinePointerInfo invalid The combine on a select of a load only triggers for addrspace 0, and discards the MachinePointerInfo. The conservative default needs to be used for this. llvm-svn: 328652	2018-03-27 18:39:45 +00:00
Matt Arsenault	e9f3679031	AMDGPU: Fix FP restore from being reordered with stack ops In a function, s5 is used as the frame base SGPR. If a function is calling another function, during the call sequence it is copied to a preserved SGPR and restored. Before it was possible for the scheduler to move stack operations before the restore of s5, since there's nothing to associate a frame index access with the restore. Add an implicit use of s5 to the adjcallstack pseudo which ends the call sequence to preven this from happening. I'm not 100% satisfied with this solution, but I'm not sure what else would be better. llvm-svn: 328650	2018-03-27 18:38:51 +00:00
Tim Corringham	7116e8963d	[AMDGPU] Improve disassembler error handling Summary: llvm-objdump now disassembles unrecognised opcodes as data, using the .long directive. We treat unrecognised opcodes as being 32 bit values, so move along 4 bytes rather than the single byte which previously resulted in a cascade of bogus disassembly following an unrecognised opcode. While no solution can always disassemble code that contains embedded data correctly this provides a significant improvement. The disassembler will now cope with an arbitrary length section as it no longer truncates it to a multiple of 4 bytes, and will use the .byte directive for trailing bytes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D44685 llvm-svn: 328553	2018-03-26 17:06:33 +00:00
Nicolai Haehnle	4f850eabb6	AMDGPU: Introduce common SOP_Pseudo and VOP_Pseudo TableGen base classes Differential revision: https://reviews.llvm.org/D44820 Change-Id: I732979e2964006aa15d78a333d8886e6855f319a llvm-svn: 328496	2018-03-26 13:56:53 +00:00
Mandeep Singh Grang	860adef9e6	[AMDGPU] Change std::sort to llvm::sort in response to r327219 Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused due to undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort. Reviewers: tstellar, RKSimon, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44856 llvm-svn: 328429	2018-03-24 17:15:04 +00:00
David Blaikie	36a0f226b1	Fix layering by moving ValueTypes.h from CodeGen to IR ValueTypes.h is implemented in IR already. llvm-svn: 328397	2018-03-23 23:58:31 +00:00
David Blaikie	13e77db2df	Fix layering of MachineValueType.h by moving it from CodeGen to Support This is used by llvm tblgen as well as by LLVM Targets, so the only common place is Support for now. (maybe we need another target for these sorts of things - but for now I'm at least making them correct & we can make them better if/when people have strong feelings) llvm-svn: 328395	2018-03-23 23:58:25 +00:00
David Blaikie	6054e650ff	Move TargetLoweringObjectFile from CodeGen to Target to fix layering It's implemented in Target & include from other Target headers, so the header should be in Target. llvm-svn: 328392	2018-03-23 23:58:19 +00:00
Tony Tye	7a893d4e34	[AMDGPU] Remove use of OpenCL triple environment and replace with function attribute for AMDGPU - Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target. - Use function attribute to communicate to the AMDGPU backend to add implicit arguments for OpenCL kernels for the AMDHSA OS. Differential Revision: https://reviews.llvm.org/D43736 llvm-svn: 328349	2018-03-23 18:45:18 +00:00
David Blaikie	2be3922807	Fix a couple of layering violations in Transforms Remove #include of Transforms/Scalar.h from Transform/Utils to fix layering. Transforms depends on Transforms/Utils, not the other way around. So remove the header and the "createStripGCRelocatesPass" function declaration (& definition) that is unused and motivated this dependency. Move Transforms/Utils/Local.h into Analysis because it's used by Analysis/MemoryBuiltins.cpp. llvm-svn: 328165	2018-03-21 22:34:23 +00:00
Nirav Dave	3264c1bdf6	[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172" Reland ISel cycle checking improvements after simplifying node id invariant traversal and correcting typo. llvm-svn: 327898	2018-03-19 20:19:46 +00:00
Nicolai Haehnle	4186cc7c08	TableGen: Check the dynamic type of !cast<Rec>(string) Summary: The docs already claim that this happens, but so far it hasn't. As a consequence, existing TableGen files get this wrong a lot, but luckily the fixes are all reasonably straightforward. To make this work with all the existing forms of self-references (since the true type of a record is only built up over time), the lookup of self-references in !cast is delayed until the final resolving step. Change-Id: If5923a72a252ba2fbc81a889d59775df0ef31164 Reviewers: arsenm, craig.topper, tra, MartinO Subscribers: wdng, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D44475 llvm-svn: 327849	2018-03-19 14:14:20 +00:00
Matt Arsenault	fed0a45036	AMDGPU/GlobalISel: RegBankSelect for basic int ops llvm-svn: 327843	2018-03-19 14:07:23 +00:00
Matt Arsenault	69932e4d69	AMDGPU: Don't leave dead illegal VGPR->SGPR copies Normally DCE kills these, but at -O0 these get left behind leaving suspicious looking illegal copies. Replace with IMPLICIT_DEF to avoid iterator issues. llvm-svn: 327842	2018-03-19 14:07:15 +00:00
Nirav Dave	5f0ab71b62	Revert "[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"" as it times out building test-suite on PPC. llvm-svn: 327778	2018-03-17 19:24:54 +00:00
Nirav Dave	982d3a56ea	[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172" Reland ISel cycle checking improvements after simplifying and reducing node id invariant traversal. llvm-svn: 327777	2018-03-17 17:42:10 +00:00
Matt Arsenault	abdc4f2dc7	AMDGPU/GlobalISel: Cleanup constant legality llvm-svn: 327774	2018-03-17 15:17:48 +00:00
Matt Arsenault	685d1e8157	AMDGPU/GlobalISel: Basic G_GEP legality llvm-svn: 327773	2018-03-17 15:17:45 +00:00
Matt Arsenault	85803366d6	AMDGPU/GlobalISel: Basic legality for load/store llvm-svn: 327772	2018-03-17 15:17:41 +00:00
Farhana Aleen	c6c9dc8773	[AMDGPU] Supported ds_write_b128 generation. Summary: This is a follow-on patch of https://reviews.llvm.org/D44210 Author: FarhanaAleen Reviewed By: msearles Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44319 llvm-svn: 327726	2018-03-16 18:12:00 +00:00
Dmitry Preobrazhensky	4c8f4234b6	[AMDGPU][MC][GFX8][GFX9][DISASSEMBLER] Added "_e32" suffix to 32-bit VINTRP opcodes See bug 36751: https://bugs.llvm.org/show_bug.cgi?id=36751 Differential Revision: https://reviews.llvm.org/D44529 Reviewers: artem.tamazov, arsenm llvm-svn: 327723	2018-03-16 16:38:04 +00:00
Dmitry Preobrazhensky	9c1a6e7e24	[AMDGPU][MC] Corrected default values for unused SDWA operands See bug 36355: https://bugs.llvm.org/show_bug.cgi?id=36355 Differential Revision: https://reviews.llvm.org/D44481 Reviewers: artem.tamazov, arsenm llvm-svn: 327720	2018-03-16 15:40:27 +00:00
Mark Searles	c3c02bde73	[AMDGPU] Waitcnt pass: Modify the waitcnt pass to propagate info in the case of a single basic block loop. mergeInputScoreBrackets() does this for us; update it so that it processes the single bb's score bracket when processing the single bb's preds. It is, after all, a pred of itself, so it's score bracket is needed. Differential Revision: https://reviews.llvm.org/D44434 llvm-svn: 327583	2018-03-14 22:04:32 +00:00
Dmitry Preobrazhensky	d98c97b4f9	[AMDGPU][MC][GFX8] Added BUFFER_STORE_LDS_DWORD Instruction See bug 36558: https://bugs.llvm.org/show_bug.cgi?id=36558 Differential Revision: https://reviews.llvm.org/D43950 Reviewers: artem.tamazov, arsenm llvm-svn: 327299	2018-03-12 17:29:24 +00:00
Yaxun Liu	a99e7d8e44	[AMDGPU] Fix lowering enqueue kernel when kernel has no name Since the enqueued kernels have internal linkage, their names may be dropped. In this case, give them unique names __amdgpu_enqueued_kernel or __amdgpu_enqueued_kernel.n where n is a sequential number starting from 1. Differential Revision: https://reviews.llvm.org/D44322 llvm-svn: 327291	2018-03-12 16:34:06 +00:00
Dmitry Preobrazhensky	da4a7c01bf	[AMDGPU][MC] Corrected GATHER4 opcodes See bug 36252: https://bugs.llvm.org/show_bug.cgi?id=36252 Differential Revision: https://reviews.llvm.org/D43874 Reviewers: artem.tamazov, arsenm llvm-svn: 327278	2018-03-12 15:03:34 +00:00
Matt Arsenault	7b9ed89dcf	AMDGPU/GlobalISel: Legality and RegBankInfo for G_{INSERT\|EXTRACT}_VECTOR_ELT llvm-svn: 327269	2018-03-12 13:35:53 +00:00
Matt Arsenault	c0aefd561e	AMDGPU/GlobalISel: InstrMapping for G_MERGE_VALUES llvm-svn: 327268	2018-03-12 13:35:49 +00:00
Matt Arsenault	503afda95f	AMDGPU/GlobalISel: Make some G_MERGE_VALUEs legal llvm-svn: 327267	2018-03-12 13:35:43 +00:00
Michael Bedy	80cf9ff564	Test commit - change comment slightly. llvm-svn: 327234	2018-03-11 03:27:50 +00:00
Matt Arsenault	cbda7ff4ae	AMDGPU: Fix crash when constant folding with physreg operand llvm-svn: 327209	2018-03-10 16:05:35 +00:00
Nirav Dave	042678bd55	Revert: r327172 "Correct load-op-store cycle detection analysis" r327171 "Improve Dependency analysis when doing multi-node Instruction Selection" r328170 "[DAG] Enforce stricter NodeId invariant during Instruction selection" Reverting patch as NodeId invariant change is causing pathological increases in compile time on PPC llvm-svn: 327197	2018-03-10 02:16:15 +00:00
Nirav Dave	071699bf82	[DAG] Enforce stricter NodeId invariant during Instruction selection Instruction Selection makes use of the topological ordering of nodes by node id (a node's operands have smaller node id than it) when doing cycle detection. During selection we may violate this property as a selection of multiple nodes may induce a use dependence (and thus a node id restriction) between two unrelated nodes. If a selected node has an unselected successor this may allow us to miss a cycle in detection an invalid selection. This patch fixes this by marking all unselected successors of a selected node have negated node id. We avoid pruning on such negative ids but still can reconstruct the original id for pruning. In-tree targets have been updated to replace DAG-level replacements with ISel-level ones which enforce this property. This preemptively fixes PR36312 before triggering commit r324359 relands Reviewers: craig.topper, bogner, jyknight Subscribers: arsenm, nhaehnle, javed.absar, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D43198 llvm-svn: 327170	2018-03-09 20:57:15 +00:00
Farhana Aleen	a7cb31123c	[AMDGPU] Supported ds_read_b128 generation; Widened vector length for local address-space. Summary: Starting from GCN 2nd generation, ISA supports ds_read_b128 on top of ds_read_b64. This patch supports ds_read_b128 instruction pattern and generation of this instruction. In the vectorizer, this patch also widen the vector length so that vectorizer generates 128 bit loads for local address-space which gets translated to ds_read_b128. Since the performance benefit is not clear; compiler generates ds_read_b128 under -amdgpu-ds128. Author: FarhanaAleen Reviewed By: rampitec, arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44210 llvm-svn: 327153	2018-03-09 17:41:39 +00:00
Stanislav Mekhanoshin	c8127fc674	[AMDGPU] Fixed V_DIV_FIXUP_F16 selection on GFX9 GFX9 should select opsel version. Differential Revision: https://reviews.llvm.org/D44279 llvm-svn: 327106	2018-03-09 07:21:43 +00:00
Matt Arsenault	c3fe46bbcf	AMDGPU/GlobalISel: Pass subtarget + TM to LegalizerInfo These are the parameters x86 already uses. llvm-svn: 327020	2018-03-08 16:24:16 +00:00
Farhana Aleen	89196642f7	[AMDGPU] Increased vector length for global/constant loads. Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords. Author: FarhanaAleen Reviewed By: rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44179 llvm-svn: 326910	2018-03-07 17:09:18 +00:00
Farhana Aleen	347d12b4ce	Revert "[AMDGPU] Widened vector length for global/constant address space." This reverts commit ce988cc100dc65e7c6c727aff31ceb99231cab03. llvm-svn: 326907	2018-03-07 16:55:27 +00:00
Farhana Aleen	0d03d0588d	[AMDGPU] Widened vector length for global/constant address space. llvm-svn: 326904	2018-03-07 16:29:05 +00:00
Craig Topper	80d3bb3b4b	[TargetLowering] Rename DAGCombinerInfo::isAfterLegalizeVectorOps to DAGCombiner::isAfterLegalizeDAG since that's what it checks. NFC The code checks Level == AfterLegalizeDAG which is the fourth and last of the possible DAG combine stages that we have. There is a Level called AfterLegalVectorOps, but that's the third DAG combine and it doesn't always run. A function called isAfterLegalVectorOps should imply it returns true in either of the DAG combines that runs after the legalize vector ops stage, but that's not what this function does. llvm-svn: 326832	2018-03-06 19:44:52 +00:00
Stanislav Mekhanoshin	0f72225433	[AMDGPU] Add default ISA version targets In case if -mattr used to modify feature set bits in llvm-mc call getIsaVersion can fail to identify specific ISA due to test mismatch. Adding default fallback tests which will always correctly report at least major version. Differential Revision: https://reviews.llvm.org/D44163 llvm-svn: 326825	2018-03-06 18:33:55 +00:00
Yaxun Liu	46439e8d4a	[AMDGPU] Fix lowering OpenCL enqueue_kernel One addrspacecast disappeared in clang emitted IR for block invoke function due to adoption of the new addr space mapping. Differential Revision: https://reviews.llvm.org/D43785 llvm-svn: 326806	2018-03-06 16:04:39 +00:00
Matt Arsenault	e31ab94e97	AMDGPU/GlobalISel: Add InstrMapping for G_EXTRACT llvm-svn: 326715	2018-03-05 16:25:18 +00:00
Matt Arsenault	71272e6d4e	AMDGPU/GlobalISel: Make some G_EXTRACTs legal As far as I can tell legalization of weird sizes for the output type isn't implemented. llvm-svn: 326714	2018-03-05 16:25:15 +00:00
Matt Arsenault	4cc0b85276	AMDGPU: Fix build warning about override llvm-svn: 326713	2018-03-05 16:25:10 +00:00
Alexander Timofeev	2e5eeceeb7	Pass Divergence Analysis data to Selection DAG to drive divergence dependent instruction selection. Differential revision: https://reviews.llvm.org/D35267 llvm-svn: 326703	2018-03-05 15:12:21 +00:00
Matt Arsenault	b9699c009d	AMDGPU/GlobalISel: InstrMapping for G_ZEXT llvm-svn: 326589	2018-03-02 16:55:37 +00:00
Matt Arsenault	1c1aab99ae	AMDGPU/GlobalISel: InstrMapping for G_TRUNC llvm-svn: 326588	2018-03-02 16:55:33 +00:00
Matt Arsenault	ef8db767d7	AMDGPU/GlobalISel: Define InstrMappings for G_FCMP Patch by Tom Stellard llvm-svn: 326587	2018-03-02 16:53:15 +00:00
Matt Arsenault	2607dc60de	AMDGPU/GlobalISel: Define instruction mapping for @llvm.minnum Patch by Tom Stellard llvm-svn: 326586	2018-03-02 16:40:17 +00:00
Matt Arsenault	b46c191c49	AMDGPU/GlobalISel: Define instruction mapping for @llvm.maxnum Patch by Tom Stellard llvm-svn: 326567	2018-03-02 12:23:00 +00:00
Jan Vesely	b283ea0f0f	AMDGPU/GCN: Promote i16 ctpop i16 capable ASICs do not support i16 operands for this instruction. Add tablegen pattern to merge chained i16 additions. Differential Revision: https://reviews.llvm.org/D43985 llvm-svn: 326535	2018-03-02 02:50:22 +00:00
Matt Arsenault	41d2e3d98e	AMDGPU/GlobalISel: Define instruction mapping for G_FPTOSI Patch by Tom Stellard llvm-svn: 326534	2018-03-02 02:19:16 +00:00
Matt Arsenault	b23041ad4d	AMDGPU/GlobalISel: Define instruction mapping for G_FPTOUI Patch by Tom Stellard llvm-svn: 326533	2018-03-02 02:19:11 +00:00
Matt Arsenault	327d5fb2e5	AMDGPU/GlobalISel: Define instruction mapping for G_FMUL llvm-svn: 326532	2018-03-02 02:17:01 +00:00
Matt Arsenault	5a9e834eac	AMDGPU/GlobalISel: Define instruction mapping for G_FADD Patch by Tom Stellard llvm-svn: 326526	2018-03-02 01:22:13 +00:00
Matt Arsenault	d99317f1b3	AMDGPU/GlobalISel: Define instruction mapping for G_SHL Patch by Tom Stellard llvm-svn: 326525	2018-03-02 01:22:10 +00:00
Matt Arsenault	3c7a123ccc	AMDGPU/GlobalISel: Define instruction mapping for G_XOR llvm-svn: 326524	2018-03-02 01:22:06 +00:00
Matt Arsenault	c0f34c9e36	AMDGPU/GlobalISel: Define instruction mapping for G_AND Patch by Tom Stellard llvm-svn: 326523	2018-03-02 01:22:01 +00:00
Matt Arsenault	364f12e8f9	AMDGPU/GlobalISel: Define instruction mapping for @llvm.amdgcn.cvt.pkrtz Patch by Tom Stellard llvm-svn: 326490	2018-03-01 21:25:30 +00:00
Matt Arsenault	5320ee4a05	AMDGPU/GlobalISel: Define instruction mapping for G_OR Patch by Tom Stellard llvm-svn: 326489	2018-03-01 21:25:25 +00:00
Matt Arsenault	e65404f5c5	AMDGPU/GlobalISel: Remove default register mapping This crashes for some opcodes, which prevents the SelectionDAG fallback from working. Patch by Tom Stellard llvm-svn: 326487	2018-03-01 21:20:44 +00:00
Matt Arsenault	1422a19a88	AMDGPU/GlobalISel: Use a more correct getValueMapping This was finding the wrong size registers for anything with more than 2 components. Patch by Tom Stellard llvm-svn: 326483	2018-03-01 21:08:51 +00:00
Matt Arsenault	62669ede94	AMDGPU/GlobalISel: Define instruction mapping for G_BITCAST Patch by Tom Stellard llvm-svn: 326482	2018-03-01 20:59:44 +00:00
Matt Arsenault	0529a8e2de	AMDGPU/GlobalISel: Mark i32->i64 zext as legal llvm-svn: 326481	2018-03-01 20:56:21 +00:00
Matt Arsenault	36b99e1937	AMDGPU/GlobalISel: InstrMapping for llvm.amdgcn.exp.compr Patch by Tom Stellard llvm-svn: 326479	2018-03-01 20:40:55 +00:00
Matt Arsenault	8931bbf8df	AMDGPU/GlobalISel: Define instruction mapping for @llvm.amdgcn.exp Patch by Tom Stellard llvm-svn: 326477	2018-03-01 20:24:37 +00:00
Matt Arsenault	50721ab325	AMDGPU/GlobalISel: Define InstrMappings for G_ICMP Patch by Tom Stellard llvm-svn: 326472	2018-03-01 19:27:10 +00:00
Matt Arsenault	dc14ec05d4	AMDGPU/GlobalISel: Make i32 mul legal llvm-svn: 326471	2018-03-01 19:22:05 +00:00
Matt Arsenault	06cbb27a79	AMDGPU/GlobalISel: Define instruction mapping for G_IMPLICIT_DEF Patch by Tom Stellard llvm-svn: 326470	2018-03-01 19:16:52 +00:00
Matt Arsenault	e3d9ecf2b9	AMDGPU/GlobalISel: Define instruction mapping for G_FCONSTANT Patch by Tom Stellard llvm-svn: 326468	2018-03-01 19:13:30 +00:00
Matt Arsenault	51b0b20023	AMDGPU/GlobalISel: Add copyCost for VGPR->SGPR copies Patch by Tom Stellard llvm-svn: 326467	2018-03-01 19:09:25 +00:00
Matt Arsenault	3f6a204eaa	AMDGPU/GlobalISel: Make i32 xor legal llvm-svn: 326466	2018-03-01 19:09:21 +00:00
Matt Arsenault	8e80a5fbca	AMDGPU/GlobalISel: Mark 32/64-bit G_FCMP as legal Patch by Tom Stellard llvm-svn: 326465	2018-03-01 19:09:16 +00:00
Matt Arsenault	dd022ce064	AMDGPU/GlobalISel: Mark 32-bit G_FPTOSI as legal Patch by Tom Stellard llvm-svn: 326464	2018-03-01 19:04:25 +00:00
Alexander Timofeev	0081d23fd8	[AMDGPU] : fix for the crash in SIRegisterInfo when the regiser class not found Differential revision: https://reviews.llvm.org./D43334 llvm-svn: 326451	2018-03-01 17:36:43 +00:00
Tim Renouf	2a99fa2c08	[AMDGPU] added writelane intrinsic Summary: For use by LLPC SPV_AMD_shader_ballot extension. The v_writelane instruction was already implemented for use by SGPR spilling, but I had to add an extra dummy operand tied to the destination, to represent that all lanes except the selected one keep the old value of the destination register. .ll test changes were due to schedule changes caused by that new operand. Differential Revision: https://reviews.llvm.org/D42838 llvm-svn: 326353	2018-02-28 19:10:32 +00:00
Konstantin Zhuravlyov	40b09e86b9	AMDGPU: Add fast fmaf feature to gfx702 Differential Revision: https://reviews.llvm.org/D43790 llvm-svn: 326252	2018-02-27 21:46:15 +00:00
Matt Arsenault	2a26a286db	AMDGPU/GlobalISel: Make f64 constants legal llvm-svn: 326101	2018-02-26 17:20:43 +00:00
Tim Renouf	832f90fa0c	[AMDGPU] Scratch setup fix on AMDPAL gfx9+ merge shader Summary: With OS type AMDPAL, the scratch descriptor is hardwired to be loaded from offset 0 of the global information table, whose low pointer is passed in s0. For a merge shader on gfx9+, it needs to be s8 instead, as the hardware reserves s0-s7. Reviewers: kzhuravl Subscribers: arsenm, nhaehnle, dstuttard, llvm-commits, t-tye, yaxunl, wdng, kzhuravl Differential Revision: https://reviews.llvm.org/D42203 llvm-svn: 326088	2018-02-26 14:46:43 +00:00
Stanislav Mekhanoshin	fa48c496e2	[AMDGPU] Shrinking V_SUBBREV_U32 V_SUBBREV_U32 is a commute opcode for V_SUBB_U32. However, when we try to commute V_SUBB_U32 in order to shrink it we do not then process V_SUBBREV_U32 and it stay VOP3. This is fixed. Differential Revision: https://reviews.llvm.org/D43699 llvm-svn: 326011	2018-02-24 01:32:32 +00:00
Geoff Berry	d6ba3dbbbd	Fix compiler warning introduced in r325931. NFC. llvm-svn: 325938	2018-02-23 19:11:33 +00:00
Geoff Berry	f8bf2ec0a8	[MachineOperand][Target] MachineOperand::isRenamable semantics changes Summary: Add a target option AllowRegisterRenaming that is used to opt in to post-register-allocation renaming of registers. This is set to 0 by default, which causes the hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq fields of all opcodes to be set to 1, causing MachineOperand::isRenamable to always return false. Set the AllowRegisterRenaming flag to 1 for all in-tree targets that have lit tests that were effected by enabling COPY forwarding in MachineCopyPropagation (AArch64, AMDGPU, ARM, Hexagon, Mips, PowerPC, RISCV, Sparc, SystemZ and X86). Add some more comments describing the semantics of the MachineOperand::isRenamable function and how it is set and maintained. Change isRenamable to check the operand's opcode hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq bit directly instead of relying on it being consistently reflected in the IsRenamable bit setting. Clear the IsRenamable bit when changing an operand's register value. Remove target code that was clearing the IsRenamable bit when changing registers/opcodes now that this is done conservatively by default. Change setting of hasExtraSrcRegAllocReq in AMDGPU target to be done in one place covering all opcodes that have constant pipe read limit restrictions. Reviewers: qcolombet, MatzeB Subscribers: aemerson, arsenm, jyknight, mcrosier, sdardis, nhaehnle, javed.absar, tpr, arichardson, kristof.beyls, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, escha, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D43042 llvm-svn: 325931	2018-02-23 18:25:08 +00:00
Nicolai Haehnle	6cf306deca	AMDGPU: Track physreg uses in SILoadStoreOptimizer Summary: This handles def-after-use of physregs, and allows us to merge loads and stores even across some physreg defs (typically M0 defs). Change-Id: I076484b2bda27c2cf46013c845a0380c5b89b67b Reviewers: arsenm, mareko, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D42647 llvm-svn: 325882	2018-02-23 10:45:56 +00:00
Nicolai Haehnle	40b140fef1	AMDGPU: Stop using .NAME in .td files Summary: .NAME is a bit of an odd duck, in that we should really treat it like a template argument, but we currently don't, and so when and where NAME is initialized and how is pretty inconsistent. Best to just avoid using it as a field of already instantiated records, and use cast to string instead. Change-Id: I5a0c202401cede3d5c3827ab9c7858ea48b29108 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D43551 llvm-svn: 325794	2018-02-22 15:25:11 +00:00
Hiroshi Inoue	7f9f92f8b6	[NFC] fix trivial typos in comments "a a" -> "a" llvm-svn: 325752	2018-02-22 07:48:29 +00:00
Nicolai Haehnle	770397f4cd	AMDGPU: Do not combine loads/store across physreg defs Summary: Since this pass operates on machine SSA form, this should only really affect M0 in practice. Fixes various piglit variable-indexing/vs-varying-array-mat4-index-* Change-Id: Ib2a1dc3a8d7b08225a8da49a86f533faa0986aa8 Fixes: r317751 ("AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4") Reviewers: arsenm, mareko, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D40343 llvm-svn: 325677	2018-02-21 13:31:35 +00:00
Dmitry Preobrazhensky	d6e1a9404d	[AMDGPU][MC] Added lds support for MUBUF instructions See bug 28234: https://bugs.llvm.org/show_bug.cgi?id=28234 Differential Revision: https://reviews.llvm.org/D43472 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 325676	2018-02-21 13:13:48 +00:00
Konstantin Zhuravlyov	5c1237a1fd	Revert "[AMDGPU] Increased vector length for global/constant loads." https://reviews.llvm.org/rL325518 It breaks following OpenCL conformance tests: - Basic - parameter_types - Basic - vload_private llvm-svn: 325643	2018-02-20 23:30:21 +00:00
Tim Renouf	8234b4893a	[AMDGPU] stop buffer_store being moved illegally Summary: The machine instruction scheduler was illegally moving a buffer store past a buffer load with the same descriptor and offset. Fixed by marking buffer ops as mayAlias and isAliased. This may be overly conservative, and we may need to revisit. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D43332 Change-Id: Iff3173d9e0653e830474546276ab9d30318b8ef7 llvm-svn: 325567	2018-02-20 10:03:38 +00:00
Mark Searles	65207923f6	[AMDGPU] Make note of existing waitcnt instrs; this is add-on work related to suppression of redundant waitcnt instrs. It is necessary to make note of these existing waitcnt instrs so that we do not fall into an infinite loop when handling loops. Also, [NFC] some minor code clean-up. llvm-svn: 325524	2018-02-19 19:19:59 +00:00
Mark Searles	419bdab759	[AMDGPU] Increased vector length for global/constant loads. Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords. Author: FarhanaAleen Reviewed By: rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D43275 llvm-svn: 325518	2018-02-19 16:42:49 +00:00
Jonas Paulsson	b51a9bc358	[AMDGPU] Return true in enableMultipleCopyHints(). Enable multiple COPY hints to eliminate more COPYs during register allocation. Note that this is something all targets should do, see https://reviews.llvm.org/D38128. Review: Stanislav Mekhanoshin, Tom Stellard. llvm-svn: 325425	2018-02-17 10:00:28 +00:00
Konstantin Zhuravlyov	ef9aafcddc	AMDGPU: Remove unused private member of AMDGPUTargetELFStreamer llvm-svn: 325408	2018-02-16 23:04:11 +00:00
Eric Christopher	8ceddb0ecd	Remove an unused function. llvm-svn: 325403	2018-02-16 22:46:47 +00:00
Konstantin Zhuravlyov	9122a63143	AMDGPU: Bring elf flags in sync with the spec - Add MACH flags - Add XNACK flag - Add reserved flags - Minor cleanups in docs Differential Revision: https://reviews.llvm.org/D43356 llvm-svn: 325399	2018-02-16 22:33:59 +00:00
Konstantin Zhuravlyov	331f97e171	AMDGPU: Bring processors and features in sync with the spec - Remove gfx800 - Make iceland gfx802 - Add xnack to gfx902 Differential Revision: https://reviews.llvm.org/D43355 llvm-svn: 325393	2018-02-16 21:26:25 +00:00
Changpeng Fang	ba92059ca9	AMDGPU/SI: Extend promoting alloca to vector to arrays of up to 16 elements Summary: This patch extends the promotion of alloca to vector to the arrays of up to 16 elements. Also we introduce an option, -disable-promote-alloca-to-vector, to switch promotion to vector off, if needed. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D33559 llvm-svn: 325372	2018-02-16 19:14:17 +00:00
Changpeng Fang	da38b5fd49	AMDGPU/SI: Turn off GPR Indexing Mode immediately after the interested instruction. Summary: In the current implementation of GPR Indexing Mode when the index is of non-uniform, the s_set_gpr_idx_off instruction is incorrectly inserted after the loop. This will lead the instructions with vgpr operands (v_readfirstlane for example) to read incorrect vgpr. In this patch, we fix the issue by inserting s_set_gpr_idx_on/off immediately around the interested instruction. Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D43297 llvm-svn: 325355	2018-02-16 16:31:30 +00:00
Stanislav Mekhanoshin	ff2763a658	[AMDGPU] Combine adjacent waitcounts in a single strongest wait Differential Revision: https://reviews.llvm.org/D43350 llvm-svn: 325299	2018-02-15 22:03:55 +00:00
Stanislav Mekhanoshin	c078ca92eb	[AMDGPU] Remove non-temporal flag from argument loads Kernel arguments likely read by all workitems and should not bypass cache. Fixes performance hit in sub-dword argument loads. Differential Revision: https://reviews.llvm.org/D43249 llvm-svn: 325146	2018-02-14 18:05:14 +00:00
Yaxun Liu	0124b5484c	[AMDGPU] Change constant addr space to 4 Differential Revision: https://reviews.llvm.org/D43170 llvm-svn: 325030	2018-02-13 18:00:25 +00:00
Daniel Neilson	a60f4621ae	[AMDGPUPromoteAlloca] Replace deprecated memory intrinsic APIs (NFCI) Summary: This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the AMDGPUPromoteAlloca pass to cease using: 1) The old getAlignment() API of MemoryIntrinsic in favour of getting source & dest specific alignments through the new API. 2) The old IRBuilder createMemCpy/createMemMove single-alignment APIs in favour of the new API that allows setting source and destination alignments independently. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use [get\|set]DestAlignment() and [get\|set]SourceAlignment() instead. ( rL323886, r323891, rL324148, rL324273, rL324278, rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773 ) Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html llvm-svn: 324774	2018-02-09 21:56:15 +00:00
Matt Arsenault	0063ce7201	AMDGPU: Remove tied operand from si_else llvm-svn: 324751	2018-02-09 17:18:38 +00:00
Matt Arsenault	923712b6b5	Reapply "AMDGPU: Add 32-bit constant address space" This reverts r324494 and reapplies r324487. llvm-svn: 324747	2018-02-09 16:57:57 +00:00
Matt Arsenault	bcf7bec4b8	AMDGPU: Fix layering issue Move utility function that depends on codegen. Fixes build with r324487 reapplied. llvm-svn: 324746	2018-02-09 16:57:48 +00:00
Stanislav Mekhanoshin	9c6cd0458b	[AMDGPU] More descriptive names in the memory legalizer NFC. Differential Revision: https://reviews.llvm.org/D43054 llvm-svn: 324712	2018-02-09 06:05:33 +00:00
Matt Arsenault	9c2f3c4852	AMDGPU: Process SDWA block at a time Right now this loops over the entire function every time there is a change, which is not very efficient. There's no practical reason to track this so globally, since the code motion optimization passes should be sinking instructions with single uses and the pass currently will not fold with multiple uses. llvm-svn: 324667	2018-02-08 22:46:41 +00:00
Matt Arsenault	c24d5e2819	AMDGPU: Minor cleanups Column limit, typo, unnecessary reference llvm-svn: 324666	2018-02-08 22:46:38 +00:00
Matt Arsenault	b02cebf552	AMDGPU: Fix incorrect reordering when inline asm defines LDS address Defs of operands outside of the instruction's explicit defs need to be checked. llvm-svn: 324554	2018-02-08 01:56:14 +00:00
Matt Arsenault	c908e3f77a	AMDGPU: Don't crash when trying to fold implicit operands llvm-svn: 324550	2018-02-08 01:12:46 +00:00
Stanislav Mekhanoshin	db39b4b0b4	[AMDGPU] Fixed wait count reuse The code reusing existing wait counts is incorrect since it keeps adding new operands to an old instruction instead of replacing the immediate. It was also effectively switched off by the condition that wait count is not an AMDGPU::S_WAITCNT. Also switched to BuildMI instead of creating instructions directly. Differential Revision: https://reviews.llvm.org/D42997 llvm-svn: 324547	2018-02-08 00:18:35 +00:00
Rafael Espindola	f4e3f3e31c	Revert "AMDGPU: Add 32-bit constant address space" This reverts commit r324487. It broke clang tests. llvm-svn: 324494	2018-02-07 18:09:35 +00:00
Marek Olsak	871c30e540	AMDGPU: Add 32-bit constant address space Note: This is a candidate for LLVM 6.0, because it was planned to be in that release but was delayed due to a long review period. Merge conflict in release_60 - resolution: Add "-p6:32:32" into the second (non-amdgiz) string. Only scalar loads support 32-bit pointers. An address in a VGPR will fail to compile. That's OK because the results of loads will only be used in places where VGPRs are forbidden. Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC. The tests cover all uses cases we need for Mesa. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D41651 llvm-svn: 324487	2018-02-07 16:01:00 +00:00
Marek Olsak	b2cc77985b	AMDGPU: Remove the s_buffer workaround for GFX9 chips Summary: I checked the AMD closed source compiler and the workaround is only needed when x3 is emulated as x4, which we don't do in LLVM. SMEM x3 opcodes don't exist, and instead there is a possibility to use x4 with the last component being unused. If the last component is out of buffer bounds and falls on the next 4K page, the hw hangs. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D42756 llvm-svn: 324486	2018-02-07 16:00:40 +00:00
Tom Stellard	33445765dd	AMDGPU/GlobalISel: Mark 32-bit G_FPTOUI as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D42152 llvm-svn: 324446	2018-02-07 04:47:59 +00:00
Mark Searles	24c92eeb83	[AMDGPU] Suppress redundant waitcnt instrs. 1. Run the memory legalizer prior to the waitcnt pass; keep the policy that the waitcnt pass does not remove any waitcnts within the incoming IR. 2. The waitcnt pass doesn't (yet) track waitcnts that exist prior to the waitcnt pass (it just skips over them); because the waitcnt pass is ignorant of them, it may insert a redundant waitcnt. To avoid this, check the prev instr. If it and the to-be-inserted waitcnt are the same, suppress the insertion. We keep the existing waitcnt under the assumption that whomever, e.g., the memory legalizer, inserted it knows what they were doing. 3. Follow-on work: teach the waitcnt pass to record the pre-existing waitcnts for better waitcnt production. Differential Revision: https://reviews.llvm.org/D42854 llvm-svn: 324440	2018-02-07 02:21:21 +00:00
Matt Arsenault	a18b3bcf51	AMDGPU: Select BFI patterns with 64-bit ints llvm-svn: 324431	2018-02-07 00:21:34 +00:00
Stanislav Mekhanoshin	ce2d428a98	[AMDGPU] removed dead code handling rmw in memory legalizer It was always using cmpxchg path and in rmw and cmpxchg instructions are not distinguishable in the BE. Differential Revision: https://reviews.llvm.org/D42976 llvm-svn: 324383	2018-02-06 19:11:56 +00:00
Marek Olsak	7d92b7e23a	AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU Author: Bas Nieuwenhuizen https://reviews.llvm.org/D42881 llvm-svn: 324353	2018-02-06 15:17:55 +00:00
Tim Renouf	807ecc3d66	[AMDGPU] do not generate .AMDGPU.config for amdpal os type Summary: Now we generate PAL metadata for the amdpal os type, there is no need to generate the .AMDGPU.config section. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37760 Change-Id: I303c5fad66656ce97293da60621afac6595b4c18 llvm-svn: 324346	2018-02-06 13:39:38 +00:00
Konstantin Zhuravlyov	8818d13ed2	AMDGPU/MemoryModel: Fix monotonic atomic loads Those should have glc bit set for system and agent synchronization scopes llvm-svn: 324314	2018-02-06 04:06:04 +00:00
Dmitry Preobrazhensky	0a1ff464e1	[AMDGPU][MC] Corrected dst/data size for MIMG opcodes with d16 modifier See bug 36154: https://bugs.llvm.org/show_bug.cgi?id=36154 Differential Revision: https://reviews.llvm.org/D42847 Reviewers: cfang, artem.tamazov, arsenm llvm-svn: 324237	2018-02-05 14:18:53 +00:00
Dmitry Preobrazhensky	e3271aee44	[AMDGPU][MC] Added validation of d16 and r128 modifiers of MIMG opcodes See bugs 36094, 36095: https://bugs.llvm.org/show_bug.cgi?id=36094 https://bugs.llvm.org/show_bug.cgi?id=36095 Differential Revision: https://reviews.llvm.org/D42692 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 324231	2018-02-05 12:45:43 +00:00
Yaxun Liu	2a22c5deff	[AMDGPU] Switch to the new addr space mapping by default This requires corresponding clang change. Differential Revision: https://reviews.llvm.org/D40955 llvm-svn: 324101	2018-02-02 16:07:16 +00:00
Changpeng Fang	29fcf883fb	AMDGPU/SI: Adjust the encoding family for D16 buffer instructions when the target has UnpackedD16VMem feature. Reviewers: Matt and Brian Differential Revision: https://reviews.llvm.org/D42548 llvm-svn: 323988	2018-02-01 18:41:33 +00:00
Matt Arsenault	af88f0eb44	AMDGPU: Fix missing SCC def from s_xor_b64_term llvm-svn: 323927	2018-01-31 22:54:27 +00:00
Marek Olsak	d4bb329d0e	AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9 Summary: This enables load merging into x2, x4, which is driven by inline offsets. 6500 shaders are affected: Code Size in affected shaders: -15.14 % Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D42078 llvm-svn: 323909	2018-01-31 20:18:11 +00:00
Marek Olsak	13e4741275	AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16} Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D41663 llvm-svn: 323908	2018-01-31 20:18:04 +00:00
Geoff Berry	1d53101387	[AMDGPU] isRenamable fixes to support copy forwarding Mark more opcodes as hasExtraSrcRegAllocReq so that their operands will be marked as not renamable, to avoid copy forwarding violating the constraint that only one operand may use the constant bus. These changes fix a few mis-compiles when copy forwarding is enabled in MachineCopyPropagation by D41835 (and were reviewed as part of that change). llvm-svn: 323794	2018-01-30 17:37:39 +00:00
Mark Searles	94ae3b2f9b	[AMDGPU] Revert "[AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output." Patch caused a buildbot failure; arg; http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/17373/s\ teps/build_Lld/logs/stdio : /Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1563:18: error: unused variable 'InstCnt' [-Werror,-Wunused-variable] static int32_t InstCnt = 0; " This reverts commit 4f4a7d61e306b67044d9f16bc2016fee806bc2cc. llvm-svn: 323791	2018-01-30 17:17:06 +00:00
Mark Searles	d6d5a2571f	[AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output. -amdgpu-waitcnt-forcezero={1\|0} Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -amdgpu-waitcnt-forceexp=<n> Force emit a s_waitcnt expcnt(0) before the first <n> instrs -amdgpu-waitcnt-forcelgkm=<n> Force emit a s_waitcnt lgkmcnt(0) before the first <n> instrs -amdgpu-waitcnt-forcevm=<n> Force emit a s_waitcnt vmcnt(0) before the first <n> instrs This patch was pushed ( abb190fd51cd2f9a9eef08c024e109f7f7e909fc ), which caused a buildbot failure, reverted ( 6227480d74da507cf8e1b4bcaffbdb9fb875b4b8 ), and then updated to fix buildbot failures (this patch). Differential Revision: https://reviews.llvm.org/D40091 llvm-svn: 323788	2018-01-30 16:49:38 +00:00
Changpeng Fang	0905870f93	AMDGPU/SI: Add decoding in the GFX80_UNPACKED decoding namespace. Reviewer: Dmitry (dp). Differential Revision: https://reviews.llvm.org/D42596 llvm-svn: 323785	2018-01-30 16:42:40 +00:00
Tom Stellard	3ae38d271e	AMDGPU: Move ADDRIndirect complex pattern into R600Instructions.td Summary: This is only used by R600. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37114 llvm-svn: 323709	2018-01-29 23:29:26 +00:00
Marek Olsak	48057b554c	AMDGPU: Allow a SGPR for the conditional KILL operand Patch by: Bas Nieuwenhuizen Just use the _e64 variant if needed. This should be possible as per def : Pat < (int_amdgcn_kill (i1 (setcc f32:$src, InlineFPImm<f32>:$imm, cond:$cond))), (SI_KILL_F32_COND_IMM_PSEUDO $src, (bitcast_fpimm_to_i32 $imm), (cond_as_i32imm $cond)) > ; I don't think we can get an immediate for the other operand for which we need the second 32-bit word. https://reviews.llvm.org/D42302 llvm-svn: 323706	2018-01-29 23:19:10 +00:00
Geoff Berry	d37dc77b6e	[AMDGPU][X86][Mips] Make sure renamable bit not set for reserved regs Summary: Fix a few places that were modifying code after register allocation to set the renamable bit correctly to avoid failing the validation added in D42449. llvm-svn: 323675	2018-01-29 18:47:48 +00:00
Daniel Sanders	9ade5592d9	[globalisel] Make LegalizerInfo::LegalizeAction available outside of LegalizerInfo. NFC Summary: The improvements to the LegalizerInfo discussed in D42244 require that LegalizerInfo::LegalizeAction be available for use in other classes. As such, it needs to be moved out of LegalizerInfo. This has been done separately to the next patch to minimize the noise in that patch. llvm-svn: 323669	2018-01-29 17:37:29 +00:00
Dmitry Preobrazhensky	4f321aef74	[AMDGPU][MC] Corrected parsing of image opcode modifiers r128 and d16 See bugs 36092, 36093: https://bugs.llvm.org/show_bug.cgi?id=36092 https://bugs.llvm.org/show_bug.cgi?id=36093 Differential Revision: https://reviews.llvm.org/D42583 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 323651	2018-01-29 14:20:42 +00:00
Hiroshi Inoue	c8e9245816	[NFC] fix trivial typos in comments and documents "to to" -> "to" llvm-svn: 323628	2018-01-29 05:17:03 +00:00
Dmitry Preobrazhensky	706828157f	[AMDGPU][MC] Added validation of image dst/data size (must match dmask and tfe) See bug 36000: https://bugs.llvm.org/show_bug.cgi?id=36000 Differential Revision: https://reviews.llvm.org/D42483 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 323538	2018-01-26 16:42:51 +00:00
Dmitry Preobrazhensky	0b4eb1ead1	[AMDGPU][MC] Added support of 64-bit image atomics See bug 35998: https://bugs.llvm.org/show_bug.cgi?id=35998 Differential Revision: https://reviews.llvm.org/D42469 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 323534	2018-01-26 15:43:29 +00:00
Dmitry Preobrazhensky	6cb42e7622	[AMDGPU][MC] Enabled disassembler for image atomic operations See bug 35988: https://bugs.llvm.org/show_bug.cgi?id=35988 Differential Revision: https://reviews.llvm.org/D42186 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 323527	2018-01-26 14:07:38 +00:00
Daniil Fukalov	6e1dc68117	[AMDGPU] fix LDS f32 intrinsics - using qualified pointer addrspace in intrinsics class to avoid .f32 mangling - changed too common atomic mangling to ds - added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic Reviewed by: b-sumner Differential Revision: https://reviews.llvm.org/D42383 llvm-svn: 323516	2018-01-26 11:09:38 +00:00
Geoff Berry	c4796d4745	[AMDGPU] Make sure all super regs of reserved regs are marked reserved. Summary: Move reserveRegisterTuples into AMDGPURegisterInfo and use it in R600RegisterInfo::getReservedRegs and R600InstrInfo::reserveIndirectRegisters to ensure that all super registers of reserved registers are also marked as reserved. Before this change, under certain circumstances, the registers %t1_x and %t1_xyzw would be marked as reserved, but %t1_xy and %t1_xyz would not be, leading to the register allocator sometimes assigning a register to %t1_xy, which is invalid since %t1_x is reserved. Reviewers: arsenm, tstellar, MatzeB, qcolombet Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D42448 llvm-svn: 323356	2018-01-24 18:09:53 +00:00
Hiroshi Inoue	501931b117	[NFC] fix trivial typos in comments "the the" -> "the" llvm-svn: 323302	2018-01-24 05:04:35 +00:00
Mark Searles	7687d42052	[AMDGPU] SI Load Store Optimizer: When merging with offset, use V_ADD_{I\|U}32_e64 - Change inserted add ( V_ADD_{I\|U}32_e32 ) to _e64 version ( V_ADD_{I\|U}32_e64 ) so that the add uses a vreg for the carry; this prevents inserted v_add from killing VCC; the _e64 version doesn't accept a literal in its encoding, so we need to introduce a mov instr as well to get the imm into a register. - Change pass name to "SI Load Store Optimizer"; this removes the '/', which complicates scripts. Differential Revision: https://reviews.llvm.org/D42124 llvm-svn: 323153	2018-01-22 21:46:43 +00:00
Hiroshi Inoue	290adb3184	[NFC] fix trivial typos in comments "the the" -> "the" llvm-svn: 323074	2018-01-22 05:54:46 +00:00
Dmitry Preobrazhensky	0e074e349d	[AMDGPU][MC] Corrected parsing of image modifiers and encoding of image atomics See bugs 35962: https://bugs.llvm.org/show_bug.cgi?id=35962 35963: https://bugs.llvm.org/show_bug.cgi?id=35963 Differential Revision: https://reviews.llvm.org/D42184 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 322942	2018-01-19 13:49:53 +00:00
Matthias Braun	4a7c8e7aa2	Split MachineLICM into EarlyMachineLICM and MachineLICM; NFC This avoids playing games with pseudo pass IDs and avoids using an unreliable MRI::isSSA() check to determine whether register allocation has happened. Note that this renames: - MachineLICMID -> EarlyMachineLICM - PostRAMachineLICMID -> MachineLICMID to be consistent with the EarlyTailDuplicate/TailDuplicate naming. llvm-svn: 322927	2018-01-19 06:46:10 +00:00
Changpeng Fang	ba6240cc71	AMDGPU/SI: Fix typos in d16 support patch the buffer intrinsics. llvm-svn: 322906	2018-01-18 22:57:57 +00:00
Changpeng Fang	4737e892de	AMDGPU/SI: Add d16 support for image intrinsics. Summary: This patch implements d16 support for image load, image store and image sample intrinsics. Reviewers: Matt, Brian. Differential Revision: https://reviews.llvm.org/D3991 llvm-svn: 322903	2018-01-18 22:08:53 +00:00
Aditya Nandakumar	18b3f9d384	[GISel] Make constrainSelectedInstRegOperands() available to the legalizer. NFC https://reviews.llvm.org/D42149 llvm-svn: 322743	2018-01-17 19:31:33 +00:00
Matt Arsenault	1491ca8911	AMDGPU: Error in SIAnnotateControlFlow instead of assert This assert typically happens if an unstructured CFG is passed to the pass. This can happen if the pass is run independently without the structurizer. llvm-svn: 322685	2018-01-17 16:30:01 +00:00
Daniil Fukalov	d5fca554e2	[AMDGPU] add LDS f32 intrinsics added llvm.amdgcn.atomic.{add\|min\|max}.f32 intrinsics to allow generate ds_{add\|min\|max}[_rtn]_f32 instructions needed for OpenCL float atomics in LDS Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D37985 llvm-svn: 322656	2018-01-17 14:05:05 +00:00
Dmitry Preobrazhensky	6b65f7c380	[AMDGPU][MC][GFX9] Enable inline constants for SDWA operands See bug 35771: https://bugs.llvm.org/show_bug.cgi?id=35771 Differential Revision: https://reviews.llvm.org/D42058 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 322655	2018-01-17 14:00:48 +00:00
Stanislav Mekhanoshin	62875fcd6c	[AMDGPU] Add HW_REG_SH_MEM_BASES symbolic name for s_getreg_b32 Differential Revision: https://reviews.llvm.org/D41617 llvm-svn: 322500	2018-01-15 18:49:15 +00:00
Stanislav Mekhanoshin	f630047ef6	[AMDGPU] Copy impdefs from pseudo to real instructions In some cases we do not copy implicit defs from pseudo to real VOP instructions. It has no visible impact at the moment thus no tests are affected or added. Differential Revision: https://reviews.llvm.org/D41783 llvm-svn: 322496	2018-01-15 17:55:35 +00:00
Tim Renouf	75ced9d5b8	[AMDGPU] stop image_store being moved illegally Summary: A recent change 321556: AMDGPU: Remove mayLoad/hasSideEffects from MIMG stores can allow the machine instruction scheduler to move an image store past an image load using the same descriptor. V2: Fixed by marking image ops as mayAlias and isAliased. This may be overly conservative, and we may need to revisit. V3: Reverted test change done on 321556. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: llvm-commits, t-tye, yaxunl, wdng, kzhuravl Differential Revision: https://reviews.llvm.org/D41969 llvm-svn: 322419	2018-01-12 22:57:24 +00:00
Changpeng Fang	44dfa1de3b	AMDGPU/SI: Add d16 support for buffer intrinsics. Differential Revision: https://reviews.llvm.org/D38906 Reviewers: Matt and Brian. llvm-svn: 322402	2018-01-12 21:12:19 +00:00
Dmitry Preobrazhensky	3afbd825a3	[AMDGPU][MC][GFX8][GFX9] Added XNACK_MASK support See bug 35764: https://bugs.llvm.org/show_bug.cgi?id=35764 Differential Revision: https://reviews.llvm.org/D41614 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 322189	2018-01-10 14:22:19 +00:00
Tim Renouf	6eaad1e539	[AMDGPU] Fixed incorrect uniform branch condition Summary: I had a case where multiple nested uniform ifs resulted in code that did v_cmp comparisons, combining the results with s_and_b64, s_or_b64 and s_xor_b64 and using the resulting mask in s_cbranch_vccnz, without first ensuring that bits for inactive lanes were clear. There was already code for inserting an "s_and_b64 vcc, exec, vcc" to clear bits for inactive lanes in the case that the branch is instruction selected as s_cbranch_scc1 and is then changed to s_cbranch_vccnz in SIFixSGPRCopies. I have added the same code into SILowerControlFlow for the case that the branch is instruction selected as s_cbranch_vccnz. This de-optimizes the code in some cases where the s_and is not needed, because vcc is the result of a v_cmp, or multiple v_cmp instructions combined by s_and/s_or. We should add a pass to re-optimize those cases. Reviewers: arsenm, kzhuravl Subscribers: wdng, yaxunl, t-tye, llvm-commits, dstuttard, timcorringham, nhaehnle Differential Revision: https://reviews.llvm.org/D41292 llvm-svn: 322119	2018-01-09 21:34:43 +00:00
Matt Arsenault	4ff5e002ea	AMDGPU: Remove dead file llvm-svn: 321752	2018-01-03 18:45:42 +00:00
Alex Bradbury	b22f751fa7	Thread MCSubtargetInfo through Target::createMCAsmBackend Currently it's not possible to access MCSubtargetInfo from a TgtMCAsmBackend. D20830 threaded an MCSubtargetInfo reference through MCAsmBackend::relaxInstruction, but this isn't the only function that would benefit from access. This patch removes the Triple and CPUString arguments from createMCAsmBackend and replaces them with MCSubtargetInfo. This patch just changes the interface without making any intentional functional changes. Once in, several cleanups are possible: * Get rid of the awkward MCSubtargetInfo handling in ARMAsmBackend * Support 16-bit instructions when valid in MipsAsmBackend::writeNopData * Get rid of the CPU string parsing in X86AsmBackend and just use a SubtargetFeature for HasNopl * Emit 16-bit nops in RISCVAsmBackend::writeNopData if the compressed instruction set extension is enabled (see D41221) This change initially exposed PR35686, which has since been resolved in r321026. Differential Revision: https://reviews.llvm.org/D41349 llvm-svn: 321692	2018-01-03 08:53:05 +00:00
Matt Arsenault	e19bc2ee0f	AMDGPU: Use unique PSVs for buffer resources Also fixes using the wrong memory type for some intrinsics when custom lowering them. llvm-svn: 321557	2017-12-29 17:18:21 +00:00
Matt Arsenault	d94b63d765	AMDGPU: Remove mayLoad/hasSideEffects from MIMG stores Atomics still have hasSideEffects set on them because of the mess that is the memory properties. llvm-svn: 321556	2017-12-29 17:18:18 +00:00
Matt Arsenault	905f3518ba	AMDGPU: Implement getTgtMemIntrinsic for images Currently all images are lowered to have a single image PseudoSourceValue. Image stores happen to have overly strict mayLoad/mayStore/hasSideEffects flags set on them, so this happens to work. When these are fixed to be correct, the scheduler breaks this because the identical PSVs are assumed to be the same address. These need to be unique to the image resource value. llvm-svn: 321555	2017-12-29 17:18:14 +00:00
Dmitry Preobrazhensky	414e05383f	[AMDGPU][MC] Incorrect parsing of flat/global atomic modifiers See bug 35730: https://bugs.llvm.org/show_bug.cgi?id=35730 Differential Revision: https://reviews.llvm.org/D41598 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 321552	2017-12-29 13:55:11 +00:00
Sanjoy Das	26d11ca4b0	(Re-landing) Expose a TargetMachine::getTargetTransformInfo function Re-land r321234. It had to be reverted because it broke the shared library build. The shared library build broke because there was a missing LLVMBuild dependency from lib/Passes (which calls TargetMachine::getTargetIRAnalysis) to lib/Target. As far as I can tell, this problem was always there but was somehow masked before (perhaps because TargetMachine::getTargetIRAnalysis was a virtual function). Original commit message: This makes the TargetMachine interface a bit simpler. We still need the std::function in TargetIRAnalysis to avoid having to add a dependency from Analysis to Target. See discussion: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html I avoided adding all of the backend owners to this review since the change is simple, but let me know if you feel differently about this. Reviewers: echristo, MatzeB, hfinkel Reviewed By: hfinkel Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits Differential Revision: https://reviews.llvm.org/D41464 llvm-svn: 321375	2017-12-22 18:21:59 +00:00
Dmitry Preobrazhensky	471adf7fdc	[AMDGPU][MC] Corrected handling of negative expressions See bug 35716: https://bugs.llvm.org/show_bug.cgi?id=35716 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D41488 llvm-svn: 321372	2017-12-22 18:03:35 +00:00
Dmitry Preobrazhensky	c5b0c172f6	[AMDGPU][MC] Corrected parsing of optional operands for ds_swizzle_b32 See bug 35645: https://bugs.llvm.org/show_bug.cgi?id=35645 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D41186 llvm-svn: 321367	2017-12-22 17:13:28 +00:00
Dmitry Preobrazhensky	2713495318	[AMDGPU][MC] Added support of 256- and 512-bit tuples of ttmp registers See bug 35561: https://bugs.llvm.org/show_bug.cgi?id=35561 This patch also affects implementation of SGPR and VGPR registers though changes are cosmetic. Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D41437 llvm-svn: 321359	2017-12-22 15:18:06 +00:00
Sanjoy Das	747d1114d6	Revert "Expose a TargetMachine::getTargetTransformInfo function" This reverts commit r321234. It breaks the -DBUILD_SHARED_LIBS=ON build. llvm-svn: 321243	2017-12-21 02:34:39 +00:00
Sanjoy Das	0c3de350b4	Expose a TargetMachine::getTargetTransformInfo function Summary: This makes the TargetMachine interface a bit simpler. We still need the std::function in TargetIRAnalysis to avoid having to add a dependency from Analysis to Target. See discussion: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html I avoided adding all of the backend owners to this review since the change is simple, but let me know if you feel differently about this. Reviewers: echristo, MatzeB, hfinkel Reviewed By: hfinkel Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits Differential Revision: https://reviews.llvm.org/D41464 llvm-svn: 321234	2017-12-21 01:06:58 +00:00
Matt Arsenault	f7f59b5292	[AMDGPU, AsmParser] Enable the mnemonic spell corrector. Patch by Dmitry Venikov llvm-svn: 321202	2017-12-20 18:52:57 +00:00
Mark Searles	e4f067ebe2	[AMDGPU] Turn off MergeConsecutiveStores() before Instruction Selection for AMDGPU. Commit dbbb6c5fc3642987430866dffdf710df4f616ac7 turned on MergeConsecutiveStores() before Instruction Selection for all targets. Enough AMDGPU compiles go into an infinite loop ( MergeConsecutiveStores() merges two stores; LegalizeStoreOps() un-merges; MergeConsecutiveStores() re-merges, etc. ) to warrant turning it off until the issues can be addressed. Differential Revision: https://reviews.llvm.org/D41377 llvm-svn: 321100	2017-12-19 19:26:23 +00:00
Matthias Braun	f1caa2833f	MachineFunction: Return reference from getFunction(); NFC The Function can never be nullptr so we can return a reference. llvm-svn: 320884	2017-12-15 22:22:58 +00:00
Yaxun Liu	c41e2f6e7b	Recommit CodeGen: Fix assertion in machine inst sheduler due to llvm.dbg.value The regression on ppc64 was not due to this commit. llvm-svn: 320788	2017-12-15 03:56:57 +00:00
Matt Arsenault	7d7adf4f2e	TLI: Allow using PSV for intrinsic mem operands llvm-svn: 320756	2017-12-14 22:34:10 +00:00
Matt Arsenault	1117133687	DAG: Expose all MMO flags in getTgtMemIntrinsic Rather than adding more bits to express every MMO flag you could want, just directly use the MMO flags. Also fixes using a bunch of bool arguments to getMemIntrinsicNode. On AMDGPU, buffer and image intrinsics should always have MODereferencable set, but currently there is no way to do that directly during the initial intrinsic lowering. llvm-svn: 320746	2017-12-14 21:39:51 +00:00
Yaxun Liu	f902ef0a5d	Revert CodeGen: Fix assertion in machine inst sheduler due to llvm.dbg.value This commit might have caused regression on ppc64. Revert it to verify that. llvm-svn: 320712	2017-12-14 16:12:04 +00:00
Yaxun Liu	a5315a040d	CodeGen: Fix assertion in machine inst sheduler due to llvm.dbg.value Two issues were found about machine inst scheduler when compiling ProRender with -g for amdgcn target: GCNScheduleDAGMILive::schedule tries to update LiveIntervals for DBG_VALUE, which it should not since DBG_VALUE is not mapped in LiveIntervals. when DBG_VALUE is the last instruction of MBB, ScheduleDAGInstrs::buildSchedGraph and ScheduleDAGMILive::scheduleMI does not move RPTracker properly, which causes assertion. This patch fixes that. Differential Revision: https://reviews.llvm.org/D41132 llvm-svn: 320650	2017-12-13 22:38:09 +00:00
Matt Arsenault	cad7fa857c	AMDGPU: Partially fix disassembly of MIMG instructions Stores failed to decode at all since they didn't have a DecoderNamespace set. Loads worked, but did not change the register width displayed to match the numbmer of enabled channels. The number of printed registers for vaddr is still wrong, but I don't think that's encoded in the instruction so there's not much we can do about that. Image atomics are still broken. MIMG is the same encoding for SI/VI, but the image atomic classes are split up into encoding specific versions unlike every other MIMG instruction. They have isAsmParserOnly set on them for some reason. dmask is also special for these, so we probably should not have it as an explicit operand as it is now. llvm-svn: 320614	2017-12-13 21:07:51 +00:00
Craig Topper	ac59db2efe	[Targets] Don't automatically include the scheduler class enum from *GenInstrInfo.inc with GET_INSTRINFO_ENUM. Make targets request is separately. Most of the targets don't need the scheduler class enum. I have an X86 scheduler model change that causes some names in the enum to become about 18000 characters long. This is because using instregex in scheduler models causes the scheduler class to get named with every instruction that matches the regex concatenated together. MSVC has a limit of 4096 characters for an identifier name. Rather than trying to come up with way to reduce the name length, I'm just going to sidestep the problem by not including the enum in X86. llvm-svn: 320552	2017-12-13 07:26:17 +00:00
Matthias Braun	f842297d50	Rename LiveIntervalAnalysis.h to LiveIntervals.h Headers/Implementation files should be named after the class they declare/define. Also eliminated an `#include "llvm/CodeGen/LiveIntervalAnalysis.h"` in favor of `class LiveIntarvals;` llvm-svn: 320546	2017-12-13 02:51:04 +00:00
Matt Arsenault	3e268cc0dd	LSR: Check more intrinsic pointer operands llvm-svn: 320424	2017-12-11 21:38:43 +00:00
Dmitry Preobrazhensky	ac2b02643b	[AMDGPU][MC][GFX9] Corrected encoding of ttmp registers, disabled tba/tma See bugs 35494 and 35559: https://bugs.llvm.org/show_bug.cgi?id=35494 https://bugs.llvm.org/show_bug.cgi?id=35559 Reviewers: vpykhtin, artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D41007 llvm-svn: 320375	2017-12-11 15:23:20 +00:00
Konstantin Zhuravlyov	c40d9f2e5d	AMDGPU/GCN: Bring processors in sync with AMDGPUUsage - Add gfx704 - Change bonaire to gfx704 - Remove gfx804 - Remove gfx901 - Remove gfx903 Differential Revision: https://reviews.llvm.org/D40046 llvm-svn: 320194	2017-12-08 20:52:28 +00:00
Matt Arsenault	73ce93b08b	AMDGPU: Set IntrReadMem on memtime intrinsics llvm-svn: 320188	2017-12-08 20:01:02 +00:00
Matt Arsenault	856777d8c9	AMDGPU: image_getlod and image_getresinfo do not read memory llvm-svn: 320187	2017-12-08 20:00:57 +00:00
Matt Arsenault	ecad0d5364	AMDGPU: Preserve MMO in adjustWritemask Follow up to r319705. Currently the MMO is produced after this in the custom inserter, so this doesn't change anything yet. llvm-svn: 320186	2017-12-08 20:00:45 +00:00
Konstantin Zhuravlyov	e30f88f3a9	AMDGPU: Report Arg's Value name in metadata if kernel_arg_name metadata is not available Differential Revision: https://reviews.llvm.org/D40924 llvm-svn: 320176	2017-12-08 19:22:12 +00:00
Tim Renouf	cead41d42f	[AMDGPU] add labels to +DumpCode output Summary: +DumpCode is a hack to embed disassembly in the ELF file. This commit fixes it to include labels, to make it slightly more useful. Reviewers: arsenm, kzhuravl Subscribers: nhaehnle, timcorringham, dstuttard, llvm-commits, t-tye, yaxunl, wdng, kzhuravl Differential Revision: https://reviews.llvm.org/D40169 llvm-svn: 320146	2017-12-08 14:09:34 +00:00
Mark Searles	9ebdbb433a	[AMDGPU] Revert "[AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output." Patch caused a buildbot failure; http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/15733/steps/build_Lld/logs/stdio : lib/Target/AMDGPU/SIInsertWaitcnts.cpp:396:11: error: private field 'InstCnt' is not used [-Werror,-Wunused-private-field] int32_t InstCnt = 0; ^ 1 error generated. " This reverts commit 71627f79010aafe74fdcba901bba28dd7caa0869. llvm-svn: 320086	2017-12-07 21:14:41 +00:00
Mark Searles	a84d23489a	[AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output. -amdgpu-waitcnt-forcezero={1\|0} Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -amdgpu-waitcnt-forceexp=<n> Force emit a s_waitcnt expcnt(0) before the first <n> instrs -amdgpu-waitcnt-forcelgkm=<n> Force emit a s_waitcnt lgkmcnt(0) before the first <n> instrs -amdgpu-waitcnt-forcevm=<n> Force emit a s_waitcnt vmcnt(0) before the first <n> instrs Differential Revision: https://reviews.llvm.org/D40091 llvm-svn: 320084	2017-12-07 20:36:39 +00:00
Mark Searles	d29f24acfb	[AMDGPU] Add GCNHazardRecognizer::checkInlineAsmHazards() and GCNHazardRecognizer::checkVALUHazardsHelper(). checkInlineAsmHazards() checks INLINEASM for hazards that we particularly care about (so not exhaustive); this patch adds a check for INLINEASM that defs vregs that hold data-to-be stored by immediately preceding store of more than 8 bytes. If the instr were not within an INLINEASM, this scenario would be handled by checkVALUHazard(). Add checkVALUHazardsHelper(), which will be called by both checkVALUHazards() and checkInlineAsmHazards(). Differential Revision: https://reviews.llvm.org/D40098 llvm-svn: 320083	2017-12-07 20:34:25 +00:00
Francis Visoiu Mistrih	a8a83d150f	[CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register. Work towards the unification of MIR and debug output by refactoring the interfaces. For MachineOperand::print, keep a simple version that can be easily called from `dump()`, and a more complex one which will be called from both the MIRPrinter and MachineInstr::print. Add extra checks inside MachineOperand for detached operands (operands with getParent() == nullptr). https://reviews.llvm.org/D40836 * find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/<def>//g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<def[ ],[ ]dead>/dead \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ],[ ]dead>/implicit-def dead \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name "*.s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g' llvm-svn: 320022	2017-12-07 10:40:31 +00:00
Matt Arsenault	8ae38bc0bd	AMDGPU: Fix SDWA crash on inline asm This was only searching for explicit defs, and asserting for any implicit or variadic instruction defs, like inline asm. llvm-svn: 319826	2017-12-05 20:32:01 +00:00
Matt Arsenault	7f0a527300	AMDGPU: Fix infinite loop with dbg_value Surprisingly SIOptimizeExecMaskingPreRA can infinite loop in some case with DBG_VALUE. Most tests using dbg_value are run at -O0, so don't run this pass. This seems to only happen when the value argument is undef. llvm-svn: 319808	2017-12-05 18:23:17 +00:00
Matt Arsenault	e42b08d96d	AMDGPU: Fix missing subtarget feature initializer llvm-svn: 319733	2017-12-05 03:15:44 +00:00
Matt Arsenault	9a60c3ea36	AMDGPU: Fix crash when scheduling DBG_VALUE This calls handleMove with a DBG_VALUE instruction, which isn't tracked by LiveIntervals. I'm not sure this is the correct place to fix this. The generic scheduler seems to have more deliberate region selection that skips dbg_value. The test is also really hard to reduce. I haven't been able to figure out what exactly causes this particular case to try moving the dbg_value. llvm-svn: 319732	2017-12-05 03:09:23 +00:00
Jan Vesely	39aeab4f30	AMDGPU/EG: Add a new FeatureFMA and use it to selectively enable FMA instruction Only used by pre-GCN targets v2: fix predicate setting for FMA_Common Differential Revision: https://reviews.llvm.org/D40692 llvm-svn: 319712	2017-12-04 23:07:28 +00:00
Jan Vesely	d1c9b61e2b	AMDGPU: Disable fp64 support on pre GCN asics It's not implemented. Passing +fp64-fp16-denormal feature enables fp64 even on asics that don't support it v2: fix hasFP64 query Differential Revision: https://reviews.llvm.org/D39931 llvm-svn: 319709	2017-12-04 22:57:29 +00:00
Matt Arsenault	68f0505263	AMDGPU: Fix creating invalid copy when adjusting dmask Move the entire optimization to one place. Before it was possible to adjust dmask without changing the register class of the output instruction, since they were done in separate places. Fix all lane sizes and move all of the optimization into the DAG folding. llvm-svn: 319705	2017-12-04 22:18:27 +00:00
Matt Arsenault	e6667ded4d	AMDGPU: Use return value of MorphNodeTo llvm-svn: 319704	2017-12-04 22:18:22 +00:00
Francis Visoiu Mistrih	25528d6de7	[CodeGen] Unify MBB reference format in both MIR and debug output As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber/" << printMBBReference(\1)/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber/" << printMBBReference(\1)/g' * find . $ -name ".txt" -o -name ".s" -o -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 llvm-svn: 319665	2017-12-04 17:18:51 +00:00
Sam Kolton	5f7f32c382	[AMDGPU] SDWA: add support for PRESERVE into SDWA peephole. Summary: Reviewers: arsenm, vpykhtin, rampitec Subscribers: kzhuravl, wdng, nhaehnle, mgorny, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D37817 llvm-svn: 319662	2017-12-04 16:22:32 +00:00
Tim Corringham	6c6d5e24cd	AMDGPU: fix missing s_waitcnt Summary: The pass that inserts s_waitcnt instructions where needed propagated info used to track dependencies for each block by iterating over the predecessor blocks. The iteration was terminated when a predecessor that had not yet been processed was encountered. Any info in blocks later in the list was therefore not processed, leading to the possiblility of a required s_waitcnt not being inserted. The fix is simply to change the "break" to "continue" for the relevant loops, so that all visited blocks are processed. This is likely what was intended when the code was written. There is no test case provided for this fix because: 1) the only example that reproduces this is large and resistant to being reduced 2) the change is trivial Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D40544 llvm-svn: 319651	2017-12-04 12:30:49 +00:00
Alexander Timofeev	c1425c9d6b	[AMDGPU] SiFixSGPRCopies should not modify non-divergent PHI Differential revision: https://reviews.llvm.org/D40556 llvm-svn: 319534	2017-12-01 11:56:34 +00:00
Matt Arsenault	686d5c728f	AMDGPU: Use carry-less adds in FI elimination llvm-svn: 319501	2017-11-30 23:42:30 +00:00
Matt Arsenault	84445dd13c	AMDGPU: Use gfx9 carry-less add/sub instructions llvm-svn: 319491	2017-11-30 22:51:26 +00:00
Francis Visoiu Mistrih	93ef145862	[CodeGen] Print "%vreg0" as "%0" in both MIR and debug output As part of the unification of the debug format and the MIR format, avoid printing "vreg" for virtual registers (which is one of the current MIR possibilities). Basically: * find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E "s/%vreg([0-9]+)/%\1/g" * grep -nr '%vreg' . and fix if needed * find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E "s/ vreg([0-9]+)/ %\1/g" * grep -nr 'vreg[0-9]\+' . and fix if needed Differential Revision: https://reviews.llvm.org/D40420 llvm-svn: 319427	2017-11-30 12:12:19 +00:00
Matt Arsenault	caf0ed4d74	AMDGPU: Allow negative MUBUF vaddr for gfx9 GFX9 does not enable bounds checking for the resource descriptors used for private access, so it should be OK to use vaddr with a potentially negative value. llvm-svn: 319393	2017-11-30 00:52:40 +00:00
Dmitry Preobrazhensky	1ac7177abb	[AMDGPU][MC][GFX9] Corrected mapping of GFX9 v_add/sub/subrev_u32 When translating pseudo to MC, v_add/sub/subrev_u32 shall be mapped via a separate table as GFX8 has opcodes with the same names. These instructions shall also be labelled as renamed for pseudoToMCOpcode to handle them correctly. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D40550 llvm-svn: 319311	2017-11-29 13:33:40 +00:00
Matt Arsenault	b655fa9ce2	DAG: Add nuw when splitting loads and stores The object can't straddle the address space wrap around, so I think it's OK to assume any offsets added to the base object pointer can't overflow. Similar logic already appears to be applied in SelectionDAGBuilder when lowering aggregate returns. llvm-svn: 319272	2017-11-29 01:25:12 +00:00
Matt Arsenault	3f71c0e3ee	AMDGPU: Select DS insts without m0 initialization GFX9 stopped using m0 for most DS instructions. Select a different instruction without the use. I think this will be less error prone than trying to manually maintain m0 uses as needed. llvm-svn: 319270	2017-11-29 00:55:57 +00:00
Matt Arsenault	607a756651	AMDGPU: Enable IPRA llvm-svn: 319256	2017-11-28 23:40:12 +00:00
Konstantin Zhuravlyov	06ae4ec78e	AMDGPU: Add num spilled s/vgprs to metadata This was requested by tools. Differential Revision: https://reviews.llvm.org/D40321 llvm-svn: 319192	2017-11-28 17:51:08 +00:00
Francis Visoiu Mistrih	9d7bb0cb40	[CodeGen] Print register names in lowercase in both MIR and debug output As part of the unification of the debug format and the MIR format, always print registers as lowercase. * Only debug printing is affected. It now follows MIR. Differential Revision: https://reviews.llvm.org/D40417 llvm-svn: 319187	2017-11-28 17:15:09 +00:00
Francis Visoiu Mistrih	9d419d3b0c	[CodeGen] Rename functions PrintReg* to printReg* LLVM Coding Standards: Function names should be verb phrases (as they represent actions), and command-like function should be imperative. The name should be camel case, and start with a lower case letter (e.g. openFile() or isFoo()). Differential Revision: https://reviews.llvm.org/D40416 llvm-svn: 319168	2017-11-28 12:42:37 +00:00
Nicolai Haehnle	b4f28deda0	AMDGPU: Re-organize the outer loop of SILoadStoreOptimizer Summary: The entire algorithm operates per basic-block, so for cache locality it should be better to re-optimize a basic-block immediately rather than in a separate loop. I don't have performance measurements. Change-Id: I85106570bd623c4ff277faaa50ee43258e1ddcc5 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D40344 llvm-svn: 319156	2017-11-28 08:42:46 +00:00
Nicolai Haehnle	39980dac0b	AMDGPU: Consistently check for immediates in SIInstrInfo::FoldImmediate Summary: The PeepholeOptimizer pass calls this function solely based on checking DefMI->isMoveImmediate(), which only checks the MoveImm bit of the instruction description. So it's up to FoldImmediate itself to properly check that DefMI actually moves from an immediate. I don't have a separate test case for this, but the next patch introduces a test case which happens to crash without this change. This error is caught by the assertion in MachineOperand::getImm(). Change-Id: I88e7cdbcf54d75e1a296822e6fe5f9a5f095bbf8 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D40342 llvm-svn: 319155	2017-11-28 08:41:50 +00:00
Dmitry Preobrazhensky	16608e67d3	[AMDGPU][MC][DISASSEMBLER][GFX9] Corrected decoding of GLOBAL/SCRATCH opcodes See bug 35433: https://bugs.llvm.org/show_bug.cgi?id=35433 Differential Revision: https://reviews.llvm.org/D40493 Reviewers: artem.tamazov, SamWot, arsenm llvm-svn: 319050	2017-11-27 17:14:35 +00:00
Vedran Miletic	ad21f2687d	[AMDGPU] Add custom lowering for llvm.log{,10}.{f16,f32} intrinsics AMDGPU backend errors with "unsupported call to function" upon encountering a call to llvm.log{,10}.{f16,f32} intrinsics. This patch adds custom lowering to avoid that error on both R600 and SI. Reviewers: arsenm, jvesely Subscribers: tstellar Differential Revision: https://reviews.llvm.org/D29942 llvm-svn: 319025	2017-11-27 13:26:38 +00:00
Dmitry Preobrazhensky	0e8924a5c7	[AMDGPU][MC][GFX9] Added v_interp_p2_f16 and v_interp_p2_legacy_f16 See bug 33629: https://bugs.llvm.org//show_bug.cgi?id=33629 Reviewers: artem.tamazov, SamWot, arsenm Differential Revision: https://reviews.llvm.org/D39488 llvm-svn: 318955	2017-11-24 15:37:14 +00:00
Benjamin Kramer	51ebcaaf25	Make helpers static. NFC. llvm-svn: 318953	2017-11-24 14:55:41 +00:00
Dmitry Preobrazhensky	dd2f1c993e	[AMDGPU][MC][GFX9] Added support of 'inst_offset' modifier for compatibility with SP3 See bug 35329: https://bugs.llvm.org//show_bug.cgi?id=35329 Reviewers: arsenm, vpykhtin, artem.tamazov Differential Revision: https://reviews.llvm.org/D40350 llvm-svn: 318947	2017-11-24 13:22:38 +00:00
Yaxun Liu	c596226604	[AMDGPU] Fix SITargetLowering::LowerCall for pointer info of byval argument SITargetLowering::LowerCall uses dummy pointer info for byval argument, which causes flat load instead of buffer load. This patch fixes that. Differential Revision: https://reviews.llvm.org/D40040 llvm-svn: 318844	2017-11-22 16:13:35 +00:00
Nicolai Haehnle	dd059c161d	AMDGPU: Consider memory dependencies with moved instructions in SILoadStoreOptimizer Summary: This bug seems to have gone unnoticed because critical cases with LDS instructions are eliminated by the peephole optimizer. However, equivalent situations arise with buffer loads and stores as well, so this fixes regressions since r317751 ("AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4"). Fixes at least: KHR-GL45.shader_storage_buffer_object.basic-operations-case1-cs KHR-GL45.cull_distance.functional piglit tes-input-gl_ClipDistance.shader_test ... and probably more Change-Id: I0e371536288eb8e6afeaa241a185266fd45d129d Reviewers: arsenm, mareko, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D40303 llvm-svn: 318829	2017-11-22 12:25:21 +00:00
Dmitry Preobrazhensky	a0342dc9eb	[AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev} See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765 Reviewers: tamazov, SamWot, arsenm, vpykhtin Differential Revision: https://reviews.llvm.org/D40088 llvm-svn: 318675	2017-11-20 18:24:21 +00:00
Valery Pykhtin	f2fe9725ea	AMDGPU: Partial ILP scheduler port from SelectionDAG to SchedulingDAG (experimental) Differential revision: https://reviews.llvm.org/D39897 llvm-svn: 318649	2017-11-20 14:35:53 +00:00
Matt Arsenault	a41351e37c	AMDGPU: Move hazard avoidance out of waitcnt pass. This is mostly moving VMEM clause breaking into the hazard recognizer. Also move another hazard currently handled in the waitcnt pass. Also stops breaking clauses unless xnack is enabled. llvm-svn: 318557	2017-11-17 21:35:32 +00:00
Dmitry Preobrazhensky	682a654758	[AMDGPU][MC][GFX9][disassembler] Corrected decoding of op_sel_hi for v_mad_mix* See bug 35148: https://bugs.llvm.org//show_bug.cgi?id=35148 Reviewers: tamazov, SamWot, arsenm Differential Revision: https://reviews.llvm.org/D39492 llvm-svn: 318526	2017-11-17 15:15:40 +00:00
Matt Arsenault	4512d0a68b	AMDGPU: Replace list of SMEM buffer opcodes llvm-svn: 318506	2017-11-17 04:18:26 +00:00
Matt Arsenault	03c67d1eb2	AMDGPU: Fix breaking SMEM clauses This was completely ignoring subregisters, so was not very useful. Also only break them if xnack is actually enabled. llvm-svn: 318505	2017-11-17 04:18:24 +00:00
David Blaikie	b3bde2ea50	Fix a bunch more layering of CodeGen headers that are in Target All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490	2017-11-17 01:07:10 +00:00
Eric Christopher	6348188e87	Fix thinko in last commit. llvm-svn: 318374	2017-11-16 03:25:02 +00:00
Eric Christopher	3148a1be88	Add NDEBUG checks around LLVM_DUMP_METHOD functions for Wunused-function warnings. llvm-svn: 318373	2017-11-16 03:18:15 +00:00
Daniel Sanders	f76f315436	[globalisel][tablegen] Generate rule coverage and use it to identify untested rules Summary: This patch adds a LLVM_ENABLE_GISEL_COV which, like LLVM_ENABLE_DAGISEL_COV, causes TableGen to instrument the generated table to collect rule coverage information. However, LLVM_ENABLE_GISEL_COV goes a bit further than LLVM_ENABLE_DAGISEL_COV. The information is written to files (${CMAKE_BINARY_DIR}/gisel-coverage-* by default). These files can then be concatenated into ${LLVM_GISEL_COV_PREFIX}-all after which TableGen will read this information and use it to emit warnings about untested rules. This technique could also be used by SelectionDAG and can be further extended to detect hot rules and give them priority over colder rules. Usage: * Enable LLVM_ENABLE_GISEL_COV in CMake * Build the compiler and run some tests * cat gisel-coverage-[0-9]* > gisel-coverage-all * Delete lib/Target//GenGlobalISel.inc* * Build the compiler Known issues: * ${LLVM_GISEL_COV_PREFIX}-all must be generated as a manual step due to a lack of a portable 'cat' command. It should be the concatenation of all ${LLVM_GISEL_COV_PREFIX}-[0-9]* files. * There's no mechanism to discard coverage information when the ruleset changes Depends on D39742 Reviewers: ab, qcolombet, t.p.northover, aditya_nandakumar, rovka Reviewed By: rovka Subscribers: vsk, arsenm, nhaehnle, mgorny, kristof.beyls, javed.absar, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D39747 llvm-svn: 318356	2017-11-16 00:46:35 +00:00
Daniel Sanders	725584e26d	Add backend name to Target to enable runtime info to be fed back into TableGen Summary: Make it possible to feed runtime information back to tablegen to enable profile-guided tablegen-eration, detection of untested tablegen definitions, etc. Being a cross-compiler by nature, LLVM will potentially collect data for multiple architectures (e.g. when running 'ninja check'). We therefore need a way for TableGen to figure out what data applies to the backend it is generating at the time. This patch achieves that by including the name of the 'def X : Target ...' for the backend in the TargetRegistry. Reviewers: qcolombet Reviewed By: qcolombet Subscribers: jholewinski, arsenm, jyknight, aditya_nandakumar, sdardis, nemanjai, ab, nhaehnle, t.p.northover, javed.absar, qcolombet, llvm-commits, fedor.sergeev Differential Revision: https://reviews.llvm.org/D39742 llvm-svn: 318352	2017-11-15 23:55:44 +00:00
Matt Arsenault	301162c4fe	AMDGPU: Replace i64 add/sub lowering Use VOP3 add/addc like usual. This has some tradeoffs. Inline immediates fold a little better, but other constants are worse off. SIShrinkInstructions could be made smarter to handle these cases. This allows us to avoid selecting scalar adds where we need to track the carry in scc and replace its users. This makes it easier to use the carryless VALU adds. llvm-svn: 318340	2017-11-15 21:51:43 +00:00
Matt Arsenault	10c472dd83	AMDGPU: Add separate definitions for DS insts without m0 use llvm-svn: 318246	2017-11-15 01:34:06 +00:00
Matt Arsenault	45b98189bd	AMDGPU: Don't use MUBUF vaddr if address may overflow Effectively revert r263964. Before we would not allow this if vaddr was not known to be positive. llvm-svn: 318240	2017-11-15 00:45:43 +00:00
Matt Arsenault	c8903125cd	AMDGPU: Handle or in multi-use shl ptr combine llvm-svn: 318223	2017-11-14 23:46:42 +00:00
Matt Arsenault	9ba465a972	AMDGPU: Error on stack size overflow llvm-svn: 318189	2017-11-14 20:33:14 +00:00
Matt Arsenault	57c37b2dcd	AMDGPU: Fix producing saveexec when the copy is spilled If the register from the copy from exec was spilled, the copy before the spill was deleted leaving a spill of undefined register verifier error and miscompiling. Check for other use instructions of the copy register. llvm-svn: 318132	2017-11-14 02:16:54 +00:00
Matt Arsenault	4b7938c658	AMDGPU: Fix not converting d16 load/stores to offset Fixes missed optimization with new MUBUF instructions. llvm-svn: 318106	2017-11-13 23:24:26 +00:00
Matt Arsenault	4eea3f3da3	AMDGPU: Implement computeKnownBitsForTargetNode for mbcnt llvm-svn: 318100	2017-11-13 22:55:05 +00:00
Jan Vesely	b17f32040c	AMDGPU: Drop duplicate setOperationAction These are set with other scalar int ops few lines up Differential Revision: https://reviews.llvm.org/D39928 llvm-svn: 318051	2017-11-13 16:46:07 +00:00
Matt Arsenault	e5e0c742df	AMDGPU: Preserve nuw in shl add ptr combine llvm-svn: 318017	2017-11-13 05:33:35 +00:00
Matt Arsenault	fbe9533509	AMDGPU: Fix multi-use shl/add combine This was using a custom function that didn't handle the addressing modes properly for private. Use isLegalAddressingMode to avoid duplicating this. Additionally, skip the combine if there is only one use since the standard combine will handle it. llvm-svn: 318013	2017-11-13 05:11:54 +00:00
Matt Arsenault	e1cd482fda	AMDGPU: Select d16 loads into low component of register llvm-svn: 318005	2017-11-13 00:22:09 +00:00
Konstantin Zhuravlyov	27b0a033d8	AMDGPU/NFC: Split Processors.td into GCNProcessors.td and R600Processors.td Differential Revision: https://reviews.llvm.org/D39880 llvm-svn: 317920	2017-11-10 20:01:58 +00:00
Yaxun Liu	35845f06a4	[AMDGPU] Fix pointer info for lowering load/store for r600 for amdgiz environment r600 uses dummy pointer info for lowering load/store. Since dummy pointer info assumes address space 0, this causes isel failure when temporary load/store SDNodes are generated for amdgiz environment. Since the offest is not constant, FixedStack pseudo source value cannot be used to create the pointer info. This patch creates pointer info using llvm undef value. At least this provides correct address space so that isel can be done correctly. Differential Revision: https://reviews.llvm.org/D39698 llvm-svn: 317862	2017-11-10 02:03:28 +00:00
Yaxun Liu	920cc2f813	[AMDGPU] Fix pointer info for pseudo source for r600 The pointer info for pseudo source for r600 is not correct when alloca addr space is not 0, which causes invalid SDNode for r600---amdgiz. This patch fixes that. Differential Revision: https://reviews.llvm.org/D39670 llvm-svn: 317861	2017-11-10 01:53:24 +00:00
Vitaly Buka	bee1964d80	Fix "default label in switch which covers all enumeration values" warning llvm-svn: 317771	2017-11-09 07:46:13 +00:00
Marek Olsak	58410f37ff	AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4 Summary: Only 56 shaders (out of 48486) are affected. Totals from affected shaders (changed stats only): SGPRS: 2420 -> 2460 (1.65 %) Spilled VGPRs: 94 -> 112 (19.15 %) Scratch size: 524 -> 528 (0.76 %) dwords per thread Code Size: 187400 -> 184992 (-1.28 %) bytes One DiRT Showdown shader spills 6 more VGPRs. One Grid Autosport shader spills 12 more VGPRs. The other 54 shaders only have a decrease in code size. (I'm ignoring the SGPR noise) Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D39012 llvm-svn: 317755	2017-11-09 01:52:55 +00:00
Marek Olsak	5cec64195c	AMDGPU: Lower buffer store and atomic intrinsics manually Summary: Without this, SIMemoryLegalizer inserts s_waitcnt vmcnt(0) before every buffer store and atomic instruction. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D39060 llvm-svn: 317754	2017-11-09 01:52:48 +00:00
Marek Olsak	4c421a2db2	AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4 Summary: Only 3 (out of 48486) shaders are affected. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D38951 llvm-svn: 317753	2017-11-09 01:52:36 +00:00
Marek Olsak	6a0548acaa	AMDGPU: Merge BUFFER_LOAD_DWORD_OFFEN into x2, x4 Summary: -9.9% code size decrease in affected shaders. Totals (changed stats only): SGPRS: 2151462 -> 2170646 (0.89 %) VGPRS: 1634612 -> 1640288 (0.35 %) Spilled SGPRs: 8942 -> 8940 (-0.02 %) Code Size: 52940672 -> 51727288 (-2.29 %) bytes Max Waves: 373066 -> 371718 (-0.36 %) Totals from affected shaders: SGPRS: 283520 -> 302704 (6.77 %) VGPRS: 227632 -> 233308 (2.49 %) Spilled SGPRs: 3966 -> 3964 (-0.05 %) Code Size: 12203080 -> 10989696 (-9.94 %) bytes Max Waves: 44070 -> 42722 (-3.06 %) Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38950 llvm-svn: 317752	2017-11-09 01:52:30 +00:00
Marek Olsak	b953cc36e2	AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4 Summary: Only constant offsets (*_IMM opcodes) are merged. It reuses code for LDS load/store merging. It relies on the scheduler to group loads. The results are mixed, I think they are mostly positive. Most shaders are affected, so here are total stats only: SGPRS: 2072198 -> 2151462 (3.83 %) VGPRS: 1628024 -> 1634612 (0.40 %) Spilled SGPRs: 7883 -> 8942 (13.43 %) Spilled VGPRs: 97 -> 101 (4.12 %) Scratch size: 1488 -> 1492 (0.27 %) dwords per thread Code Size: 60222620 -> 52940672 (-12.09 %) bytes Max Waves: 374337 -> 373066 (-0.34 %) There is 13.4% increase in SGPR spilling, DiRT Showdown spills a few more VGPRs (now 37), but 12% decrease in code size. These are the new stats for SGPR spilling. We already spill a lot SGPRs, so it's uncertain whether more spilling will make any difference since SGPRs are always spilled to VGPRs: SGPR SPILLING APPS Shaders SpillSGPR AvgPerSh alien_isolation 2938 100 0.0 batman_arkham_origins 589 6 0.0 bioshock-infinite 1769 4 0.0 borderlands2 3968 22 0.0 counter_strike_glob.. 1142 60 0.1 deus_ex_mankind_div.. 1410 79 0.1 dirt-showdown 533 4 0.0 dirt_rally 364 1163 3.2 divinity 1052 2 0.0 dota2 1747 7 0.0 f1-2015 776 1515 2.0 grid_autosport 1767 1505 0.9 hitman 1413 273 0.2 left_4_dead_2 1762 4 0.0 life_is_strange 1296 26 0.0 mad_max 358 96 0.3 metro_2033_redux 2670 60 0.0 payday2 1362 22 0.0 portal 474 3 0.0 saints_row_iv 1704 8 0.0 serious_sam_3_bfe 392 1348 3.4 shadow_of_mordor 1418 12 0.0 shadow_warrior 3956 239 0.1 talos_principle 324 1735 5.4 thea 172 17 0.1 tomb_raider 1449 215 0.1 total_war_warhammer 242 56 0.2 ue4_effects_cave 295 55 0.2 ue4_elemental 572 12 0.0 unigine_tropics 210 56 0.3 unigine_valley 278 152 0.5 victor_vran 1262 84 0.1 yofrankie 82 2 0.0 Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38949 llvm-svn: 317751	2017-11-09 01:52:23 +00:00
Marek Olsak	ffadcb744b	AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEM Summary: -5.3% code size in affected shaders. Changed stats only: 48486 shaders in 30489 tests Totals: SGPRS: 2086406 -> 2072430 (-0.67 %) VGPRS: 1626872 -> 1627960 (0.07 %) Spilled SGPRs: 7865 -> 7912 (0.60 %) Code Size: 60978060 -> 60188764 (-1.29 %) bytes Max Waves: 374530 -> 374342 (-0.05 %) Totals from affected shaders: SGPRS: 299664 -> 285688 (-4.66 %) VGPRS: 233844 -> 234932 (0.47 %) Spilled SGPRs: 3959 -> 4006 (1.19 %) Code Size: 14905272 -> 14115976 (-5.30 %) bytes Max Waves: 46202 -> 46014 (-0.41 %) Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38915 llvm-svn: 317750	2017-11-09 01:52:17 +00:00
David Blaikie	3f833edc7c	Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layering This header includes CodeGen headers, and is not, itself, included by any Target headers, so move it into CodeGen to match the layering of its implementation. llvm-svn: 317647	2017-11-08 01:01:31 +00:00
Matt Arsenault	4709ab9124	AMDGPU: Set correct sched model on v_mad_u64_u32 llvm-svn: 317645	2017-11-08 00:48:25 +00:00
Matt Arsenault	6119f80034	AMDGPU: Remove redundant combine This combine was already done in two places. The generic combiner already has done this since r217610, for adds (with a single use). This one was added in r303641, and added support for handling or as well. r313251 later added support to the generic combine for or. It also turns out the isOrEquivalentToAdd check is not necessary for this combine. Additionally, we already reproduce this combine in yet another place in the backend, although in that version multiple uses of the add are still folded if it will allow a fold into the addressing mode. That version needs to be improved to understand ors though, as well as the correct legal offsets for private. llvm-svn: 317526	2017-11-07 00:06:32 +00:00
Matt Arsenault	4f6318fe1b	AMDGPU: Select v_mad_u64_u32 and v_mad_i64_i32 llvm-svn: 317492	2017-11-06 17:04:37 +00:00
Sanjay Patel	629c411538	[IR] redefine 'UnsafeAlgebra' / 'reassoc' fast-math-flags and add 'trans' fast-math-flag As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html and again more recently: http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html ...this is a step in cleaning up our fast-math-flags implementation in IR to better match the capabilities of both clang's user-visible flags and the backend's flags for SDNode. As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the 'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic reassociation - 'AllowReassoc'. We're also adding a bit to allow approximations for library functions called 'ApproxFunc' (this was initially proposed as 'libm' or similar). ...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits), but that's apparently already used for other purposes. Also, I don't think we can just add a field to FPMathOperator because Operator is not intended to be instantiated. We'll defer movement of FMF to another day. We keep the 'fast' keyword. I thought about removing that, but seeing IR like this: %f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2 ...made me think we want to keep the shortcut synonym. Finally, this change is binary incompatible with existing IR as seen in the compatibility tests. This statement: "Newer releases can ignore features from older releases, but they cannot miscompile them. For example, if nsw is ever replaced with something else, dropping it would be a valid way to upgrade the IR." ( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility ) ...provides the flexibility we want to make this change without requiring a new IR version. Ie, we're not loosening the FP strictness of existing IR. At worst, we will fail to optimize some previously 'fast' code because it's no longer recognized as 'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'. Note: an inter-dependent clang commit to use the new API name should closely follow commit. Differential Revision: https://reviews.llvm.org/D39304 llvm-svn: 317488	2017-11-06 16:27:15 +00:00
Yaxun Liu	cc56a8b108	[AMDGPU] Change alloca addr space of r600 to 5 for amdgiz environment Differential Revision: https://reviews.llvm.org/D39657 llvm-svn: 317479	2017-11-06 14:32:33 +00:00
Yaxun Liu	1ac16619d2	[AMDGPU] Fix assertion due to assuming pointer in default addr space is 32 bit The backend assumes pointer in default addr space is 32 bit, which is not true for the new addr space mapping and causes assertion for unresolved functions. This patch fixes that. Differential Revision: https://reviews.llvm.org/D39643 llvm-svn: 317476	2017-11-06 13:01:33 +00:00
Yaxun Liu	0d9673cff2	[AMDGPU] Remove hardcoded address space value from AMDGPULibFunc AMDGPULibFunc hardcodes address space values of the old address space mapping, which causes invalid addrspacecast instructions and undefined functions in APPSDK sample MonteCarloAsianDP. This patch fixes that. Differential Revision: https://reviews.llvm.org/D39616 llvm-svn: 317409	2017-11-04 17:37:43 +00:00
David Blaikie	1be62f0327	Move TargetFrameLowering.h to CodeGen where it's implemented This header already includes a CodeGen header and is implemented in lib/CodeGen, so move the header there to match. This fixes a link error with modular codegeneration builds - where a header and its implementation are circularly dependent and so need to be in the same library, not split between two like this. llvm-svn: 317379	2017-11-03 22:32:11 +00:00
Konstantin Zhuravlyov	275a4f76c4	AMDGPU: Fix warning discovered by r317266 [-Wunused-private-field] llvm-svn: 317280	2017-11-02 22:35:22 +00:00
Konstantin Zhuravlyov	b695cd41b3	AMDGPU: Remove outdated fixme (it was already fixed) llvm-svn: 317266	2017-11-02 20:48:06 +00:00
Konstantin Zhuravlyov	435151ad75	AMDGPU: Fix set but not used warnings related to AMDGPUAS Differential Revision: https://reviews.llvm.org/D39499 llvm-svn: 317114	2017-11-01 19:12:38 +00:00
NAKAMURA Takumi	1657f2ad99	Fix warnings discovered by rL317076. [-Wunused-private-field] llvm-svn: 317091	2017-11-01 13:47:55 +00:00
Benjamin Kramer	f9ab3ddb8f	[AMDGPU] Clean up symbols in the global namespace. llvm-svn: 317051	2017-10-31 23:21:30 +00:00
Marek Olsak	5914ece6aa	AMDGPU: Select s_buffer_load_dword with a non-constant SGPR offset Summary: Apps that benefit: - alien isolation - bioshock infinite - civilization: beyond earth - company of heroes 2 - dirt showdown - dota 2 - F1 2015 - grid autosport - hitman - legend of grimrock - serious sam 3: bfe - shadow warrior - talos principle - total war: warhammer - UE4 demos: effects cave, elemental, sun temple Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38914 llvm-svn: 317038	2017-10-31 21:06:42 +00:00
Yaxun Liu	c928f2a6d4	[AMDGPU] Emit metadata for hidden arguments for kernel enqueue Identifies kernels which performs device side kernel enqueues and emit metadata for the associated hidden kernel arguments. Such kernels are marked with calls-enqueue-kernel function attribute by AMDGPUOpenCLEnqueueKernelLowering pass and later on hidden kernel arguments metadata HiddenDefaultQueue and HiddenCompletionAction are emitted for them. Differential Revision: https://reviews.llvm.org/D39255 llvm-svn: 316907	2017-10-30 14:30:28 +00:00
Tom Stellard	d0c6cf2e8c	AMDGPU/GlobalISel: Mark 32-bit G_FADD as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D38439 llvm-svn: 316815	2017-10-27 23:57:41 +00:00
Marek Olsak	2232243863	AMDGPU: Handle s_buffer_load_dword hazard on SI Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D39171 llvm-svn: 316666	2017-10-26 14:43:02 +00:00
Matt Arsenault	28f52e51f1	AMDGPU: Add max-mix-insts subtarget feature llvm-svn: 316553	2017-10-25 07:00:51 +00:00
Marek Olsak	ce76ea0394	AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1) Summary: Kill the thread if operand 0 == false. llvm.amdgcn.wqm.vote can be applied to the operand. Also allow kill in all shader stages. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D38544 llvm-svn: 316427	2017-10-24 10:27:13 +00:00
Marek Olsak	2114fc3bcb	AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D38543 llvm-svn: 316426	2017-10-24 10:26:59 +00:00
Konstantin Zhuravlyov	339e74440a	AMDGPU: Initialize WavefrontSize from TD files Differential Revision: https://reviews.llvm.org/D39205 llvm-svn: 316389	2017-10-23 23:02:39 +00:00
Matt Arsenault	a030e2688f	AMDGPU: Cleanup local atomic node names llvm-svn: 316349	2017-10-23 17:16:43 +00:00
Matt Arsenault	b791802aef	AMDGPU: Fix default range in non-kernel functions The range should be assumed to be the hardware maximum if a workitem intrinsic is used in a callable function which does not know the restricted limit of the calling kernel. llvm-svn: 316346	2017-10-23 17:09:35 +00:00
Konstantin Zhuravlyov	8d5e9e110c	AMDGPU: Rename MaxFlatWorkgroupSize to MaxFlatWorkGroupSize for consistency Differential Revision: https://reviews.llvm.org/D38957 llvm-svn: 316097	2017-10-18 17:31:09 +00:00
NAKAMURA Takumi	6f43bd4bde	Untabify. llvm-svn: 316079	2017-10-18 13:31:28 +00:00
Wei Ding	7ab1f7a421	AMDGPU : Fix an error for the llvm.cttz implementation. Differential Revision: http://reviews.llvm.org/D39014 llvm-svn: 316037	2017-10-17 21:49:52 +00:00
Eugene Zelenko	6cadde7f40	[Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 316034	2017-10-17 21:27:42 +00:00
Konstantin Zhuravlyov	7dabe9ced7	AMDGPU: Start generating metadata for MaxFlatWorkGroupSize Differential Revision: https://reviews.llvm.org/D38958 llvm-svn: 316024	2017-10-17 20:03:21 +00:00
Mark Searles	4e3d6160db	Use the return value of UpdateNodeOperands(); in some cases, UpdateNodeOperands() modifies the node in-place and using the return value isn’t strictly necessary. However, it does not necessarily modify the node, but may return a resultant node if it already exists in the DAG. See comments in UpdateNodeOperands(). In that case, the return value must be used to avoid such scenarios as an infinite loop (node is assumed to have been updated, so added back to the worklist, and re-processed; however, node hasn’t changed so it is once again passed to UpdateNodeOperands(), assumed modified, added back to worklist; cycle infinitely repeats). Differential Revision: https://reviews.llvm.org/D38466 llvm-svn: 315957	2017-10-16 23:38:53 +00:00
Krzysztof Parzyszek	72518eaa6f	Add iterator range MachineRegisterInfo::liveins(), adopt users, NFC llvm-svn: 315927	2017-10-16 19:08:41 +00:00
Aaron Ballman	615eb47035	Reverting r315590; it did not include changes for llvm-tblgen, which is causing link errors for several people. Error LNK2019 unresolved external symbol "public: void __cdecl `anonymous namespace'::MatchableInfo::dump(void)const " (?dump@MatchableInfo@?A0xf4f1c304@@QEBAXXZ) referenced in function "public: void __cdecl `anonymous namespace'::AsmMatcherEmitter::run(class llvm::raw_ostream &)" (?run@AsmMatcherEmitter@?A0xf4f1c304@@QEAAXAEAVraw_ostream@llvm@@@Z) llvm-tblgen D:\llvm\2017\utils\TableGen\AsmMatcherEmitter.obj 1 llvm-svn: 315854	2017-10-15 14:32:27 +00:00
Vitaly Buka	7450398e01	Remove unused variables llvm-svn: 315847	2017-10-15 05:35:02 +00:00
Konstantin Zhuravlyov	8c18f5b3d4	AMDGPU: Don't use TargetStreamer if it has not been initialized Fixes cfe/trunk/test/Misc/backend-resource-limit-diagnostics.cl test after r315808 We may hit few other similar issues, but I want to discuss good solution offline. llvm-svn: 315830	2017-10-14 22:16:26 +00:00
Konstantin Zhuravlyov	a01d8b0b63	AMDGPU: Bring HSA metadata on par with the specification Differential Revision: https://reviews.llvm.org/D38753 llvm-svn: 315821	2017-10-14 19:03:51 +00:00
Konstantin Zhuravlyov	219066bab8	AMDGPU: Improve note directive verification in assembler - Do not allow amd_amdgpu_isa directives on non-amdgcn architectures - Do not allow amd_amdgpu_hsa_metadata on non-amdhsa OSes - Do not allow amd_amdgpu_pal_metadata on non-amdpal OSes Differential Revision: https://reviews.llvm.org/D38750 llvm-svn: 315812	2017-10-14 16:15:28 +00:00
Konstantin Zhuravlyov	eda425edd4	AMDGPU: Do not emit deprecated notes for code object v3 Differential Revision: https://reviews.llvm.org/D38749 llvm-svn: 315810	2017-10-14 15:59:07 +00:00
Konstantin Zhuravlyov	9c05b2bc3b	AMDGPU: Add support for isa version note - Emit NT_AMD_AMDGPU_ISA - Add assembler parsing for isa version directive - If isa version directive does not match command line arguments, then return error Differential Revision: https://reviews.llvm.org/D38748 llvm-svn: 315808	2017-10-14 15:40:33 +00:00
Matt Arsenault	e11d8aca77	AMDGPU: Implement hasBitPreservingFPLogic llvm-svn: 315754	2017-10-13 21:10:22 +00:00
Matt Arsenault	550c66d10f	AMDGPU: Look for src mods before fp_extend When selecting modifiers for mad_mix instructions, look at fneg/fabs that occur before the conversion. llvm-svn: 315748	2017-10-13 20:45:49 +00:00
Matt Arsenault	4d70754e3c	AMDGPU: Implement isFPExtFoldable This helps match v_mad_mix* in some cases. llvm-svn: 315744	2017-10-13 20:18:59 +00:00
Matthias Braun	bb8507e63c	Revert "TargetMachine: Merge TargetMachine and LLVMTargetMachine" Reverting to investigate layering effects of MCJIT not linking libCodeGen but using TargetMachine::getNameWithPrefix() breaking the lldb bots. This reverts commit r315633. llvm-svn: 315637	2017-10-12 22:57:28 +00:00
Matthias Braun	3a9c114b24	TargetMachine: Merge TargetMachine and LLVMTargetMachine Merge LLVMTargetMachine into TargetMachine. - There is no in-tree target anymore that just implements TargetMachine but not LLVMTargetMachine. - It should still be possible to stub out all the various functions in case a target does not want to use lib/CodeGen - This simplifies the code and avoids methods ending up in the wrong interface. Differential Revision: https://reviews.llvm.org/D38489 llvm-svn: 315633	2017-10-12 22:28:54 +00:00
Wei Ding	5676acad9e	Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ. Differential Revision: http://reviews.llvm.org/D37348 llvm-svn: 315610	2017-10-12 19:37:14 +00:00
Konstantin Zhuravlyov	70303c011f	AMDGPU/NFC: Move AMDGPU specific note types to ELF.h Differential Revision: https://reviews.llvm.org/D38747 llvm-svn: 315608	2017-10-12 18:59:54 +00:00
Konstantin Zhuravlyov	63e87f5a02	AMDGPU: Fix warnings introduced in r315526 llvm-svn: 315596	2017-10-12 17:34:05 +00:00
Tim Renouf	c8ffffe462	[AMDGPU] For amdpal, widen interpolation mode workaround Summary: The interpolation mode workaround ensures that at least one interpolation mode is enabled in PSInputAddr. It does not also check PSInputEna on the basis that the user might enable bits in that depending on run-time state. However, for amdpal os type, the user does not enable some bits after compilation based on run-time states; the register values being generated here are the final ones set in the hardware. Therefore, apply the workaround to PSInputAddr and PSInputEnable together. (The case where a bit is set in PSInputAddr but not in PSInputEnable is where the frontend set up an input arg for a particular interpolation mode, but nothing uses that input arg. Really we should have an earlier pass that removes such an arg.) Reviewers: arsenm, nhaehnle, dstuttard Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37758 llvm-svn: 315591	2017-10-12 16:16:41 +00:00
Don Hinton	3e0199f7eb	[dump] Remove NDEBUG from test to enable dump methods [NFC] Summary: Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP. Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods. Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so it'll be picked up by public headers. Differential Revision: https://reviews.llvm.org/D38406 llvm-svn: 315590	2017-10-12 16:16:06 +00:00
Reid Kleckner	c18c12e385	Fix AMDGPU build issue llvm-svn: 315535	2017-10-11 23:53:36 +00:00
Lang Hames	2241ffa43c	[MC] Have MCObjectStreamer take its MCAsmBackend argument via unique_ptr. MCObjectStreamer owns its MCCodeEmitter -- this fixes the types to reflect that, and allows us to remove the last instance of MCObjectStreamer's weird "holding ownership via someone else's reference" trick. llvm-svn: 315531	2017-10-11 23:34:47 +00:00
Konstantin Zhuravlyov	516651b154	AMDGPU/NFC: Minor clean ups in HSA metadata - Use HSA metadata streamer directly from AMDGPUAsmPrinter - Make naming consistent with PAL metadata Differential Revision: https://reviews.llvm.org/D38746 llvm-svn: 315526	2017-10-11 22:59:35 +00:00
Konstantin Zhuravlyov	c3beb6a075	AMDGPU/NFC: Minor clean ups in PAL metadata - Move PAL metadata definitions to AMDGPUMetadata - Make naming consistent with HSA metadata Differential Revision: https://reviews.llvm.org/D38745 llvm-svn: 315523	2017-10-11 22:41:09 +00:00
Konstantin Zhuravlyov	a63b0f9d20	AMDGPU/NFC: Rename code object metadata as HSA metadata - Rename AMDGPUCodeObjectMetadata to AMDGPUMetadata (PAL metadata will be included in this file in the follow up change) - Rename AMDGPUCodeObjectMetadataStreamer to AMDGPUHSAMetadataStreamer - Introduce HSAMD namespace - Other minor name changes in function and test names llvm-svn: 315522	2017-10-11 22:18:53 +00:00
Oliver Stannard	4191b9eaea	[Asm] Add debug tracing in table-generated assembly matcher This adds debug tracing to the table-generated assembly instruction matcher, enabled by the -debug-only=asm-matcher option. The changes in the target AsmParsers are to add an MCInstrInfo reference under a consistent name, so that we can use it from table-generated code. This was already being used this way for targets that use deprecation warnings, but 5 targets did not have it, and Hexagon had it under a different name to the other backends. llvm-svn: 315445	2017-10-11 09:17:43 +00:00
Lang Hames	02d330548d	[MC] Have MCObjectStreamer take its MCAsmBackend argument via unique_ptr. MCObjectStreamer owns its MCAsmBackend -- this fixes the types to reflect that, and allows us to remove another instance of MCObjectStreamer's weird "holding ownership via someone else's reference" trick. llvm-svn: 315410	2017-10-11 01:57:21 +00:00
Matt Arsenault	f42074b699	AMDGPU: Fix missing skipFunction calls llvm-svn: 315361	2017-10-10 20:48:36 +00:00
Matt Arsenault	d674e0ac0d	AMDGPU: Fix failure to select branch with optnone opt-bisect/optnone disable the AMDGPUUniformAnnotateValues pass. The heuristic in the custom selector for brcond deferred the branch uniformity check to the pattern, which would fail. llvm-svn: 315360	2017-10-10 20:34:49 +00:00
Matt Arsenault	cc85223f87	AMDGPU: Fix incorrect selection of pseudo-branches These should only be used if the machine structurizer is enabled. llvm-svn: 315357	2017-10-10 20:22:07 +00:00
Yaxun Liu	de4b88d9a1	[AMDGPU] Lower enqueued blocks and generate runtime metadata This patch adds a post-linking pass which replaces the function pointer of enqueued block kernel with a global variable (runtime handle) and adds runtime-handle attribute to the enqueued block kernel. In LLVM CodeGen the runtime-handle metadata will be translated to RuntimeHandle metadata in code object. Runtime allocates a global buffer for each kernel with RuntimeHandel metadata and saves the kernel address required for the AQL packet into the buffer. __enqueue_kernel function in device library knows that the invoke function pointer in the block literal is actually runtime handle and loads the kernel address from it and puts it into AQL packet for dispatching. This cannot be done in FE since FE cannot create a unique global variable with external linkage across LLVM modules. The global variable with internal linkage does not work since optimization passes will try to replace loads of the global variable with its initialization value. Differential Revision: https://reviews.llvm.org/D38610 llvm-svn: 315352	2017-10-10 19:39:48 +00:00
Lang Hames	60fbc7cc38	[MC] Thread unique_ptr<MCObjectWriter> through the create.*ObjectWriter functions. This makes the ownership of the resulting MCObjectWriter clear, and allows us to remove one instance of MCObjectStreamer's bizarre "holding ownership via someone else's reference" trick. llvm-svn: 315327	2017-10-10 16:28:07 +00:00
Nicolai Haehnle	312b64f4d7	AMDGPU: Split MUBUF offset into aligned components Summary: Atomic buffer operations do not work (and trap on gfx9) when the components are unaligned, even if their sum is aligned. Previously, we generated an offset of 4156 without an SGPR by splitting it as 4095 + 61 (immediate + inline constant). The highest offset for which we can do this correctly is 4156 = 4092 + 64. Fixes dEQP-GLES31.functional.ssbo.atomic.* Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D37850 llvm-svn: 315302	2017-10-10 12:22:23 +00:00
NAKAMURA Takumi	aba2b3d1f3	SILoadStoreOptimizer.cpp: Fix build; Clang doesn't like "using anonymous struct" since rL315256. llvm-svn: 315283	2017-10-10 08:30:53 +00:00
Lang Hames	dcb312bdb9	[MC] Plumb unique_ptr<MCELFObjectTargetWriter> through createELFObjectWriter to ELFObjectWriter's constructor. Fixes the same ownership issue for ELF that r315245 did for MachO: ELFObjectWriter takes ownership of its MCELFObjectTargetWriter, so we want to pass this through to the constructor via a unique_ptr, rather than a raw ptr. llvm-svn: 315254	2017-10-09 23:53:15 +00:00
Stanislav Mekhanoshin	de42c29a68	[AMDGPU] New 64 bit div/rem expansion Old expansion was 20 VGPRs, 78 SGPRs and ~380 instructions. This expansion is 11 VGPRs, 12 SGPRs and ~120 instructions. Passes OpenCL conformance test_integer_ops quick_[u]long_math Differential Revision: https://reviews.llvm.org/D38607 llvm-svn: 315081	2017-10-06 17:24:45 +00:00
Matt Arsenault	2d3f8f333d	AMDGPU: Set v2i32 any_extend to expand llvm-svn: 314993	2017-10-05 17:38:30 +00:00
Konstantin Zhuravlyov	aa0835a7ab	AMDGPU: Add and set AMDGPU-specific e_flags Differential Revision: https://reviews.llvm.org/D38556 llvm-svn: 314987	2017-10-05 16:19:18 +00:00
Matt Arsenault	f48e5c9ce5	AMDGPU: Add comment about clamps llvm-svn: 314952	2017-10-05 00:13:20 +00:00
Matt Arsenault	aafff87dda	AMDGPU: Do not fold clamp instructions when sources are different Patch by hakzsam (Samuel Pitoiset) llvm-svn: 314951	2017-10-05 00:13:17 +00:00
Matt Arsenault	9ab1fa6803	AMDGPU: Fix not accounting for instruction size in bundles These were counted as 0. Fixes branch limit exceeded errors in some large programs. llvm-svn: 314944	2017-10-04 22:59:12 +00:00
Konstantin Zhuravlyov	8684f7b4f9	AMDGPU: Correctly set EI_OSABI based on the os Differential Revision: https://reviews.llvm.org/D38555 llvm-svn: 314943	2017-10-04 22:44:13 +00:00
Sanjay Patel	4c33d5213b	[SimplifyCFG] put the optional assumption cache pointer in the options struct; NFCI This is a follow-up to https://reviews.llvm.org/D38138. I fixed the capitalization of some functions because we're changing those lines anyway and that helped verify that we weren't accidentally dropping any options by using default param values. llvm-svn: 314930	2017-10-04 20:26:25 +00:00
Konstantin Zhuravlyov	22bc039c89	AMDGPU: Expand setcc for v2f32 and v4f32 llvm-svn: 314853	2017-10-03 21:45:01 +00:00
Konstantin Zhuravlyov	908fa90b51	AMDGPU: Expand setcc for v2i32 and v4i32 llvm-svn: 314852	2017-10-03 21:31:24 +00:00
Tim Renouf	72800f0436	[AMDGPU] implemented pal metadata Summary: For the amdpal OS type: We write an AMDGPU_PAL_METADATA record in the .note section in the ELF (or as an assembler directive). It contains key=value pairs of 32 bit ints. It is a merge of metadata from codegen of the shaders, and metadata provided by the frontend as _amdgpu_pal_metadata IR metadata. Where both sources have a key=value with the same key, the two values are ORed together. This .note record is part of the amdpal ABI and will be documented in docs/AMDGPUUsage.rst in a future commit. Eventually the amdpal OS type will stop generating the .AMDGPU.config section once the frontend has safely moved over to using the .note records above instead of .AMDGPU.config. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D37753 llvm-svn: 314829	2017-10-03 19:03:52 +00:00
Alexander Timofeev	4651396584	[AMDGPU] Avoid predicated execution of the basic blocks containing scalar instructions. Differential revision: https://reviews.llvm.org/D38293 llvm-svn: 314828	2017-10-03 18:55:36 +00:00
Matt Arsenault	90c7593a75	AMDGPU: Remove global isGCN predicates These are problematic because they apply to everything, and can easily clobber whatever more specific predicate you are trying to add to a function. Currently instructions use SubtargetPredicate/PredicateControl to apply this to patterns applied to an instruction definition, but not to free standing Pats. Add a wrapper around Pat so the special PredicateControls requirements can be appended to the final predicate list like how Mips does it. llvm-svn: 314742	2017-10-03 00:06:41 +00:00
Matt Arsenault	c6baa85fc6	AMDGPU: Fix typos llvm-svn: 314715	2017-10-02 20:31:18 +00:00
Stanislav Mekhanoshin	1d8cf2be89	[AMDGPU] Set fast-math flags on functions given the options We have a single library build without relaxation options. When inlined library functions remove fast math attributes from the functions they are integrated into. This patch sets relaxation attributes on the functions after linking provided corresponding relaxation options are given. Math instructions inside the inlined functions remain to have no fast flags, but inlining does not prevent fast math transformations of a surrounding caller code anymore. Differential Revision: https://reviews.llvm.org/D38325 llvm-svn: 314568	2017-09-29 23:40:19 +00:00
Nicolai Haehnle	ce4ddd06da	AMDGPU: VALU carry-in and v_cndmask condition cannot be EXEC The hardware will only forward EXEC_LO; the high 32 bits will be zero. Additionally, inline constants do not work. At least, v_addc_u32_e64 v0, vcc, v0, v1, -1 which could conceivably be used to combine (v0 + v1 + 1) into a single instruction, acts as if all carry-in bits are zero. The llvm.amdgcn.ps.live test is adjusted; it would be nice to combine s_mov_b64 s[0:1], exec v_cndmask_b32_e64 v0, v1, v2, s[0:1] into v_mov_b32 v0, v3 but it's not particularly high priority. Fixes dEQP-GLES31.functional.shaders.helper_invocation.value.* llvm-svn: 314522	2017-09-29 15:37:31 +00:00
Jonas Paulsson	c9e363ac69	[SystemZ] implement shouldCoalesce() Implement shouldCoalesce() to help regalloc avoid running out of GR128 registers. If a COPY involving a subreg of a GR128 is coalesced, the live range of the GR128 virtual register will be extended. If this happens where there are enough phys-reg clobbers present, regalloc will run out of registers (if there is not a single GR128 allocatable register available). This patch tries to allow coalescing only when it can prove that this will be safe by checking the (local) interval in question. Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D37899 https://bugs.llvm.org/show_bug.cgi?id=34610 llvm-svn: 314516	2017-09-29 14:31:39 +00:00
Tim Renouf	ef1ae8ffac	[AMDGPU] calling conventions for AMDPAL OS type Summary: This commit adds comments on how the AMDPAL OS type overloads the existing AMDGPU_ calling conventions used by Mesa, and adds a couple of new ones. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37752 llvm-svn: 314502	2017-09-29 09:51:22 +00:00
Tim Renouf	132291589f	[AMDGPU] AMDPAL scratch buffer support Summary: Added support for scratch (including spilling) for OS type amdpal: generates code to set up the scratch descriptor if it is needed. With amdpal, the scratch resource descriptor is loaded from offset 0 of the global information table. The low 32 bits of the address of the global information table is passed in s0. Added amdgpu-git-ptr-high function attribute to hard-wire the high 32 bits of the address of the global information table. If the function attribute is not specified, or is 0xffffffff, then the backend generates code to use the high 32 bits of pc. The documentation for the AMDPAL ABI will be added in a later commit. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye Differential Revision: https://reviews.llvm.org/D37483 llvm-svn: 314501	2017-09-29 09:49:35 +00:00
Tim Renouf	9f7ead3334	[Triple] Add AMDPAL operating system type Summary: This operating system type represents the AMDGPU PAL runtime, and will be required by the AMDGPU backend in order to generate correct code for this runtime. Currently it generates the same code as not specifying an OS at all. That will change in future commits. Patch from Tim Corringham. Subscribers: arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D37380 llvm-svn: 314500	2017-09-29 09:48:12 +00:00
Sanjay Patel	0f9b4773c1	[SimplifyCFG] add a struct to house optional folds (PR34603) This was intended to be no-functional-change, but it's not - there's a test diff. So I thought I should stop here and post it as-is to see if this looks like what was expected based on the discussion in PR34603: https://bugs.llvm.org/show_bug.cgi?id=34603 Notes: 1. The test improvement occurs because the existing 'LateSimplifyCFG' marker is not carried through the recursive calls to 'SimplifyCFG()->SimplifyCFGOpt().run()->SimplifyCFG()'. The parameter isn't passed down, so we pick up the default value from the function signature after the first level. I assumed that was a bug, so I've passed 'Options' down in all of the 'SimplifyCFG' calls. 2. I split 'LateSimplifyCFG' into 2 bits: ConvertSwitchToLookupTable and KeepCanonicalLoops. This would theoretically allow us to differentiate the transforms controlled by those params independently. 3. We could stash the optional AssumptionCache pointer and 'LoopHeaders' pointer in the struct too. I just stopped here to minimize the diffs. 4. Similarly, I stopped short of messing with the pass manager layer. I have another question that could wait for the follow-up: why is the new pass manager creating the pass with LateSimplifyCFG set to true no matter where in the pipeline it's creating SimplifyCFG passes? // Create an early function pass manager to cleanup the output of the // frontend. EarlyFPM.addPass(SimplifyCFGPass()); --> /// \brief Construct a pass with the default thresholds /// and switch optimizations. SimplifyCFGPass::SimplifyCFGPass() : BonusInstThreshold(UserBonusInstThreshold), LateSimplifyCFG(true) {} <-- switches get converted to lookup tables and loops may not be in canonical form If this is unintended, then it's possible that the current behavior of dropping the 'LateSimplifyCFG' setting via recursion was masking this bug. Differential Revision: https://reviews.llvm.org/D38138 llvm-svn: 314308	2017-09-27 14:54:16 +00:00
Matt Arsenault	1390af2dd2	AMDGPU: Add option to stress calls This inverts the behavior of the AlwaysInline pass to mark every function not already marked alwaysinline as noinline. llvm-svn: 313865	2017-09-21 07:00:48 +00:00
Matt Arsenault	fdcdd88d57	AMDGPU: Fix crash on immediate operand We can have a v_mac with an immediate src0. We can still fold if it's an inline immediate, otherwise it already uses the constant bus. llvm-svn: 313852	2017-09-21 00:45:59 +00:00
Matt Arsenault	8cbb4884a5	AMDGPU: Start selecting v_mad_mixhi_f16 llvm-svn: 313814	2017-09-20 21:01:24 +00:00
Matt Arsenault	e135c4c6a6	AMDGPU: Add tied operands to v_mad_mix{lo\|hi}_f16 These write to the low and high half of the destination register and leave the other 16-bits unchanged. This is true for most 16-bit instructions on gfx9, but we don't use that now. llvm-svn: 313812	2017-09-20 20:53:49 +00:00
Matt Arsenault	76935122cc	AMDGPU: Start selecting v_mad_mixlo_f16 Also add some tests that should be able to use v_mad_mixhi_f16, but do not yet. This is trickier because we don't really model the partial update of the register done by 16-bit instructions. llvm-svn: 313806	2017-09-20 20:28:39 +00:00
Matt Arsenault	644883ff07	AMDGPU: Fix encoding of op_sel for mad_mix* opcodes llvm-svn: 313797	2017-09-20 19:09:28 +00:00
Stanislav Mekhanoshin	2e3bf37ec4	[AMDGPU] Fixed memory leak with inliner replaced Delete inliner before replacing it. llvm-svn: 313723	2017-09-20 06:34:28 +00:00
Matt Arsenault	c8aea66627	AMDGPU: Move r600 only code into r600 only td file llvm-svn: 313719	2017-09-20 06:11:25 +00:00
Stanislav Mekhanoshin	5641820141	[AMDGPU] Fix regression in test clang/test/CodeGen/backend-unsupported-error.ll llvm-svn: 313718	2017-09-20 06:10:15 +00:00
Matt Arsenault	b81495dccb	AMDGPU: Match load d16 hi instructions Also starts selecting global loads for constant address in some cases. Some end up selecting to mubuf still, which requires investigation. We still get sub-optimal regalloc and extra waitcnts inserted due to not really tracking the liveness of the separate register halves. llvm-svn: 313716	2017-09-20 05:01:53 +00:00
Stanislav Mekhanoshin	5670e6d482	[AMDGPU] Port of HSAIL inliner Differential Revision: https://reviews.llvm.org/D36849 llvm-svn: 313714	2017-09-20 04:25:58 +00:00
Matt Arsenault	bc68383166	AMDGPU: Cleanup load/store PatFrags Try to use a consistent naming scheme. llvm-svn: 313713	2017-09-20 03:43:35 +00:00
Matt Arsenault	fcc213fab7	AMDGPU: Match store d16_hi instructions llvm-svn: 313712	2017-09-20 03:20:09 +00:00
Stanislav Mekhanoshin	d4ae470d2e	[AMDGPU] Prevent post-RA scheduler from breaking memory clauses The pre-RA scheduler does load/store clustering, but post-RA scheduler undoes it. Add mutation to prevent it. Differential Revision: https://reviews.llvm.org/D38014 llvm-svn: 313670	2017-09-19 20:54:38 +00:00
Matt Arsenault	e745d9963e	AMDGPU: Run internalize symbols at -O0 The relocations used for externally visible functions aren't supported, so the direct call emitted ends up hitting a linker error. llvm-svn: 313616	2017-09-19 07:40:11 +00:00
Konstantin Zhuravlyov	ca8946a376	AMDGPU: Start selecting s_xnor_{b32, b64} Differential Revision: https://reviews.llvm.org/D37981 llvm-svn: 313565	2017-09-18 21:22:45 +00:00
Jan Sjodin	1f2f57a7ea	Fix warnings in r313297. llvm-svn: 313302	2017-09-14 21:49:52 +00:00
Matt Arsenault	c317287fde	AMDGPU: Fix violating constant bus restriction You can't use madmk/madmk if it already uses an SGPR input. llvm-svn: 313298	2017-09-14 20:54:29 +00:00
Jan Sjodin	312ccf761c	Add AddresSpace to PseudoSourceValue. Differential Revision: https://reviews.llvm.org/D35089 llvm-svn: 313297	2017-09-14 20:53:51 +00:00
Matt Arsenault	37ab4cf8b8	AMDGPU: Fix assert on alloca of array of struct llvm-svn: 313282	2017-09-14 18:02:29 +00:00
Matt Arsenault	defe371771	AMDGPU: Stop modifying SP in call sequences Because the stack growth direction and addressing is done in the same direction, modifying SP at the beginning of the call sequence was incorrect. If we had a stack passed argument, we would end up skipping that number of bytes before pushing arguments, leaving unused/inconsistent space. The callee creates fixed stack objects in its frame, so the space necessary for these is already logically allocated in the callee, so we just let the callee increment SP if it really requires it. llvm-svn: 313279	2017-09-14 17:37:40 +00:00
Matt Arsenault	6efd082c01	AMDGPU: Make frame register caller preserved Using SplitCSR for the frame register was very broken. Often the copies in the prolog and epilog were optimized out, in addition to them being inserted after the true prolog where the FP was clobbered. I have a hacky solution which works that continues to use split CSR, but for now this is simpler and will get to working programs. llvm-svn: 313274	2017-09-14 17:14:57 +00:00
Matt Arsenault	ecb43ef1bc	AMDGPU: Don't spill SP reg like a normal CSR llvm-svn: 313217	2017-09-13 23:47:01 +00:00
Stanislav Mekhanoshin	7fe9a5d9b4	Allow target to decide when to cluster loads/stores in misched MachineScheduler when clustering loads or stores checks if base pointers point to the same memory. This check is done through comparison of base registers of two memory instructions. This works fine when instructions have separate offset operand. If they require a full calculated pointer such instructions can never be clustered according to such logic. Changed shouldClusterMemOps to accept base registers as well and let it decide what to do about it. Differential Revision: https://reviews.llvm.org/D37698 llvm-svn: 313208	2017-09-13 22:20:47 +00:00
Matt Arsenault	fb017ae155	AMDGPU: Handle coldcc in more places Missed in r312936 llvm-svn: 313205	2017-09-13 21:55:52 +00:00
Matt Arsenault	537bd3b906	AMDGPU: Allow coldcc calls llvm-svn: 312936	2017-09-11 18:54:20 +00:00
Stanislav Mekhanoshin	710da42b86	[AMDGPU] Produce madak and madmk from the two-address pass These two instructions are normally selected, but when the two address pass converts mac into mad we end up with the mad where we could have one of these. Differential Revision: https://reviews.llvm.org/D37389 llvm-svn: 312928	2017-09-11 17:13:57 +00:00
Tim Renouf	660ba2b8af	[AMDGPU] exp should not be in WQM mode A mrt exp with vm=1 must be in exact (non-WQM) mode, as it also exports the exec mask as the valid mask to determine which pixels to render. This commit marks any exp as needing to be in exact mode. Actually, if there are multiple mrt exps, only one needs to have vm=1, and only that one needs to be in exact mode. But that is an optimization for another day. Differential Revision: https://reviews.llvm.org/D36305 llvm-svn: 312915	2017-09-11 13:55:39 +00:00
Tim Renouf	6cb007fc72	AMDGPU: trivial comment change ... to check commit access for new committer. llvm-svn: 312900	2017-09-11 08:31:32 +00:00
Davide Italiano	0731a4f52a	[AMDGPU] Remove unused function. NFCI. llvm-svn: 312836	2017-09-08 23:54:11 +00:00
Matt Arsenault	461ed08fbd	AMDGPU: Start using !con operator We have a lot of operand definition work essentially producing every valid permutation of operands to workaround builiding operand lists based on the instruction features. Apparently tablegen already has a mostly undocumented operator to concat dags which simplies this. Convert one simple place to use this. The BUF instruction definitions have much more complicated logic that can be totally rewritten now. llvm-svn: 312822	2017-09-08 19:09:13 +00:00
Matt Arsenault	2f4df7ec41	AMDGPU: Recompute scc liveness The various scalar bit operations set SCC, so one is erased or moved it needs to be recomputed. Not sure why the existing tests don't fail on this. llvm-svn: 312819	2017-09-08 18:51:26 +00:00
Matt Arsenault	d7e2303df2	AMDGPU: Start selecting v_mad_mix_f32 llvm-svn: 312732	2017-09-07 18:05:07 +00:00
Konstantin Zhuravlyov	5f5b586c99	AMDGPU: Handle non-temporal loads and stores Differential Revision: https://reviews.llvm.org/D36862 llvm-svn: 312729	2017-09-07 17:14:54 +00:00
Konstantin Zhuravlyov	c8c9d4a0a6	AMDGPU: Handle more than one memory operand in SIMemoryLegalizer Differential Revision: https://reviews.llvm.org/D37397 llvm-svn: 312725	2017-09-07 16:14:21 +00:00
Matt Arsenault	65ca292a8d	AMDGPU: Don't legalize i16 extloads to i32 with legal i16 Keeping non-i16 extloads makes it easier to match some new gfx9 load instructions. llvm-svn: 312699	2017-09-07 05:37:34 +00:00
Stanislav Mekhanoshin	442e28dd42	[AMDGPU] Use v_pk_max_f16 for fcanonicalize Differential Revision: https://reviews.llvm.org/D37325 llvm-svn: 312676	2017-09-06 22:27:29 +00:00
Stanislav Mekhanoshin	ea134bcb13	[AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize Differential Revision: https://reviews.llvm.org/D37522 llvm-svn: 312660	2017-09-06 18:29:51 +00:00
Stanislav Mekhanoshin	949fac9e40	[AMDGPU] Fix shouldClusterMemOps to process flat loads Flat loads do not have vdata operand but have vdst instead. Differential Revision: https://reviews.llvm.org/D37502 llvm-svn: 312640	2017-09-06 15:31:30 +00:00
Nicolai Haehnle	523827145b	AMDGPU: Make worst-case assumption about the wait states in inline assembly Summary: Mesa still uses a hack where empty inline assembly is used as a kind of optimization barrier. This exposed a problem where not enough wait states were inserted, because the hazard recognizer implicitly assumed that each inline assembly "instruction" has at least one wait state. Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D37205 llvm-svn: 312635	2017-09-06 13:50:13 +00:00
Yaxun Liu	fc5121a722	[AMDGPU] Transform __read_pipe_* and __write_pipe_* When packet size equals packet align and is power of 2, transform __read_pipe* and __write_pipe* to specialized library function. Differential Revision: https://reviews.llvm.org/D36831 llvm-svn: 312598	2017-09-06 00:30:27 +00:00
Konstantin Zhuravlyov	80528702c9	AMDGPU: Cleanup/refactor SIMemoryLegalizer [3]: - Refactor SIMemOpInfo's constructors - Allow construction of NotAtomic SIMemOpInfo Differential Revision: https://reviews.llvm.org/D37396 llvm-svn: 312563	2017-09-05 19:01:10 +00:00
Matt Arsenault	22cdb61a78	AMDGPU: Fix not accounting for tail call resource usage If the only call in a function is a tail call, the function isn't considered to have a call since it's a type of return. llvm-svn: 312561	2017-09-05 18:36:36 +00:00
Konstantin Zhuravlyov	1aa667fe64	AMDGPU/NFC: Cleanup/refactor SIMemoryLegalizer [2]: - Make SIMemOpInfo a class - Add accessor methods to SIMemOpInfo - Move get*Info methods to SIMemOpInfo Differential Revision: https://reviews.llvm.org/D37395 llvm-svn: 312541	2017-09-05 16:41:25 +00:00
Konstantin Zhuravlyov	844845ae06	AMDGPU/NFC: Cleanup/refactor SIMemoryLegalizer [1]: - Rename MemOpInfo -> SIMemOpInfo - Move SIMemOpInfo class out of SIMemoryLegalizer class Differential Revision: https://reviews.llvm.org/D37394 llvm-svn: 312540	2017-09-05 16:18:05 +00:00
Stanislav Mekhanoshin	dbfda5b601	[AMDGPU] Prevent infinite recursion in DAG.computeKnownBits() Differential Revision: https://reviews.llvm.org/D37392 llvm-svn: 312364	2017-09-01 20:43:20 +00:00
Matt Arsenault	efa1d655d4	AMDGPU: Add ds_{read\|write}_addtid_b32 definitions llvm-svn: 312349	2017-09-01 18:38:02 +00:00
Matt Arsenault	ed6e8f0a90	AMDGPU: Add most d16 load/store instruction definitions Doesn't include the tied operand necessary for the loads, but is enough for the assembler to work. llvm-svn: 312347	2017-09-01 18:36:06 +00:00
Nicolai Haehnle	75c98c365b	AMDGPU: IMPLICIT_DEFs and DBG_VALUEs do not contribute to wait states Summary: This fixes a bug that was exposed on gfx9 in various GL45-CTS.shaders.loops.*_iterations.select_iteration_count_fragment tests, e.g. GL45-CTS.shaders.loops.do_while_uniform_iterations.select_iteration_count_fragment Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D36193 llvm-svn: 312337	2017-09-01 16:56:32 +00:00
Matt Arsenault	ab4a5cd335	AMDGPU: Fold clamp modifier for packed instructions llvm-svn: 312297	2017-08-31 23:53:50 +00:00
Eugene Zelenko	fa6434bebb	[Analysis] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes. Also affected in files (NFC). llvm-svn: 312289	2017-08-31 21:56:16 +00:00
Matt Arsenault	fe003f347a	AMDGPU: Turn int pack pattern into build_vector build_vector is a more useful canonical form when pattern matching packed operations, so turn shift into high element into a build_vector. Should show no change for now. llvm-svn: 312282	2017-08-31 21:17:22 +00:00
Matt Arsenault	376f1bd73c	AMDGPU: Don't assert in TTI with fp32 denorms enabled Also refine for f16 and rcp cases. llvm-svn: 312213	2017-08-31 05:47:00 +00:00
Matt Arsenault	67e72dee79	AMDGPU: Use set for tracked registers The majority of the time spent in the pass checking for the register reads. Rather than searching all of the defined registers for uses in each instruction, use a set of defined registers and check the operands of the instruction. This process still is algorithmically not great, but with the additional trick of skipping the analysis for addresses with one use, this brings one slow testcase into a reasonable range. llvm-svn: 312206	2017-08-31 01:53:09 +00:00
Matt Arsenault	c8f8cda0cd	AMDGPU: Correct operand types for v_mad_mix* These aren't really packed instructions, so the default op_sel_hi should be 0 since this indicates a conversion. The operand types are scalar values that behave similar to an f16 scalar that may be converted to f32. Doesn't change the default printing for op_sel_hi, just the parsing. llvm-svn: 312179	2017-08-30 22:18:40 +00:00
Matt Arsenault	3cb61634ff	AMDGPU: Don't look for DS merge candidates with one use address The merge is only possible if the base address register is the same for the two instructions. If there is only the one use, there's no point in doing an expensive forward scan checking for memory interference looking for a merge candidate. This gives a signficant improvement in one extreme testcase. The code to do the scan is still algorithmically terrible, so this is still the slowest pass in that example. llvm-svn: 312096	2017-08-30 03:26:18 +00:00
Stanislav Mekhanoshin	06cab79e50	[AMDGPU] Use v_max_f* for fcanonicalize If denorms are not flushed we can use max instead of multiplication by 1. For double that is simply faster, while for float and half it is shorter, because mul uses constant bus and VOP3. Differential Revision: https://reviews.llvm.org/D36856 llvm-svn: 312095	2017-08-30 03:03:38 +00:00
Matt Arsenault	6b114d2c50	AMDGPU: Select clamp pattern with v2f16 llvm-svn: 312087	2017-08-30 01:20:17 +00:00
Matt Arsenault	2d69c924f7	AMDGPU: Fix typo llvm-svn: 312040	2017-08-29 21:25:51 +00:00
Stanislav Mekhanoshin	312c557b3b	[AMDGPU] Fix regression in AMDGPULibCalls allowing native for doubles Under -cl-fast-relaxed-math we could use native_sqrt, but f64 was allowed to produce HSAIL's nsqrt instruction. HSAIL is not here and we stick with non-existing native_sqrt(double) as a result. Add check for f64 to not return native functions and also remove handling of f64 case for fold_sqrt. Differential Revision: https://reviews.llvm.org/D37223 llvm-svn: 311900	2017-08-28 18:00:08 +00:00
Stanislav Mekhanoshin	dad7cf62de	[AMDGPU] computeKnownBitsForTargetNode for 24 bit mul Differential Revision: https://reviews.llvm.org/D37168 llvm-svn: 311896	2017-08-28 16:35:37 +00:00
Konstantin Zhuravlyov	68107657d4	AMDGPU: Fix gfx801 features gfx801 has 1/2 rate F64, Fast F32 FMA Differential Revision: https://reviews.llvm.org/D36981 llvm-svn: 311694	2017-08-24 20:03:07 +00:00
Benjamin Kramer	49a49fe816	Move helper classes into anonymous namespaces. No functionality change intended. llvm-svn: 311288	2017-08-20 13:03:48 +00:00
Konstantin Zhuravlyov	89377c440c	AMDGPU/NFC: Reorder functions in SIMemoryLegalizer: - Move load functions before atomic functions - Move store functions before atomic functions llvm-svn: 311256	2017-08-19 18:44:27 +00:00
Konstantin Zhuravlyov	f5d826a294	AMDGPU/NFC: Rename few things in SIMemoryLegalizer: - AtomicInfo -> MemOpInfo - getAtomicLoadInfo -> getLoadInfo - getAtomicStoreInfo -> getStoreInfo - expandAtomicLoad -> expandLoad - expandAtomicStore -> expandStore Differential Revision: https://reviews.llvm.org/D36861 llvm-svn: 311179	2017-08-18 17:30:02 +00:00
Tom Stellard	a096b12628	AMDGPU: Add R600InstPrinter class Summary: This is step towards separating the GCN and R600 tablegen'd code. This is a little awkward for now, because the R600 functions won't have the MCSubtargetInfo parameter, so we need to have AMDMGPUInstPrinter delegate to R600InstPrinter, but once the tablegen'd code is split, we will be able to drop the delegation and use R600InstPrinter directly. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D36444 llvm-svn: 311128	2017-08-17 22:20:04 +00:00
Evgeny Mankov	bf9751760a	[AMDGPU] NFC: test commit llvm-svn: 311019	2017-08-16 16:47:29 +00:00
Konstantin Zhuravlyov	d3d89efa3e	AMDGPU/NFC: Sort files in CMakeLists.txt alphabetically llvm-svn: 311017	2017-08-16 16:23:32 +00:00
Dmitry Preobrazhensky	b865ef534a	[AMDGPU][MC][GFX9] Added op_sel support for v_mad_*16, v_fma_f16, v_div_fixup_f16 This change implements features postponed in https://reviews.llvm.org/D35424 because of a dependency on https://reviews.llvm.org/D36322 Reviewers: SamWot, artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D36694 llvm-svn: 311011	2017-08-16 15:16:32 +00:00
Dmitry Preobrazhensky	ff64aa514b	[AMDGPU][MC][GFX9] Added integer clamping support for VOP3 opcodes See Bug 34152: https://bugs.llvm.org//show_bug.cgi?id=34152 Reviewers: SamWot, artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D36674 llvm-svn: 311006	2017-08-16 13:51:56 +00:00
Stanislav Mekhanoshin	a9487d92d7	[AMDGPU] Eliminate no effect instructions before s_endpgm Differential Revision: https://reviews.llvm.org/D36585 llvm-svn: 310987	2017-08-16 04:43:49 +00:00
Quentin Colombet	61d71a138b	Reapply "[GlobalISel] Remove the GISelAccessor API." This reverts commit r310425, thus reapplying r310335 with a fix for link issue of the AArch64 unittests on Linux bots when BUILD_SHARED_LIBS is ON. Original commit message: [GlobalISel] Remove the GISelAccessor API. Its sole purpose was to avoid spreading around ifdefs related to building global-isel. Since r309990, GlobalISel is not optional anymore, thus, we can get rid of this mechanism all together. NFC. ---- The fix for the link issue consists in adding the GlobalISel library in the list of dependencies for the AArch64 unittests. This dependency comes from the use of AArch64Subtarget that needs to know how to destruct the GISel related APIs when being detroyed. Thanks to Bill Seurer and Ahmed Bougacha for helping me reproducing and understand the problem. llvm-svn: 310969	2017-08-15 22:31:51 +00:00
Matt Arsenault	71bcbd451f	AMDGPU: Start adding tail call support Handle the sibling call cases. llvm-svn: 310753	2017-08-11 20:42:08 +00:00
Stanislav Mekhanoshin	976cedda26	[AMDGPU] Fix santizer error after last commit Removed useless assert. llvm-svn: 310738	2017-08-11 17:54:43 +00:00
Stanislav Mekhanoshin	7f37794ebd	[AMDGPU] Ported and adopted AMDLibCalls pass The pass does simplifications of well known AMD library calls. If given -amdgpu-prelink option it works in a pre-link mode which allows to reference new library functions which will be linked in later. In addition it also used to process traditional AMD option -fuse-native which allows to replace some of the functions with their fast native implementations from the library. The necessary glue to pass the prelink option and translate -fuse-native is to be added to the driver. Differential Revision: https://reviews.llvm.org/D36436 llvm-svn: 310731	2017-08-11 16:42:09 +00:00
Eugene Zelenko	c8fbf6ffea	[AMDGPU] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 310541	2017-08-10 00:46:15 +00:00
Matt Arsenault	36cd1859f3	AMDGPU: Fix assert on n inline asm constraint llvm-svn: 310515	2017-08-09 20:09:35 +00:00
Dmitry Preobrazhensky	1e32550de6	[AMDGPU][MC][GFX9] Added 16-bit renamed and "_legacy" VALU opcodes See Bug 33629: https://bugs.llvm.org//show_bug.cgi?id=33629 Reviewers: vpykhtin, SamWot, arsenm Differential Revision: https://reviews.llvm.org/D36322 llvm-svn: 310497	2017-08-09 17:10:47 +00:00
Gabor Horvath	18bda5b0f2	Suppress a warning. NFC. llvm-svn: 310459	2017-08-09 10:38:53 +00:00
Eugene Zelenko	d8e3efd5c2	[AMDGPU] Revert r310429 changes in AMDKernelCodeT.h which broke some build bots. llvm-svn: 310430	2017-08-09 00:06:29 +00:00
Eugene Zelenko	d16eff816b	[AMDGPU] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 310429	2017-08-08 23:53:55 +00:00
Quentin Colombet	8dd90fb54b	Revert "[GlobalISel] Remove the GISelAccessor API." This reverts commit r310115. It causes a linker failure for the one of the unittests of AArch64 on one of the linux bot: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/3429 : && /home/fedora/gcc/install/gcc-7.1.0/bin/g++ -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment -ffunction-sections -fdata-sections -O2 -L/home/fedora/gcc/install/gcc-7.1.0/lib64 -Wl,-allow-shlib-undefined -Wl,-O3 -Wl,--gc-sections unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o -o unittests/Target/AArch64/AArch64Tests lib/libLLVMAArch64CodeGen.so.6.0.0svn lib/libLLVMAArch64Desc.so.6.0.0svn lib/libLLVMAArch64Info.so.6.0.0svn lib/libLLVMCodeGen.so.6.0.0svn lib/libLLVMCore.so.6.0.0svn lib/libLLVMMC.so.6.0.0svn lib/libLLVMMIRParser.so.6.0.0svn lib/libLLVMSelectionDAG.so.6.0.0svn lib/libLLVMTarget.so.6.0.0svn lib/libLLVMSupport.so.6.0.0svn -lpthread lib/libgtest_main.so.6.0.0svn lib/libgtest.so.6.0.0svn -lpthread -Wl,-rpath,/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/lib && : unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o:(.toc+0x0): undefined reference to `vtable for llvm::LegalizerInfo' unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o:(.toc+0x8): undefined reference to `vtable for llvm::RegisterBankInfo' The particularity of this bot is that it is built with BUILD_SHARED_LIBS=ON However, I was not able to reproduce the problem so far. Reverting to unblock the bot. llvm-svn: 310425	2017-08-08 22:22:30 +00:00
Connor Abbott	249fc7bd2a	[AMDGPU] Add llvm.amdgpu.update.dpp intrinsic Summary: Now that we've made all the necessary backend changes, we can add a new intrinsic which exposes the new capabilities to IR producers. Since llvm.amdgpu.update.dpp is a strict superset of llvm.amdgpu.mov.dpp, we should deprecate the former. We also add tests for all the functionality that was added in previous changes, now that we can access it via an IR construct. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34718 llvm-svn: 310399	2017-08-08 18:52:22 +00:00
Tom Stellard	03aa3aee11	AMDGPU: Fix warnings introduced by r310336 llvm-svn: 310337	2017-08-08 05:52:00 +00:00
Tom Stellard	20287697f8	AMDGPU: Move R600 parts of AMDGPUISelDAGToDAG into their own class Summary: This refactoring is required in order to split the R600 and GCN tablegen files. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D36286 llvm-svn: 310336	2017-08-08 04:57:55 +00:00
Eugene Zelenko	59e128266c	[AMDGPU] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 310328	2017-08-08 00:47:13 +00:00
Matt Arsenault	bd57cea6e4	AMDGPU: Implement getMinimumNopSize llvm-svn: 310310	2017-08-07 22:00:58 +00:00
Connor Abbott	79f3ade51a	[AMDGPU] Add pseudo "old" source to all DPP instructions Summary: All instructions with the DPP modifier may not write to certain lanes of the output if bound_ctrl=1 is set or any bits in bank_mask or row_mask aren't set, so the destination register may be both defined and modified. The right way to handle this is to add a constraint that the destination register is the same as one of the inputs. We could tie the destination to the first source, but that would be too restrictive for some use-cases where we want the destination to be some other value before the instruction executes. Instead, add a fake "old" source and tie it to the destination. Effectively, the "old" source defines what value unwritten lanes will get. We'll expose this functionality to users with a new intrinsic later. Also, we want to use DPP instructions for computing derivatives, which means we need to set WQM for them. We also need to enable the entire wavefront when using DPP intrinsics to implement nonuniform subgroup reductions, since otherwise we'll get incorrect results in some cases. To accomodate this, add a new operand to all DPP instructions which will be interpreted by the SI WQM pass. This will be exposed with a new intrinsic later. We'll also add support for Whole Wavefront Mode later. I also fixed llvm.amdgcn.mov.dpp to overwrite the source and fixed up the test. However, I could also keep the old behavior (where lanes that aren't written are undefined) if people want it. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34716 llvm-svn: 310283	2017-08-07 19:10:56 +00:00
Matt Arsenault	36b4b0bed7	AMDGPU: Remove -mcpu=SI Leftover from before amdgcn/r600 split. llvm-svn: 310277	2017-08-07 18:30:35 +00:00
Matt Arsenault	9d288e69f5	AMDGPU: Remove redundant opt level check addOptimizedRegAlloc isn't used for -O0 already. llvm-svn: 310275	2017-08-07 18:12:48 +00:00
Matt Arsenault	3db456820d	AMDGPU: Remove FixControlFlowLiveIntervals pass This hasn't done anything in a long time. This was running after the the control flow pseudos were expanded, so this would never find them. The control flow pseudo expansion was moved to solve the problem this pass was supposed to solve in the first place, except handling it earlier also fixes it for fast regalloc which doesn't use LiveIntervals. Noticed by checking LCOV reports. llvm-svn: 310274	2017-08-07 18:12:47 +00:00
Matt Arsenault	aac47c1c00	AMDGPU: Use a custom areInlineCompatible Fixes not inlining OpenCL library functions on AMDGPU, which don't have an explicitly set target-cpu. llvm-svn: 310269	2017-08-07 17:08:44 +00:00
Matt Arsenault	8728c5f2db	AMDGPU: Cleanup subtarget features Try to avoid mutually exclusive features. Don't use a real default GPU, and use a fake "generic". The goal is to make it easier to see which set of features are incompatible between feature strings. Most of the test changes are due to random scheduling changes from not having a default fullspeed model. llvm-svn: 310258	2017-08-07 14:58:04 +00:00
Dmitry Preobrazhensky	50805a0b83	[AMDGPU][MC] Corrected VOP3 version of v_interp_* instructions for VI See bug 32621: https://bugs.llvm.org//show_bug.cgi?id=32621 Reviewers: vpykhtin, SamWot, arsenm Differential Revision: https://reviews.llvm.org/D35902 llvm-svn: 310251	2017-08-07 13:14:12 +00:00
Matt Arsenault	a7eb14afc7	AMDGPU: Fix typo in feature description llvm-svn: 310217	2017-08-06 18:13:23 +00:00
Quentin Colombet	c046208c52	[GlobalISel] Remove the GISelAccessor API. Its sole purpose was to avoid spreading around ifdefs related to building global-isel. Since r309990, GlobalISel is not optional anymore, thus, we can get rid of this mechanism all together. NFC. llvm-svn: 310115	2017-08-04 20:15:46 +00:00
Connor Abbott	66b9bd6e50	[AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic Summary: This intrinsic lets us set inactive lanes to an identity value when implementing wavefront reductions. In combination with Whole Wavefront Mode, it lets inactive lanes be skipped over as required by GLSL/Vulkan. Lowering the intrinsic needs to happen post-RA so that RA knows that the destination isn't completely overwritten due to the EXEC shenanigans, so we need another pseudo-instruction to represent the un-lowered intrinsic. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34719 llvm-svn: 310088	2017-08-04 18:36:54 +00:00
Connor Abbott	92638ab625	[AMDGPU] Add support for Whole Wavefront Mode Summary: Whole Wavefront Wode (WWM) is similar to WQM, except that all of the lanes are always enabled, regardless of control flow. This is required for implementing wavefront reductions in non-uniform control flow, where we need to use the inactive lanes to propagate intermediate results, so they need to be enabled. We need to propagate WWM to uses (unless they're explicitly marked as exact) so that they also propagate intermediate results correctly. We do the analysis and exec mask munging during the WQM pass, since there are interactions with WQM for things that require both WQM and WWM. For simplicity, WWM is entirely block-local -- blocks are never WWM on entry or exit of a block, and WWM is not propagated to the block level. This means that computations involving WWM cannot involve control flow, but we only ever plan to use WWM for a few limited purposes (none of which involve control flow) anyways. Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There isn't yet a way to turn WWM off -- that will be added in a future change. Finally, it turns out that turning on inactive lanes causes a number of problems with register allocation. While the best long-term solution seems like teaching LLVM's register allocator about predication, for now we need to add some hacks to prevent ourselves from getting into trouble due to constraints that aren't currently expressed in LLVM. For the gory details, see the comments at the top of SIFixWWMLiveness.cpp. Reviewers: arsenm, nhaehnle, tpr Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D35524 llvm-svn: 310087	2017-08-04 18:36:52 +00:00
Connor Abbott	de068fe8b4	[AMDGPU] refactor WQM pass in preparation for WWM (NFCI) Summary: Right now, the WQM pass conflates two different things when tracking the Needs of an instruction: 1. Needs can be StateWQM, which is propagated to other instructions, and means that this instruction (and everything it depends on) must be calculated in WQM. 2. Needs can be StateExact, which is not propagated to other instructions, and means that this instruction must not be calculated in WQM and WQM-ness must not be propagated past this instruction. This works now because there are only two different states, but in the future we want to be able to express things like "calculate this in WQM, but please disable WWM and don't propagate it" (to implement @llvm.amdgcn.set.inactive). In order to do this, we need to split the per-instruction Needs field in two: a new Needs field, which can only contain StateWQM (and in the future, StateWWM) and is propagated to sources, and a Disables field, which can also contain just StateWQM or nothing for now. We keep the per-block tracking the same for now, by translating Needs/Disables to the old representation with only StateWQM or StateExact. The other place that needs special handling is when we emit the state transitions. We could just translate back to the old representation there as well, which we almost do, but instead of 0 as a placeholder value for "any state," we explicitly or together all the states an instruction is allowed to be in. This lets us refactor the code in preparation for WWM, where we'll need to be able to handle things like "this instruction must be in Exact or WQM, but not WWM." Reviewers: arsenm, nhaehnle, tpr Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D35523 llvm-svn: 310086	2017-08-04 18:36:50 +00:00
Connor Abbott	8c217d0a29	[AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM Summary: Previously, we assumed that certain types of instructions needed WQM in pixel shaders, particularly DS instructions and image sampling instructions. This was ok because with OpenGL, the assumption was correct. But we want to start using DPP instructions for derivatives as well as other things, so the assumption that we can infer whether to use WQM based on the instruction won't continue to hold. This intrinsic lets frontends like Mesa indicate what things need WQM based on their knowledge of the API, rather than second-guessing them in the backend. We need to keep around the old method of enabling WQM, but eventually we should remove it once Mesa catches up. For now, this will let us use DPP instructions for computing derivatives correctly. Reviewers: arsenm, tpr, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D35167 llvm-svn: 310085	2017-08-04 18:36:49 +00:00
Dmitry Preobrazhensky	4b11a78a6e	[AMDGPU][MC] Enabled expressions as operands See bug 33579: https://bugs.llvm.org//show_bug.cgi?id=33579 Reviewers: vpykhtin, SamWot, arsenm Differential Revision: https://reviews.llvm.org/D36091 llvm-svn: 310059	2017-08-04 13:55:24 +00:00
Florian Gross	2feb105882	[AMDGPU] Fixed MSVC build break Error was: field of type 'llvm::ArgDescriptor' has private default constructor const AMDGPUFunctionArgInfo AMDGPUArgumentUsageInfo::ExternFunctionInfo{}; ^ llvm-svn: 310048	2017-08-04 10:53:07 +00:00

... 14 15 16 17 18 ...

3557 Commits