llvm-project

Commit Graph

Author	SHA1	Message	Date
Nicolai Haehnle	b48275f134	Add IntrWrite[Arg]Mem intrinsic property Summary: This property is used to mark an intrinsic that only writes to memory, but neither reads from memory nor has other side effects. An example where this is useful is the llvm.amdgcn.buffer.store.format.* intrinsic, which corresponds to a store instruction that goes through a special buffer descriptor rather than through a plain pointer. With this property, the intrinsic should still be handled as having side effects at the LLVM IR level, but machine scheduling can make smarter decisions. Reviewers: tstellarAMD, arsenm, joker.eph, reames Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18291 llvm-svn: 266826	2016-04-19 21:58:33 +00:00
Nicolai Haehnle	e2dda4f750	AMDGPU: Guard VOPC instructions against incorrect commute Summary: The added testcase, which triggered this, was derived from a shader-db case via bugpoint. A separate question is why scalar branching wasn't used. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19208 llvm-svn: 266825	2016-04-19 21:58:22 +00:00
Nicolai Haehnle	7483937bf0	AMDGPU/SI: SGPR accounting in getSIProgramInfo must ignore exec_lo/hi Summary: A shader stored the live mask (initial exec mask) in an SGPR which was then spilled during register allocation. The allocator quite reasonably optimized turned the spill into v_writelane_b32 %vgpr, exec_lo, N v_writelane_b32 %vgpr, exec_hi, N+1 at the beginning of the shader, confusing the SGPR accounting. No test case, because si-sgpr-spill.ll together with an upcoming patch for WQM handling exhibits the problem. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19199 llvm-svn: 266824	2016-04-19 21:58:17 +00:00
Konstantin Zhuravlyov	8c273ad719	[AMDGPU] Add insert nops pass based on subtarget features instead of cl::opt Also, - Skip pass if machine module does not have debug info - Minor comment changes - Added test Differential Revision: http://reviews.llvm.org/D19079 llvm-svn: 266626	2016-04-18 16:28:23 +00:00
Artem Tamazov	e2762423c2	[AMDGPU][llvm-mc] s_setreg* - Fix order of operands Order should match the sp3 syntax, where destination (simm16 denoting the hwreg) is coming first. Differential Revision: http://reviews.llvm.org/D19161 llvm-svn: 266617	2016-04-18 14:54:26 +00:00
Aaron Ballman	2eeefe8ed8	Silence some "initialized but unused" warnings from MSVC -- the function being called is a static function, so there's no need for an instance variable. NFC. llvm-svn: 266616	2016-04-18 14:47:19 +00:00
Mehdi Amini	b550cb1750	[NFC] Header cleanup Removed some unused headers, replaced some headers with forward class declarations. Found using simple scripts like this one: clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' \| xargs grep -L 'IndexedMap[<]' \| xargs grep -n --color=auto 'IndexedMap' Patch by Eugene Kosov <claprix@yandex.ru> Differential Revision: http://reviews.llvm.org/D19219 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266595	2016-04-18 09:17:29 +00:00
Matt Arsenault	c10783c42d	AMDGPU: Enable LocalStackSlotAllocation pass This resolves more frame indexes early and folds the immediate offsets into the scratch mubuf instructions. This cleans up a lot of the mess that's currently emitted, such as emitting add 0s and repeatedly initializing the same register to 0 when spilling. llvm-svn: 266508	2016-04-16 02:13:37 +00:00
Matt Arsenault	b6be202779	AMDGPU: Use s_addk_i32 / s_mulk_i32 llvm-svn: 266506	2016-04-16 01:46:49 +00:00
Jun Bum Lim	4c5bd58ebe	[MachineScheduler]Add support for store clustering Perform store clustering just like load clustering. This change add StoreClusterMutation in machine-scheduler. To control StoreClusterMutation, added enableClusterStores() in TargetInstrInfo.h. This is enabled only on AArch64 for now. This change also add support for unscaled stores which were not handled in getMemOpBaseRegImmOfs(). llvm-svn: 266437	2016-04-15 14:58:38 +00:00
Nicolai Haehnle	750082d1fe	AMDGPU/SI: Fix regression with no-return atomics Summary: In the added test-case, the atomic instruction feeds into a non-machine CopyToReg node which hasn't been selected yet, so guard against non-machine opcodes here. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19043 llvm-svn: 266433	2016-04-15 14:42:36 +00:00
Matt Arsenault	9c499c3a74	AMDGPU: Remove custom load/store scalarization llvm-svn: 266385	2016-04-14 23:31:26 +00:00
Matt Arsenault	fd8ab09c0e	AMDGPU: Include LDS size in printed comment llvm-svn: 266382	2016-04-14 22:11:51 +00:00
Matt Arsenault	3d1c1deb04	AMDGPU: Run SIFoldOperands after PeepholeOptimizer PeepholeOptimizer cleans up redundant copies, which makes the operand folding more effective. shader-db stats: Totals: SGPRS: 34200 -> 34336 (0.40 %) VGPRS: 22118 -> 21655 (-2.09 %) Code Size: 632144 -> 633460 (0.21 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 10240 -> 11264 (10.00 %) bytes per wave Max Waves: 8822 -> 8918 (1.09 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 7704 -> 7840 (1.77 %) VGPRS: 5169 -> 4706 (-8.96 %) Code Size: 234444 -> 235760 (0.56 %) bytes LDS: 2 -> 2 (0.00 %) blocks Scratch: 0 -> 1024 (0.00 %) bytes per wave Max Waves: 1188 -> 1284 (8.08 %) Wait states: 0 -> 0 (0.00 %) Increases: SGPRS: 35 (0.01 %) VGPRS: 1 (0.00 %) Code Size: 59 (0.02 %) LDS: 0 (0.00 %) Scratch: 1 (0.00 %) Max Waves: 48 (0.02 %) Wait states: 0 (0.00 %) Decreases: SGPRS: 26 (0.01 %) VGPRS: 54 (0.02 %) Code Size: 68 (0.03 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) Max Waves: 4 (0.00 %) Wait states: 0 (0.00 %) llvm-svn: 266378	2016-04-14 21:58:24 +00:00
Matt Arsenault	4ac341c8b3	AMDGPU: Directly emit m0 initialization with s_mov_b32 Currently what comes out of instruction selection is a register initialized to -1, and then copied to m0. MachineCSE doesn't consider copies, but we want these to be CSEed. This isn't much of a problem currently, because SIFoldOperands is run immediately after. This avoids regressions when SIFoldOperands is run later from leaving all copies to m0. llvm-svn: 266377	2016-04-14 21:58:15 +00:00
Matt Arsenault	7900334dd5	AMDGPU: Fold bitcasts of scalar constants to vectors This cleans up some messes since the individual scalar components can be CSEed. llvm-svn: 266376	2016-04-14 21:58:07 +00:00
Tom Stellard	000c5af3e6	AMDGPU: Add skeleton GlobalIsel implementation Summary: This adds the necessary target code to be able to run the ir translator. Lowering function arguments and returns is a nop and there is no support for RegBankSelect. Reviewers: arsenm, qcolombet Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits Differential Revision: http://reviews.llvm.org/D19077 llvm-svn: 266356	2016-04-14 19:09:28 +00:00
Nicolai Haehnle	05b127da06	[StructurizeCFG] Annotate branches that were treated as uniform Summary: This fully solves the problem where the StructurizeCFG pass does not consider the same branches as uniform as the SIAnnotateControlFlow pass. The patch in D19013 helps with this problem, but is not sufficient (and, interestingly, causes a "regression" with one of the existing test cases). No tests included here, because tests in D19013 already cover this. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19018 llvm-svn: 266346	2016-04-14 17:42:35 +00:00
Nicolai Haehnle	723b73b4eb	AMDGPU: Remove SIFixSGPRLiveRanges pass Summary: This pass is unnecessary and overly conservative. It was motivated by situations like def %vreg0:SGPR_32 ... if-block: .. def %vreg1:SGPR_32 ... else-block: ... use %vreg0:SGPR_32 ... and similar situations with uses after the non-uniform control flow, where we are not allowed to assign %vreg0 and %vreg1 to the same physical register, even though in the original, thread/workitem-based CFG, it looks like the live ranges of these registers do not overlap. However, by the time register allocation runs, we have moved to a wave-based CFG that accurately represents the fact that the wave may run through both the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already overlap even without the SIFixSGPRLiveRanges pass. In addition to proving this change correct, I have tested it with Piglit and a small number of other tests. Reviewers: arsenm, tstellarAMD Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19041 llvm-svn: 266345	2016-04-14 17:42:29 +00:00
Nicolai Haehnle	19f0f5177d	AMDGPU: change a redundant if () to an assert(). NFC Summary: I've been carrying this change around with me for a while, because the if () managed to confuse me while following the code. All callers ensure that the assertion holds. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19042 llvm-svn: 266344	2016-04-14 17:42:18 +00:00
Tom Stellard	79a1fd718c	AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit Summary: For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use less registers, as we have to run more waves per SIMD. This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults, to 256, as that has no wave restrictions. Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug. Reviewers: mareko, arsenm, tstellarAMD, nhaehnle Subscribers: FireBurn, kerberizer, llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18340 Patch By: Bas Nieuwenhuizen llvm-svn: 266337	2016-04-14 16:27:07 +00:00
Tom Stellard	f110f8f9f7	AMDGPU/SI: Use the correct scratch wave offset register for shaders. Summary: The code previously always used s1 as it was using the user + system SGPR information for compute kernels. This is incorrect for Mesa shaders though, The register should be the next SGPR after all user and system SGPR's. We use that Mesa adds arguments for all input and system SGPR's and take the next available SGPR for the scratch wave offset register. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewers: mareko, arsenm, nhaehnle, tstellarAMD Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18941 Patch By: Bas Nieuwenhuizen llvm-svn: 266336	2016-04-14 16:27:03 +00:00
Matt Arsenault	9cd90712f0	AMDGPU: Implement canonicalize Also add generic DAG node for it. llvm-svn: 266272	2016-04-14 01:42:16 +00:00
Tom Stellard	b951f10701	AMDGPU/SI: Add support for spilling VGPRs without having to scavenge registers Summary: When we are spilling SGPRs to scratch memory, we usually don't have free SGPRs to do the address calculation, so we need to re-use the ScratchOffset register for the calculation. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18917 llvm-svn: 266244	2016-04-13 20:44:16 +00:00
Artem Tamazov	eb4d5a9b0b	[AMDGPU][llvm-mc] Support of Trap Handler registers (TTMP0..11 and TBA/TMA)git status Tests added along with implemented feature. Note that there is a small leftover of unecessary MI sheduling issue (more info in the review). CodeGen/AMDGPU/salu-to-valu.ll updated to fix the false regression. TODO: Support for TTMP quads, comma-separated syntax in "[]" and more. Differential Revision: http://reviews.llvm.org/D17825 llvm-svn: 266205	2016-04-13 16:18:41 +00:00
Tom Stellard	703b2ec43f	AMDGPU/SI: Fix spilling of 96-bit registers Summary: It seems like this was broken in r252327. I thought we had test cases for this, but it's really hard to tirgger spills of this exact register size since they aren't used very much. Reviewers: arsenm, nhaehnle Subscribers: nhaehnle, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19021 llvm-svn: 266152	2016-04-12 23:57:30 +00:00
Nicolai Haehnle	df77c9ada4	AMDGPU: add llvm.amdgcn.buffer.load/store intrinsics Summary: They correspond to BUFFER_LOAD/STORE_DWORD[_X2,X3,X4] and mostly behave like llvm.amdgcn.buffer.load/store.format. They will be used by Mesa for SSBO and atomic counters at least when robust buffer access behavior is desired. (These instructions perform no format conversion and do buffer range checking per component.) As a side effect of sharing patterns with llvm.amdgcn.buffer.store.format, it has become trivial to add support for the f32 and v2f32 variants of that intrinsic, so the patch does so. Also DAG-ify (and fix) some tests that I noticed intermittent failures in while developing this patch. Some tests were (temporarily) adjusted for the required mayLoad/hasSideEffects changes to the BUFFER_STORE_DWORD* instructions. See also http://reviews.llvm.org/D18291. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18292 llvm-svn: 266126	2016-04-12 21:18:10 +00:00
Tom Stellard	ab1d3a9d50	AMDGPU/SI: Insert wait states required after v_readfirstlane on SI Summary: We will be able to handle this case much better once the hazard recognizer is finished, but this conservative implementation fixes a hang with the piglit test: spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra Reviewers: arsenm, nhaehnle Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18988 llvm-svn: 266105	2016-04-12 18:40:43 +00:00
Matt Arsenault	3b08238f78	AMDGPU: Eliminate half of i64 or if one operand is zero_extend from i32 This helps clean up some of the mess when expanding unaligned 64-bit loads when changed to be promote to v2i32, and fixes situations where or x, 0 was emitted after splitting 64-bit ors during moveToVALU. I think this could be a generic combine but I'm not sure. llvm-svn: 266104	2016-04-12 18:24:38 +00:00
Nicolai Haehnle	279970c0dc	AMDGPU/SI: Fix a mis-compilation of multi-level breaks Summary: Under certain circumstances, multi-level breaks (or what is understood by the control flow passes as such) could be miscompiled in a way that causes infinite loops, by emitting incorrect control flow intrinsics. This fixes a hang in dEQP-GLES3.functional.shaders.loops.while_dynamic_iterations.conditional_continue_vertex Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18967 llvm-svn: 266088	2016-04-12 16:10:38 +00:00
Matt Arsenault	64fa2f4513	AMDGPU: Implement i64 global atomics llvm-svn: 266075	2016-04-12 14:05:11 +00:00
Matt Arsenault	a9dbdcae04	AMDGPU: Add atomic_inc + atomic_dec intrinsics These are different than atomicrmw add 1 because they have an additional input value to clamp the result. llvm-svn: 266074	2016-04-12 14:05:04 +00:00
Matt Arsenault	21ecfe43ba	AMDGPU: Remove trailing whitespace llvm-svn: 266073	2016-04-12 14:04:54 +00:00
Tom Stellard	0ffdf65eaa	Revert "AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute" This reverts commit r263720. Just confirmed that s_waitcnt is required after ds_permute/ds_bpermute. llvm-svn: 265992	2016-04-11 20:38:40 +00:00
Jan Vesely	43b7b5b846	AMDGPU/SI: Implement atomic load/store for i32 and i64 Standard load/store instructions with GLC bit set. Reviewers: tstellardAMD, arsenm Differential Revision: http://reviews.llvm.org/D18760 llvm-svn: 265709	2016-04-07 19:23:11 +00:00
Tom Stellard	9112758077	AMDGPU/SI: Add latency for export instructions Reviewers: arsenm, nhaehnle Subscribers: nhaehnle, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18599 llvm-svn: 265708	2016-04-07 18:30:05 +00:00
Tom Stellard	d37630e461	AMDGPU/SI: Add MachineBasicBlock parameter to SIInstrInfo::insertWaitStates Summary: This makes it possible to insert nops at the end of blocks. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18549 llvm-svn: 265678	2016-04-07 14:47:07 +00:00
Valery Pykhtin	e23b6deb01	[AMDGPU] fix readlane/readfirstlane src vgpr operand type. For VGPR_32 operand disassembler expects a VGPR register encoded as 0..255 (enum8 src operand). readfirstlane/readline actually has enum9 operand and this change fixes VGPR_32 to VS_32 (enum9 encoding). Differential Revision: http://reviews.llvm.org/D18696 llvm-svn: 265670	2016-04-07 13:41:51 +00:00
Benjamin Kramer	4fb78518f1	Make helper functions static. NFC. llvm-svn: 265653	2016-04-07 10:10:09 +00:00
Nicolai Haehnle	df3a20cd80	AMDGPU: Add a shader calling convention This makes it possible to distinguish between mesa shaders and other kernels even in the presence of compute shaders. Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Differential Revision: http://reviews.llvm.org/D18559 llvm-svn: 265589	2016-04-06 19:40:20 +00:00
Sam Kolton	ff90c60a78	[AMDGPU] AsmParser: disable DPP for unsupported instructions. New dpp tests. Fix v_nop_dpp. Summary: 1. Disable DPP encoding for instructions that do not support it: - VOP1: - v_readfirstlane_b32 - v_clrexcp - v_movreld_b32 - v_movrels_b32 - v_movrelsd_b32 - VOP2: - v_madmk_f16/32 - v_madak_f16/32 - VOPC, VINTRP, VOP3 2. Fix DPP for v_nop 3. New DPP tests for VOP1 and VOP2 instructions Reviewers: nhaustov, tstellarAMD, vpykhtin Subscribers: tstellarAMD, arsenm Differential Revision: http://reviews.llvm.org/D18552 llvm-svn: 265538	2016-04-06 13:29:59 +00:00
Matthias Braun	7dc03f060e	RegisterScavenger: Take a reference as enterBasicBlock() argument. Make it obvious that the argument cannot be nullptr. Remove an unnecessary nullptr check in initRegState. llvm-svn: 265511	2016-04-06 02:47:09 +00:00
Konstantin Zhuravlyov	e63e02cb0c	[AMDGPU] Emit linkonce and linkonce_odr symbols Differential Revision: http://reviews.llvm.org/D18726 llvm-svn: 265408	2016-04-05 16:00:58 +00:00
Tom Stellard	354a43c7bc	AMDGPU: Implement {BUFFER,FLAT}_ATOMIC_CMPSWAP{,_X2} Summary: Implement BUFFER_ATOMIC_CMPSWAP{,_X2} instructions on all GCN targets, and FLAT_ATOMIC_CMPSWAP{,_X2} on CI+. 32-bit instruction variants tested manually on Kabini and Bonaire. Tests and parts of code provided by Jan Veselý. Patch by: Vedran Miletić Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: jvesely, scchan, kanarayan, arsenm Differential Revision: http://reviews.llvm.org/D17280 llvm-svn: 265170	2016-04-01 18:27:37 +00:00
Valery Pykhtin	5b3559c1ec	[AMDGPU] fix MADAK/MADMK instructions operand namings to match encoding fields. $vsrc1 -> $src1, $k -> $imm Differential Revision: http://reviews.llvm.org/D18659 llvm-svn: 265141	2016-04-01 13:13:12 +00:00
Sam Kolton	1048fb1818	[AMDGPU] Disassembler: support for DPP Review: http://reviews.llvm.org/D18642 llvm-svn: 265015	2016-03-31 14:15:04 +00:00
Matt Arsenault	2fe4fbc184	AMDGPU: Add frexp_exp intrinsic llvm-svn: 264944	2016-03-30 22:28:52 +00:00
Aaron Ballman	ef0fe1eed8	Silencing warnings from MSVC 2015 Update 2. All of these changes silence "C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC. llvm-svn: 264929	2016-03-30 21:30:00 +00:00
Tom Stellard	1d5e6d4bdc	AMDGPU/SI: Improve MachineSchedModel definition This patch contains a few improvements to the model, including: - Using a single resource with a defined buffers size for each memory unit. - Setting the IssueWidth correctly. - Fixing latency values for memory instructions. shader-db stats: 16429 shaders in 3231 tests Totals: SGPRS: 318232 -> 312328 (-1.86 %) VGPRS: 208996 -> 209346 (0.17 %) Code Size: 7147044 -> 7166440 (0.27 %) bytes LDS: 83 -> 83 (0.00 %) blocks Scratch: 1862656 -> 1459200 (-21.66 %) bytes per wave Max Waves: 49182 -> 49243 (0.12 %) Wait states: 0 -> 0 (0.00 %)A Differential Revision: http://reviews.llvm.org/D18453 llvm-svn: 264877	2016-03-30 16:35:13 +00:00
Tom Stellard	0bc954e3bc	AMDGPU/SI: Enable lanemask tracking in misched Summary: This results in higher register usage, but should make it easier for the compiler to hide latency. This pass is a prerequisite for some more scheduler improvements, and I think the increase register usage with this patch is acceptable, because when combined with the scheduler improvements, the total register usage will decrease. shader-db stats: 2382 shaders in 478 tests Totals: SGPRS: 48672 -> 49088 (0.85 %) VGPRS: 34148 -> 34847 (2.05 %) Code Size: 1285816 -> 1289128 (0.26 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 492544 -> 573440 (16.42 %) bytes per wave Max Waves: 6856 -> 6846 (-0.15 %) Wait states: 0 -> 0 (0.00 %) Depends on D18451 Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18452 llvm-svn: 264876	2016-03-30 16:35:09 +00:00
Konstantin Zhuravlyov	ecc7cbf611	Test commit access llvm-svn: 264736	2016-03-29 15:15:44 +00:00
Tom Stellard	a76bcc2ea1	AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructions Summary: This helps prevent load clustering from drastically increasing register pressure by trying to cluster 4 SMRDx8 loads together. The limit of 16 bytes was chosen, because it seems like that was the original intent of setting the limit to 4 instructions, but more analysis could show that a different limit is better. This fixes yields small decreases in register usage with shader-db, but also helps avoid a large increase in register usage when lane mask tracking is enabled in the machine scheduler, because lane mask tracking enables more opportunities for load clustering. shader-db stats: 2379 shaders in 477 tests Totals: SGPRS: 49744 -> 48600 (-2.30 %) VGPRS: 34120 -> 34076 (-0.13 %) Code Size: 1282888 -> 1283184 (0.02 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 495616 -> 492544 (-0.62 %) bytes per wave Max Waves: 6843 -> 6853 (0.15 %) Wait states: 0 -> 0 (0.00 %) Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18451 llvm-svn: 264589	2016-03-28 16:10:13 +00:00
Justin Bogner	f2a0d349a6	AMDGPU: Fix a use-after free and a missing break We're erasing MI here, but then immediately using it again inside the `if`. This moves the erase after we're done using it. Doing that reveals a second problem though - this case is missing a break, so we fall through to the default and dereference MI again. This is obviously a bug, though I don't know how to write a test that triggers it - all we do in the error case is print some extra debug output. Both of these issue crash on lots of tests under ASAN with the recycling allocator changes from PR26808 applied. llvm-svn: 264442	2016-03-25 18:33:16 +00:00
Matt Arsenault	8c8fcb2585	AMDGPU: Cost model for basic integer operations This resolves bug 21148 by preventing promotion to i64 induction variables. llvm-svn: 264376	2016-03-25 01:16:40 +00:00
Matt Arsenault	9651813ee0	AMDGPU: Partially implement getArithmeticInstrCost for FP ops llvm-svn: 264374	2016-03-25 01:00:32 +00:00
Matt Arsenault	59767cea79	AMDGPU: TTI: Make insertelement free. We don't want to have a cost to scalarizing operations. llvm-svn: 264364	2016-03-25 00:14:11 +00:00
Tom Stellard	9babad25e5	AMDGPU/SI: Add Polaris support Patch By: Sonny Jiang llvm-svn: 264295	2016-03-24 15:31:05 +00:00
Vasileios Kalintiris	b8a37205d2	Fix sequence point warning. NFC. llvm-svn: 264255	2016-03-24 10:53:28 +00:00
Matt Arsenault	30d37a74da	AMDGPU: Remove atomic inc/dec patterns There is no benefit to these since materializing the constant 1 requires the same number of instructions as materializing uint_max llvm-svn: 264215	2016-03-23 23:23:38 +00:00
Matt Arsenault	0a30e456b4	AMDGPU: Promote alloca should skip volatiles llvm-svn: 264214	2016-03-23 23:17:29 +00:00
Matt Arsenault	f43c2a0b49	AMDGPU: Insert moves of frame index to value operands Strengthen tests of storing frame indices. Right now this just creates irrelevant scheduling changes. We don't want to have multiple frame index operands on an instruction. There seem to be various assumptions that at least the same frame index will not appear twice in the LocalStackSlotAllocation pass. There's no reason to have this happen, and it just makes it easy to introduce bugs where the immediate offset is appplied to the storing instruction when it should really be applied to the value being stored as a separate add. This might not be sufficient. It might still be problematic to have an add fi, fi situation, but that's even less unlikely to happen in real code. llvm-svn: 264200	2016-03-23 21:49:25 +00:00
Valery Pykhtin	c0a77c5064	[AMDGPU] Fix missing assembler predicates. Differential Revision: http://reviews.llvm.org/D18351 llvm-svn: 264137	2016-03-23 04:27:26 +00:00
Tom Stellard	52ecd2d69b	AMDGPU: Cache information about register pressure sets We can statically decide whether or not a register pressure set is for SGPRs or VGPRs, so we don't need to re-compute this information in SIRegisterInfo::getRegPressureSetLimit(). Differential Revision: http://reviews.llvm.org/D14805 llvm-svn: 264126	2016-03-23 01:53:22 +00:00
Nicolai Haehnle	0a33abdfd2	AMDGPU: Fix dangling references introduced by r263982 Fixes Valgrind errors on the test cases that were reported as failing by buildbots. llvm-svn: 264000	2016-03-21 22:54:02 +00:00
Nicolai Haehnle	a56e6b6a53	AMDGPU: Coding style fixes I meant to add these before committing r263982 as per the review, but I forgot to squash. llvm-svn: 263983	2016-03-21 20:39:24 +00:00
Nicolai Haehnle	213e87f2ee	AMDGPU: Add SIWholeQuadMode pass Summary: Whole quad mode is already enabled for pixel shaders that compute derivatives, but it must be suspended for instructions that cause a shader to have side effects (i.e. stores and atomics). This pass addresses the issue by storing the real (initial) live mask in a register, masking EXEC before instructions that require exact execution and (re-)enabling WQM where required. This pass is run before register coalescing so that we can use machine SSA for analysis. The changes in this patch expose a problem with the second machine scheduling pass: target independent instructions like COPY implicitly use EXEC when they operate on VGPRs, but this fact is not encoded in the MIR. This can lead to miscompilation because instructions are moved past changes to EXEC. This patch fixes the problem by adding use-implicit operands to target independent instructions. Some general codegen passes are relaxed to work with such implicit use operands. Reviewers: arsenm, tstellarAMD, mareko Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18162 llvm-svn: 263982	2016-03-21 20:28:33 +00:00
Tom Stellard	92339e888f	AMDGPU/SI: Fix threshold calculation for branching when exec is zero Summary: When control flow is implemented using the exec mask, the compiler will insert branch instructions to skip over the masked section when exec is zero if the section contains more than a certain number of instructions. The previous code would only count instructions in successor blocks, and this patch modifies the code to start counting instructions in all blocks between the start and end of the branch. Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18282 llvm-svn: 263969	2016-03-21 18:56:58 +00:00
Matt Arsenault	cb38a6bd35	AMDGPU: Remove SignBitIsZero for mubuf scratch offsets These instructions do not have the same negative base address problem that DS instructions do on SI. llvm-svn: 263964	2016-03-21 18:02:18 +00:00
Matt Arsenault	b96b57347a	AMDGPU: Add frexp_mant intrinsic llvm-svn: 263948	2016-03-21 16:11:05 +00:00
Nicolai Haehnle	fa771811b3	AMDGPU: add missing braces around multi-line if block This fixes an issue with rL263658 pointed out by Tom Stellard. llvm-svn: 263823	2016-03-18 20:32:04 +00:00
Nicolai Haehnle	95e8ffd398	AMDGPU: Overload return type of llvm.amdgcn.buffer.load.format Summary: Allow the selection of BUFFER_LOAD_FORMAT_x and _XY. Do this now before the frontend patches land in Mesa. Eventually, we may want to automatically reduce the size of loads at the LLVM IR level, which requires such overloads, and in some cases Mesa can generate them directly. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18255 llvm-svn: 263792	2016-03-18 16:24:40 +00:00
Nicolai Haehnle	ad63638f6d	AMDGPU/SI: Add llvm.amdgcn.buffer.atomic.* intrinsics Summary: These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used by Mesa to implement atomics with buffer semantics. The intrinsic interface matches that of buffer.load.format and buffer.store.format, except that the GLC bit is not exposed (it is automatically deduced based on whether the return value is used). The change of hasSideEffects is required for TableGen to accept the pattern that matches the intrinsic. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, rivanvx, llvm-commits Differential Revision: http://reviews.llvm.org/D18151 llvm-svn: 263791	2016-03-18 16:24:31 +00:00
Nicolai Haehnle	3003ba00a3	AMDGPU: use ComplexPattern for offsets in llvm.amdgcn.buffer.load/store.format Summary: We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend cannot easily make use of an explicit soffset parameter either. Furthermore, it is likely that in the future, LLVM will be in a better position than the frontend to choose an SGPR offset if possible. Since there aren't any frontend uses of these intrinsics in upstream repositories yet, I would like to take this opportunity to change the intrinsic signatures to a single offset parameter, which is then selected to immediate offsets or voffsets using a ComplexPattern. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18218 llvm-svn: 263790	2016-03-18 16:24:20 +00:00
Sam Kolton	a74cd526e9	[AMDGPU] Assembler: Change dpp_ctrl syntax to match sp3 Review: http://reviews.llvm.org/D18267 llvm-svn: 263789	2016-03-18 15:35:51 +00:00
Changpeng Fang	234fcb81d3	AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute Symmary: ds_permute/ds_bpermute do not read memory so s_waitcnt is not needed. Reviewers arsenm, tstellarAMD Subscribers llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18197 llvm-svn: 263720	2016-03-17 16:43:50 +00:00
Nicolai Haehnle	79cad857a0	AMDGPU: mark atomic instructions as sources of divergence Summary: As explained by the comment, threads will typically see different values returned by atomic instructions even if the arguments are equal. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18156 llvm-svn: 263719	2016-03-17 16:21:59 +00:00
Nicolai Haehnle	ef160de3e5	AMDGPU: Prevent uniform loops from becoming infinite Summary: Uniform loops where the branch leaving the loop is predicated on VCCNZ must be skipped if EXEC = 0, otherwise they will be infinite. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18137 llvm-svn: 263658	2016-03-16 20:14:33 +00:00
Michel Danzer	302f83ac4e	AMDGPU: Verify instructions in non-debug builds as well And emit an error if it fails. This prevents illegal instructions from getting sent to the GPU, which would potentially result in a hang. This is a candidate for the stable branch(es). Reviewed-by: Marek Olšák <marek.olsak@amd.com> llvm-svn: 263627	2016-03-16 09:10:42 +00:00
Michel Danzer	beb79ceb19	AMDGPU/SI: Clean up indentation in SIInstrInfo::getDefaultRsrcDataFormat Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 263626	2016-03-16 09:10:35 +00:00
Changpeng Fang	01f6062227	AMDGPU/SI: Implement GroupStaticSize Intrinsic for Dynamic LDS Summary: Static LDS size is saved in MachineFunctionInfo::LDSSize, We define a pseudo instruction with usesCustomInserter bit set. Then, in EmitInstrWithCustomInserter, we replace this pseudo instruction with a mov of MachineFunctionInfo::LDSSize. Reviewers: arsenm tstellarAMD Subscribers llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18064 llvm-svn: 263563	2016-03-15 17:28:44 +00:00
Sanjay Patel	5719584129	[DAG] use isUndef() ; NFCI llvm-svn: 263448	2016-03-14 17:28:46 +00:00
Tom Stellard	331f981cc9	AMDGPU/SI: Handle wait states required for DPP instructions Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17543 llvm-svn: 263447	2016-03-14 17:05:56 +00:00
Marek Olsak	ed2213e6ef	AMDGPU/SI: Incomplete shader binaries need to finish execution at the end Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D18058 llvm-svn: 263441	2016-03-14 15:57:14 +00:00
Nicolai Haehnle	74127fe8d7	AMDGPU: mark llvm.amdgcn.image.atomic.* as a source of divergence Summary: When multiple threads perform an atomic op with the same arguments, they will usually see different return values. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18101 llvm-svn: 263440	2016-03-14 15:37:18 +00:00
Nikolay Haustov	79af6b33e0	[AMDGPU] Assembler: SOP* instruction fixes s_bitset0_b64, s_bitset1_b64 has 32-bit src0, not 64-bit. s_rfe_b64 has just one destination operand and no source. Uncomment S_BITCMP* and S_SETVSKIP, adjust SOPC_* classes for that. Add s_memrealtime test and change comments in smem.s to follow common style. Change test for s_memtime to use non-zero register to make it really test encoding. Add tests for s_buffer_load*. Add tests for SOPC instructions (same for SI and VI) Differential Revision: http://reviews.llvm.org/D18040 llvm-svn: 263420	2016-03-14 11:17:19 +00:00
Valery Pykhtin	0f97f17152	[AMDGPU] AsmParser: Factor out parseRegister. NFC. llvm-svn: 263411	2016-03-14 07:43:42 +00:00
Valery Pykhtin	9e33c7f5d3	[AMDGPU] AsmParser: refactor post push_back vector access. NFC. llvm-svn: 263409	2016-03-14 05:25:44 +00:00
Valery Pykhtin	f91911c3ae	[AMDGPU] AsmParser: remove redundant isReg checks. NFC. llvm-svn: 263407	2016-03-14 05:01:45 +00:00
Valery Pykhtin	a7f480b4e9	[AMDGPU] Fix VOPC instruction operand namings Differential Revision: http://reviews.llvm.org/D17966 llvm-svn: 263242	2016-03-11 14:53:28 +00:00
Nikolay Haustov	6560781c4f	[AMDGPU] Assembler: change v_madmk operands to have same order as mad. The constant is now at source operand 1 (previously at 2). This is also how it is in legacy AMD sp3 assembler. Update tests. Differential Revision: http://reviews.llvm.org/D17984 llvm-svn: 263212	2016-03-11 09:27:25 +00:00
Matt Arsenault	bafc9dc591	AMDGPU: Don't use InstVisitor for AMDGPUPromoteAlloca Frontend authors are strongly encouraged to keep allocas in the entry block, so don't bother visiting every instruction in the other blocks of the function. llvm-svn: 263206	2016-03-11 08:20:50 +00:00
Matt Arsenault	6b6a2c37bc	AMDGPU: R600 code splitting cleanup Move a few functions only used by R600 to R600 specific code, fix header macros to stop using R600, mark classes as final. llvm-svn: 263204	2016-03-11 08:00:27 +00:00
Matt Arsenault	9a19c240c0	AMDGPU: Materialize sign bits with bfrev If a constant is the same as the reverse of an inline immediate, this is 4 bytes smaller than having to embed a 32-bit literal. llvm-svn: 263201	2016-03-11 07:42:49 +00:00
Nicolai Haehnle	b142770bfe	AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsics Summary: They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa to implement the GL_ARB_shader_image_load_store extension. The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling and loads). However, this is not currently implemented. For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller" opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32 is actually implemented since GLSL also only has a vec4 variant of the store instructions, although it's conceivable that Mesa will want to be smarter about this in the future. BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which has a legacy name, pretends not to access memory, and does not capture the full flexibility of the instruction. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17277 llvm-svn: 263140	2016-03-10 18:43:50 +00:00
Changpeng Fang	278a5b31a5	AMDGPU/SI: Define S_GETREG Intrinsic Summary: Define s_getreg intrinsic to generate s_getreg instruction to read hardware registers. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D17892 llvm-svn: 263124	2016-03-10 16:47:15 +00:00
Valery Pykhtin	a4db224d54	[AMDGPU] Fix SMEM instructions encoding/operand namings Differential Revision: http://reviews.llvm.org/D17651 llvm-svn: 263108	2016-03-10 13:06:08 +00:00
Chad Rosier	c27a18f39f	[TII] Allow getMemOpBaseRegImmOfs() to accept negative offsets. NFC. http://reviews.llvm.org/D17967 llvm-svn: 263021	2016-03-09 16:00:35 +00:00
Teresa Johnson	e50b23c67f	Fix build error due to unsigned compare >= 0 in r263008 (NFC) Fixes error from building with clang: /usr/local/google/home/tejohnson/llvm/llvm_15/lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp:407:12: error: comparison of unsigned expression >= 0 is always true [-Werror,-Wtautological-compare] if ((Imm >= 0x000) && (Imm <= 0x0ff)) { ~~~ ^ ~~~~~ llvm-svn: 263014	2016-03-09 14:58:23 +00:00
Sam Kolton	dfa29f7c5b	[AMDGPU] Assembler: Support DPP instructions. Supprot DPP syntax as used in SP3 (except several operands syntax). Added dpp-specific operands in td-files. Added DPP flag to TSFlags to determine if instruction is dpp in InstPrinter. Support for VOP2 DPP instructions in td-files. Some tests for DPP instructions. ToDo: - VOP2bInst: - vcc is considered as operand - AsmMatcher doesn't apply mnemonic aliases when parsing operands - v_mac_f32 - v_nop - disable instructions with 64-bit operands - change dpp_ctrl assembler representation to conform sp3 Review: http://reviews.llvm.org/D17804 llvm-svn: 263008	2016-03-09 12:29:31 +00:00
Nikolay Haustov	9b7577ed22	[AMDGPU] Assembler: Support abs() syntax. Support legacy SP3 abs(v1) syntax. InstPrinter still uses \|v1\|. Add tests. Differential Revision: http://reviews.llvm.org/D17887 llvm-svn: 263006	2016-03-09 11:03:21 +00:00

1 2 3 4 5 ...

683 Commits