llvm-project

Commit Graph

Author	SHA1	Message	Date
Davide Italiano	6fdfede10d	[AMDGPU] Throw away more dead code. NFCI. llvm-svn: 308055	2017-07-14 21:20:29 +00:00
Davide Italiano	502ac724ac	[AMDGPU] Garbage collect dead code. NFCI. Unbreaks the build with GCC7. llvm-svn: 308047	2017-07-14 18:47:29 +00:00
Alfred Huang	5b27072f57	[AMDGPU] Do not insert an instruction into worklist twice in movetovalu In moveToVALU(), move to vector ALU is performed, all instrs in the use chain will be visited. We do not want the same node to be pushed to the visit worklist more than once. Differential Revision: https://reviews.llvm.org/D34726 llvm-svn: 308039	2017-07-14 17:56:55 +00:00
Matt Arsenault	23e4df6a59	AMDGPU: Detect kernarg segment pointer This is necessary to pass the kernarg segment pointer to callee functions. Also don't unconditionally enable for kernels. llvm-svn: 307978	2017-07-14 00:11:13 +00:00
Stanislav Mekhanoshin	dc2890a887	[AMDGPU] fcaninicalize optimization for GFX9+ Since GFX9 supports denorm modes for v_min_f32/v_max_f32 that is possible to further optimize fcanonicalize and remove it if applied to min/max given their operands are known not to be an sNaN or that sNaNs are not supported. Additionally we can remove fcanonicalize if denorms are supported for the VT and we know that its argument is never a NaN. Differential Revision: https://reviews.llvm.org/D35335 llvm-svn: 307976	2017-07-13 23:59:15 +00:00
Matt Arsenault	6b93046f29	AMDGPU: Annotate call graph with used features Previously this wouldn't detect used features indirectly used in callee functions. llvm-svn: 307967	2017-07-13 21:43:42 +00:00
Hiroshi Inoue	e9dea6e613	fix typos in comments and error messges; NFC llvm-svn: 307885	2017-07-13 06:48:39 +00:00
Matt Arsenault	ce34ac588e	AMDGPU: Fix converting unanalyzable global loads to SMRD Not all memory dependence queries succeed, so this needs to be conservative if it fails. llvm-svn: 307861	2017-07-12 23:06:18 +00:00
Stanislav Mekhanoshin	5680b0ca9f	[AMDGPU] fcanonicalize elimination optimization We are using multiplication by 1.0 to flush denormals and quiet sNaNs. That is possible to omit this multiplication if source of the fcanonicalize instruction is known to be flushed/quieted, i.e. if it comes from another instruction known to do the normalization and we are using IEEE mode to quiet sNaNs. Differential Revision: https://reviews.llvm.org/D35218 llvm-svn: 307848	2017-07-12 21:20:28 +00:00
Rafael Espindola	1beb702ba2	Fully fix the movw/movt addend. The issue is not if the value is pcrel. It is whether we have a relocation or not. If we have a relocation, the static linker will select the upper bits. If we don't have a relocation, we have to do it. llvm-svn: 307730	2017-07-11 23:18:25 +00:00
Evandro Menezes	0cd23f5642	[CodeGen] Rename DEBUG_TYPE to match passnames Rename missing DEBUG_TYPE "machine-scheduler" from backend files, which were absent from https://reviews.llvm.org/rL303921. Differential revision: https://reviews.llvm.org/D35231 llvm-svn: 307719	2017-07-11 22:08:28 +00:00
Konstantin Zhuravlyov	94b3b47c73	Revert "AMDGPU: Do not test for SI in getIsaVersion" This reverts commit r307573. This breaks downstream test. llvm-svn: 307678	2017-07-11 17:57:41 +00:00
Nirav Dave	4dcad5dc6b	Add DAG argument to canMergeStoresTo NFC. llvm-svn: 307583	2017-07-10 20:25:54 +00:00
Matt Arsenault	9cff06f37b	AMDGPU: Allow SIShrinkInstructions to fold FrameIndexes llvm-svn: 307576	2017-07-10 20:04:35 +00:00
Matt Arsenault	6c29c5acfe	AMDGPU: Allow SIShrinkInstructions to work in non-SSA Immediates can be folded as long as the immediate is a vreg. Also undo commuting instructions if it didn't fold an immediate. llvm-svn: 307575	2017-07-10 19:53:57 +00:00
Matt Arsenault	fda5318204	AMDGPU: Remove unnecessary check for constant operands An instruction that has an immediate operand can't reach this point. This is only called for a freshly shrunk instruction, which prevously couldn't have had a literal constant operand. This was also not conservative enough since it woudl also have had to filter other constant-like inputs like frame indexes. llvm-svn: 307574	2017-07-10 19:33:38 +00:00
Konstantin Zhuravlyov	a46241909a	AMDGPU: Do not test for SI in getIsaVersion SI is being tested by isa version in the first two if statements of the function. llvm-svn: 307573	2017-07-10 19:24:05 +00:00
Simon Pilgrim	d362d27c27	[AMDGPU] Fix -Wimplicit-fallthrough warning. NFCI. llvm-svn: 307485	2017-07-08 19:50:03 +00:00
Simon Pilgrim	cb07d67a5c	Fix some more -Wimplicit-fallthrough warnings. NFCI. llvm-svn: 307411	2017-07-07 16:40:06 +00:00
Sam Kolton	10ac2fd2eb	[AMDGPU] Assembler: refactor convert methods (VOP3 and MIMG) Summary: Simplified converter methods for VOP3 and MIMG. Reviewers: dp, artem.tamazov Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, vpykhtin, t-tye Differential Revision: https://reviews.llvm.org/D35047 llvm-svn: 307407	2017-07-07 15:21:52 +00:00
Dmitry Preobrazhensky	b2d24e23ce	[AMDGPU][mc][gfx9] Added support of op_sel/op_sel_hi for V_MAD_MIX* See https://bugs.llvm.org//show_bug.cgi?id=33595 Reviewers: vpykhtin, artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D35021 llvm-svn: 307402	2017-07-07 14:29:06 +00:00
Simon Pilgrim	0f5b35059d	[AMDGPU] Fix -Wimplicit-fallthrough warnings. NFCI. llvm-svn: 307381	2017-07-07 10:18:57 +00:00
Sean Fertile	9cd1cdf814	Extend memcpy expansion in Transform/Utils to handle wider operand types. Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the target to provide the operand types through TTI callbacks. The default values for the TTI callbacks use int8 operand types and matches the existing behaviour if they aren't overridden by the target. Differential revision: https://reviews.llvm.org/D32536 llvm-svn: 307346	2017-07-07 02:00:06 +00:00
Matt Arsenault	9aa45f047f	AMDGPU: Add macro fusion schedule DAG mutation Try to increase opportunities to shrink vcc uses. llvm-svn: 307313	2017-07-06 20:57:05 +00:00
Matt Arsenault	a81198d82d	AMDGPU: Minor cleanup of shrinking logic llvm-svn: 307312	2017-07-06 20:56:59 +00:00
Stanislav Mekhanoshin	9d7b1c9ddb	[AMDGPU] Always use rcp + mul with fast math Regardless of relaxation options such as -cl-fast-relaxed-math we are producing rather long code for fdiv via amdgcn_fdiv_fast intrinsic. This intrinsic is used to replace fdiv with 2.5ulp metadata and does not handle denormals, thus believed to be fast. An fdiv instruction can also have fast math flag either by itself or together with fpmath metadata. Clang used with a relaxation flag always produces both metadata and fast flag: %div = fdiv fast float %v, %0, !fpmath !12 !12 = !{float 2.500000e+00} Current implementation ignores fast flag and favors metadata. An instruction with just fast flag would be lowered to a fastest rcp + mul, but that never happen on practice because of described mutual clang and BE behavior. This change allows an "fdiv fast" to be always lowered as rcp + mul. Differential Revision: https://reviews.llvm.org/D34844 llvm-svn: 307308	2017-07-06 20:34:21 +00:00
Craig Topper	79ab643da8	[Constants] If we already have a ConstantInt*, prefer to use isZero/isOne/isMinusOne instead of isNullValue/isOneValue/isAllOnesValue inherited from Constant. NFCI Going through the Constant methods requires redetermining that the Constant is a ConstantInt and then calling isZero/isOne/isMinusOne. llvm-svn: 307292	2017-07-06 18:39:47 +00:00
Quentin Colombet	f3f7d4d64b	[AMDGPU] Move GISel accessor initialization from TargetMachine to Subtarget. NFC llvm-svn: 307186	2017-07-05 18:40:56 +00:00
Alexander Timofeev	982aee6a38	[AMDGPU] Switch scalarize global loads ON by default Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307097	2017-07-04 17:32:00 +00:00
Marek Olsak	b83f5c99ba	[AMDGPU] Fix latency of MIMG instructions Patch by cwabbott (Connor Abbott). llvm-svn: 307081	2017-07-04 14:43:38 +00:00
NAKAMURA Takumi	e4a741376b	Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default" It broke a testcase. Failing Tests (1): LLVM :: CodeGen/AMDGPU/alignbit-pat.ll llvm-svn: 307054	2017-07-04 02:14:18 +00:00
Alexander Timofeev	ea7f08bee5	[AMDGPU] Switch scalarize global loads ON by default Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307026	2017-07-03 14:54:11 +00:00
Matt Arsenault	3f031e75aa	AMDGPU: Add operand target flags serialization llvm-svn: 306995	2017-07-02 23:21:48 +00:00
Hiroshi Inoue	bb703e8960	fix trivial typos; NFC suport -> support llvm-svn: 306968	2017-07-02 03:24:54 +00:00
Matt Arsenault	7c525903ef	AMDGPU: Remove SITypeRewriter This was an old workaround for using v16i8 in some old intrinsics for resource descriptors. llvm-svn: 306603	2017-06-28 21:38:50 +00:00
Geoff Berry	66d9bdbca8	[LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI. Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D34531 llvm-svn: 306554	2017-06-28 15:53:17 +00:00
Stanislav Mekhanoshin	d445455643	[AMDGPU] Add pattern for v_alignbit_b32 with immediate If immediate in shift is less than 32 we can use alignbit too. Differential Revision: https://reviews.llvm.org/D34729 llvm-svn: 306500	2017-06-28 02:52:39 +00:00
Stanislav Mekhanoshin	e8bf6c9629	[AMDGPU] Add 2 new alignbit patterns Differential Revision: https://reviews.llvm.org/D34655 llvm-svn: 306449	2017-06-27 19:10:47 +00:00
Stanislav Mekhanoshin	c9bd53ab59	[AMDGPU] Simplify setcc (sext from i1 b), -1|0, cc Depending on the compare code that can be either an argument of sext or negate of it. This helps to avoid v_cndmask_b64 instruction for sext. A reversed value can be further simplified and folded into its parent comparison if possible. Differential Revision: https://reviews.llvm.org/D34545 llvm-svn: 306446	2017-06-27 18:53:03 +00:00
Stanislav Mekhanoshin	6851ddf942	[AMDGPU] Combine and x, (sext cc from i1) => select cc, x, 0 Also factored out function to check if a boolean is an already deserialized value which does not require v_cndmask_b32 to be loaded. Added binary logical operators to its check. Differential Revision: https://reviews.llvm.org/D34500 llvm-svn: 306439	2017-06-27 18:25:26 +00:00
Sam Kolton	a179d25b99	[AMDGPU] SDWA: several fixes for V_CVT and VOPC instructions Summary: 1. Instruction V_CVT_U32_F32 allow omod operand (see SIInstrInfo.td:1435). In fact this operand shouldn't be allowed here. This fix checks if SDWA pseudo instruction has OMod operand and then copy it. 2. There were several problems with support of VOPC instructions in SDWA peephole pass. Reviewers: tstellar, arsenm, vpykhtin, airlied, kzhuravl Subscribers: wdng, nhaehnle, yaxunl, dstuttard, tpr, sarnex, t-tye Differential Revision: https://reviews.llvm.org/D34626 llvm-svn: 306413	2017-06-27 15:02:23 +00:00
Hiroshi Inoue	6a391bbf40	fix trivial typos, NFC succesor -> successor llvm-svn: 306393	2017-06-27 10:35:37 +00:00
Nicolai Haehnle	43cc6c4e0f	AMDGPU: M0 operands to spill/restore opcodes are dead Summary: With scalar stores, M0 is clobbered and therefore marked as implicitly defined. However, it is also dead. This fixes an assertion when the Greedy Register Allocator decides to optimize a spill/restore pair away again (via tryHintsRecoloring). Reviewers: arsenm Subscribers: qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33319 llvm-svn: 306375	2017-06-27 08:04:13 +00:00
Matt Arsenault	f28683cf51	AMDGPU: Setup SP/FP in callee function prolog/epilog llvm-svn: 306312	2017-06-26 17:53:59 +00:00
Tom Stellard	eb8f1e27d9	AMDGPU/GlobalISel: Mark 32-bit G_SHL as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34589 llvm-svn: 306298	2017-06-26 15:56:52 +00:00
Matt Arsenault	8bcf2f20a7	AMDGPU: Whitespace fixes llvm-svn: 306265	2017-06-26 03:01:36 +00:00
Matt Arsenault	10fc062b2b	AMDGPU: Partially fix implicit.buffer.ptr intrinsic handling This should not be treated as a different version of private_segment_buffer. These are distinct things with different uses and register classes, and requires the function argument info to have more context about the function's type and environment. Also add missing test coverage for the intrinsic, and emit an error for HSA. This also encovers that the intrinsic is broken unless there happen to be stack objects. llvm-svn: 306264	2017-06-26 03:01:31 +00:00
Rafael Espindola	daaee7151b	Remove a processFixupValue hack. The intention of processFixupValue is not to redefine the semantics of MCExpr. It is odd enough that a expression lowers to a PCRel MCExpr or not depending on what it looks like. At least it is a local hack now. I left a fix for anyone trying to figure out what producers should be producing a different expression. llvm-svn: 306200	2017-06-24 05:12:29 +00:00
Rafael Espindola	f351292141	Remove redundant argument. llvm-svn: 306189	2017-06-24 00:26:57 +00:00
Rafael Espindola	86c664f9d7	Move Value adjustment to applyFixup. NFC. llvm-svn: 306178	2017-06-23 23:05:15 +00:00
Rafael Espindola	801b42de31	ARM: move some logic from processFixupValue to applyFixup. processFixupValue is called on every relaxation iteration. applyFixup is only called once at the very end. applyFixup is then the correct place to do last minute changes and value checks. While here, do proper range checks again for fixup_arm_thumb_bl. We used to do it, but dropped because of thumb2. We now do it again, but use the thumb2 range. llvm-svn: 306177	2017-06-23 22:52:36 +00:00
Tom Stellard	af552dc352	AMDGPU/GlobalISel: Mark 32-bit G_AND as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34349 llvm-svn: 306112	2017-06-23 15:17:17 +00:00
David Stuttard	f677966e2e	[AMDGPU] Add intrinsics for tbuffer load and store - build error fix Variable was unused in non-debug build (used in assert) causing compile time warning and eventual build failure llvm-svn: 306034	2017-06-22 17:15:49 +00:00
David Stuttard	70e8bc1bf3	[AMDGPU] Add intrinsics for tbuffer load and store Intrinsic already existed for llvm.SI.tbuffer.store Needed tbuffer.load and also re-implementing the intrinsic as llvm.amdgcn.tbuffer.* Added CodeGen tests for the 2 new variants added. Left the original llvm.SI.tbuffer.store implementation to avoid issues with existing code Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr Differential Revision: https://reviews.llvm.org/D30687 llvm-svn: 306031	2017-06-22 16:29:22 +00:00
Sam Kolton	ca5a30ed74	[AMDGPU] SDWA: remove support for VOP2 instructions that have only 64-bit encoding Summary: Despite that this instructions are listed in VOP2, they are treated as VOP3 in specs. They should not support SDWA. There are no real instructions for them, but there are pseudo instructions. Reviewers: arsenm, vpykhtin, cfang Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34403 llvm-svn: 305999	2017-06-22 12:42:14 +00:00
Sam Kolton	3c4933fcc6	[AMDGPU] SDWA: add support for GFX9 in peephole pass Summary: Added support based on merged SDWA pseudo instructions. Now peephole allow one scalar operand, omod and clamp modifiers. Added several subtarget features for GFX9 SDWA. This diff also contains changes from D34026. Depends D34026 Reviewers: vpykhtin, rampitec, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34241 llvm-svn: 305986	2017-06-22 06:26:41 +00:00
Stanislav Mekhanoshin	3ed38c601a	[AMDGPU] Add FP_CLASS to the add/setcc combine This is one of the nodes which also compile as v_cmp_*. Differential Revision: https://reviews.llvm.org/D34485 llvm-svn: 305970	2017-06-21 23:46:22 +00:00
Rafael Espindola	88d9e37ec8	Use a MutableArrayRef. NFC. llvm-svn: 305968	2017-06-21 23:06:53 +00:00
Stanislav Mekhanoshin	a8b26936d0	[AMDGPU] Combine add and adde, sub and sube If one of the arguments of adde/sube is zero we can fold another add/sub into it. Differential Revision: https://reviews.llvm.org/D34374 llvm-svn: 305964	2017-06-21 22:30:01 +00:00
Stanislav Mekhanoshin	e3eb42cef6	[AMDGPU] simplify add x, *ext (setcc) => addc|subb x, 0, setcc This simplification allows to avoid generating v_cndmask_b32 to serialize condition code between compare and use. Differential Revision: https://reviews.llvm.org/D34300 llvm-svn: 305962	2017-06-21 22:05:06 +00:00
Dmitry Preobrazhensky	851a3d9f05	[AMDGPU][MC][GFX9] Corrected VOP3P relevant code to fix disassembler failures See Bug 33509: https://bugs.llvm.org//show_bug.cgi?id=33509 Reviewers: Sam Kolton, Artem Tamazov, Valery Pykhtin Differential Revision: https://reviews.llvm.org/D34360 llvm-svn: 305923	2017-06-21 16:00:54 +00:00
Dmitry Preobrazhensky	dc4ac823ec	[AMDGPU][MC] Corrected V_QSAD instructions to check that dest register is different than any of the src See Bug 33279: https://bugs.llvm.org//show_bug.cgi?id=33279 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D34003 llvm-svn: 305915	2017-06-21 14:41:34 +00:00
Sam Kolton	549c89d2c9	[AMDGPU] SDWA: merge VI and GFX9 pseudo instructions Summary: Previously there were two separate pseudo instruction for SDWA on VI and on GFX9. Created one pseudo instruction that is union of both of them. Added verifier to check that operands conform either VI or GFX9. Reviewers: dp, arsenm, vpykhtin Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, artem.tamazov Differential Revision: https://reviews.llvm.org/D34026 llvm-svn: 305886	2017-06-21 08:53:38 +00:00
Matt Arsenault	67cd347e93	AMDGPU: Allow vectorization of packed types llvm-svn: 305844	2017-06-20 20:38:06 +00:00
Stanislav Mekhanoshin	a9d846c6ef	[AMDGPU] Fix illegal shrink of V_SUBB_U32 and V_ADDC_U32 If there is an immediate operand we shall not shrink V_SUBB_U32 and V_ADDC_U32, it does not fit e32 encoding. Differential Revison: https://reviews.llvm.org/D34291 llvm-svn: 305840	2017-06-20 20:33:44 +00:00
Matt Arsenault	9698f1c862	AMDGPU: Start adding global_* instructions llvm-svn: 305838	2017-06-20 19:54:14 +00:00
Matt Arsenault	ff3f912e74	AMDGPU: Do operand folding in program order Before it was possible to partially fold use instructions before the defs. After the xor is folded into a copy, the same mov can end up in the fold list twice, so on the second attempt it will fail expecting to see a register to fold. llvm-svn: 305821	2017-06-20 18:56:32 +00:00
Matt Arsenault	76858f5a1d	AMDGPU: Preserve undef when folding register operands If the source was a copy of an undef register, this would produce a read of an undefined register which is a verifier error. llvm-svn: 305816	2017-06-20 18:41:31 +00:00
Stanislav Mekhanoshin	465a1ff193	[AMDGPU] Eliminate SGPR to VGPR copy when possible SGPRs are generally cheaper, so try to use them over VGPRs. Differential Revision: https://reviews.llvm.org/D34130 llvm-svn: 305815	2017-06-20 18:32:42 +00:00
Matt Arsenault	7f67b35901	AMDGPU: Fix crash with undef vreg input operand llvm-svn: 305814	2017-06-20 18:28:02 +00:00
Matt Arsenault	c595185f8f	AMDGPU: Fix scratch wave offset relative FI expansion The offset may not be an inline immediate, so this needs to be materialized into a register. The post-RA run of SIShrinkInstructions is able to fold it later if it can. llvm-svn: 305761	2017-06-19 23:47:21 +00:00
Stanislav Mekhanoshin	50c2f251f5	[AMDGPU] Add infer address spaces pass before SROA It adds it for the target after inlining but before SROA where we can get most out of it. Differential Revision: https://reviews.llvm.org/D34366 llvm-svn: 305759	2017-06-19 23:17:36 +00:00
Matt Arsenault	e0e68a757e	AMDGPU: Cleanup CreateLiveInRegister llvm-svn: 305748	2017-06-19 21:52:45 +00:00
Tom Stellard	ff63ee0db5	AMDGPU/GlobalISel: Mark G_BITCAST s32 <--> <2 x s16> legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34129 llvm-svn: 305692	2017-06-19 13:15:45 +00:00
Alfred Huang	f9b521fdaf	[AMDGPU] Testing commit access only, no real change llvm-svn: 305523	2017-06-15 23:02:55 +00:00
Alexander Timofeev	0f9c84cd93	DivergencyAnalysis patch for review llvm-svn: 305494	2017-06-15 19:33:10 +00:00
Davide Italiano	36559b2527	[AMDGPU] Remove now dead defaultOffsetS13(). NFCI. Fixes the GCC7 build with -Werror. llvm-svn: 305329	2017-06-13 22:24:24 +00:00
Tom Stellard	ee6e6452df	AMDGPU/GlobalISel: Mark 32-bit G_ADD as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D33992 llvm-svn: 305232	2017-06-12 20:54:56 +00:00
Matt Arsenault	05c26472fa	AMDGPU: Don't add same implicit use multiple times For the last component, the same register use was added as an implicit use and another implicit kill use. llvm-svn: 305205	2017-06-12 17:19:20 +00:00
Matt Arsenault	d9b77848f2	AMDGPU: Teach isLegalAddressingMode about flat offsets Also fix reporting r+r as a valid addressing mode without offsets. llvm-svn: 305203	2017-06-12 17:06:35 +00:00
Matt Arsenault	db7c6a8731	AMDGPU: Start selecting flat instruction offsets llvm-svn: 305201	2017-06-12 16:53:51 +00:00
Matt Arsenault	89ad17ce4c	AMDGPU: Verify that flat offsets aren't used pre-GFX9 For convenience the operand is always present in the instruction, but it isn't valid to use except on GFX9. llvm-svn: 305200	2017-06-12 16:37:55 +00:00
Matt Arsenault	fd02314113	AMDGPU: Start adding offset fields to flat instructions llvm-svn: 305194	2017-06-12 15:55:58 +00:00
Daniel Neilson	c0112ae8da	Const correctness for TTI::getRegisterBitWidth Summary: The method TargetTransformInfo::getRegisterBitWidth() is declared const, but the type erasing implementation classes (TargetTransformInfo::Concept & TargetTransformInfo::Model) that were introduced by Chandler in https://reviews.llvm.org/D7293 do not have the method declared const. This is an NFC to tidy up the const consistency between TTI and its implementation. Reviewers: chandlerc, rnk, reames Reviewed By: reames Subscribers: reames, jfb, arsenm, dschuff, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, llvm-commits Differential Revision: https://reviews.llvm.org/D33903 llvm-svn: 305189	2017-06-12 14:22:21 +00:00
Wei Ding	7c3e5115a5	AMDGPU : Fix ISA Version Definitions. Differential Revision: http://reviews.llvm.org/D28531 llvm-svn: 305137	2017-06-10 03:53:19 +00:00
Stanislav Mekhanoshin	1a61ab8172	[AMDGPU] Add intrinsics for alignbit and alignbyte instructions Differential Revision: https://reviews.llvm.org/D34046 llvm-svn: 305098	2017-06-09 19:03:00 +00:00
David Stuttard	82618baa0f	[AMDGPU] Fix for issue in alloca to vector promotion pass Summary: Alloca promotion pass not dealing with non-canonical input Added some additional checks so the pass simply backs-off forms it can't deal with (non-canonical) Also added some test cases in non-canonical form to check that it no longer crashes Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31710 llvm-svn: 305079	2017-06-09 14:16:22 +00:00
Matt Arsenault	f1202e650a	AMDGPU: Work around build special casing .inc files It complains because it assumes these were autogenerated files in the source directory. llvm-svn: 305005	2017-06-08 19:25:21 +00:00
Matt Arsenault	3c7581bbeb	AMDGPU: Use correct register names in inline assembly Fixes using physical registers in inline asm from clang. llvm-svn: 305004	2017-06-08 19:03:20 +00:00
Mark Searles	e5c7832311	[AMDGPU] Force qsads instrs to use different dest register than source registers The V_MQSAD_PK_U16_U8, V_QSAD_PK_U16_U8, and V_MQSAD_U32_U8 take more than 1 pass in hardware. For these three instructions, the destination registers must be different than all sources, so that the first pass does not overwrite sources for the following passes. Differential Revision: https://reviews.llvm.org/D33783 llvm-svn: 304998	2017-06-08 18:21:19 +00:00
Dmitry Preobrazhensky	5a2f881b39	[AMDGPU][MC] Corrected error message for s_waitcnt helpers See Bug 32711: https://bugs.llvm.org//show_bug.cgi?id=32711 Reviewers: artem.tamazov Differential Revision: https://reviews.llvm.org/D33781 llvm-svn: 304922	2017-06-07 16:08:02 +00:00
Tom Stellard	2860a428f7	AMDGPU/GlobalISel: Mark 32-bit G_SELECT as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33949 llvm-svn: 304910	2017-06-07 13:54:51 +00:00
Zachary Turner	264b5d9e88	Move Object format code to lib/BinaryFormat. This creates a new library called BinaryFormat that has all of the headers from llvm/Support containing structure and layout definitions for various types of binary formats like dwarf, coff, elf, etc as well as the code for identifying a file from its magic. Differential Revision: https://reviews.llvm.org/D33843 llvm-svn: 304864	2017-06-07 03:48:56 +00:00
Konstantin Zhuravlyov	1e2b87893b	AMDGPU/NFC: Move amdgpu code object metadata to support Differential Revision: https://reviews.llvm.org/D31437 llvm-svn: 304812	2017-06-06 18:35:50 +00:00
Stanislav Mekhanoshin	e4cda7417c	[AMDGPU] Return correct value from SDWA pass Differential Revision: https://reviews.llvm.org/D33927 llvm-svn: 304805	2017-06-06 16:42:30 +00:00
Tom Stellard	8cd60a5067	AMDGPU/GlobalISel: Mark 32-bit G_ICMP as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33890 llvm-svn: 304797	2017-06-06 14:16:50 +00:00
Chandler Carruth	6bda14b313	Sort the remaining #include lines in include/... and lib/.... I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is entirely mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787	2017-06-06 11:49:48 +00:00
Mandeep Singh Grang	5e1697ef28	[llvm] Remove double semicolons Reviewers: craig.topper, arsenm, mehdi_amini Reviewed By: mehdi_amini Subscribers: mehdi_amini, wdng, nhaehnle, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33924 llvm-svn: 304767	2017-06-06 05:08:36 +00:00
Konstantin Zhuravlyov	5b0bf2ff0d	AMDGPU: Remove deprecated and unused elf definitions Differential Revision: https://reviews.llvm.org/D33689 llvm-svn: 304737	2017-06-05 21:33:40 +00:00
Mark Searles	602ee930bf	[AMDGPU] Fix uninit'ed var (RevisitLoop) Differential Revision: https://reviews.llvm.org/D33907 llvm-svn: 304729	2017-06-05 19:29:01 +00:00
Stanislav Mekhanoshin	286a4225b9	[AMDGPU] Fix SIFoldOperands crash with clamp Fixes bug #33302. Pass did not account that Src1 of max instruction can be an immediate. Differential Revision: https://reviews.llvm.org/D33884 llvm-svn: 304696	2017-06-05 01:03:04 +00:00
Stanislav Mekhanoshin	0330660403	[AMDGPU] Untangle SDWA pass from SIShrinkInstructions Remove dependency of SDWA pass on SIShrinkInstructions. The goal is to move SDWA even higher in the stack to avoid second run of MachineLICM, MachineCSE and SIFoldOperands. Also added handling to preserve original src modifiers. Differential Revision: https://reviews.llvm.org/D33860 llvm-svn: 304665	2017-06-03 17:39:47 +00:00
Tom Stellard	e042412ef1	AMDGPU/GlobalISel: Mark 1-bit integer constants as legal Summary: These are mostly legal, but will probably need special lowering for some cases. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D33791 llvm-svn: 304628	2017-06-03 01:13:33 +00:00
Stanislav Mekhanoshin	f154b4f52c	[AMDGPU] Preserve operand order in SIFoldOperands SIFoldOperands can commute operands even if no folding was done. This change is to preserve IR is no folding was done. Differential Revision: https://reviews.llvm.org/D33802 llvm-svn: 304625	2017-06-03 00:41:52 +00:00
Stanislav Mekhanoshin	ca5d2efe5a	[AMDGPU] V_DIV_FIXUP_F16 is not a commutable operation Differential Revision: https://reviews.llvm.org/D33808 llvm-svn: 304619	2017-06-03 00:16:44 +00:00
Matt Arsenault	746e065716	AMDGPU: Register AMDGPUAlwaysInline llvm-svn: 304574	2017-06-02 18:02:42 +00:00
Konstantin Zhuravlyov	be6c0ca5e2	AMDGPU: Make auto waitcnt before barrier a feature Differential Revision: https://reviews.llvm.org/D33793 llvm-svn: 304571	2017-06-02 17:40:26 +00:00
Alexander Timofeev	3f70b619a9	AMDGPUAnnotateUniformValue should always treat volatile loads as divergent llvm-svn: 304554	2017-06-02 15:25:52 +00:00
Mark Searles	70359ac60d	[AMDGPU] Turn on the new waitcnt insertion pass. Adjust tests. -enable-si-insert-waitcnts=1 becomes the default -enable-si-insert-waitcnts=0 to use old pass Differential Revision: https://reviews.llvm.org/D33730 llvm-svn: 304551	2017-06-02 14:19:25 +00:00
Yaxun Liu	a618acf923	[AMDGPU] Fix kernel arg segment size for amdgizcl Differential Revision: https://reviews.llvm.org/D33307 llvm-svn: 304482	2017-06-01 21:31:53 +00:00
Matt Arsenault	3416b8c874	AMDGPU: Remove error on call in AsmPrinter Partial revert of r301938 which is making it harder to split patches up. llvm-svn: 304418	2017-06-01 15:05:15 +00:00
Matt Arsenault	50f43e4168	AMDGPU: Set high getCSRFirstUseCost llvm-svn: 304416	2017-06-01 14:38:02 +00:00
Matthias Braun	d6a36ae282	TargetMachine: Indicate whether machine verifier passes. This adds a callback to the LLVMTargetMachine that lets target indicate that they do not pass the machine verifier checks in all cases yet. This is intended to be a temporary measure while the targets are fixed allowing us to enable the machine verifier by default with EXPENSIVE_CHECKS enabled! Differential Revision: https://reviews.llvm.org/D33696 llvm-svn: 304320	2017-05-31 18:41:23 +00:00
Mark Searles	11d0a04050	[AMDGPU] Fix bugs in new waitcnt pass. Add test. - new waitcnt pass remains off by default; -enable-si-insert-waitcnts=1 to enable it - fix handling of PERMUTE ops - fix insertion of waitcnt instrs at function begin/end ( port of analogous code that was added to old waitcnt pass ) - add new test Differential Revision: https://reviews.llvm.org/D33114 llvm-svn: 304311	2017-05-31 16:44:23 +00:00
Dmitry Preobrazhensky	793c592652	[AMDGPU][MC] New syntax for ds_swizzle_b32 offset See Bug 28601: https://bugs.llvm.org//show_bug.cgi?id=28601 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D33542 llvm-svn: 304309	2017-05-31 16:26:47 +00:00
Matthias Braun	5e394c3d6f	TargetPassConfig: Keep a reference to an LLVMTargetMachine; NFC TargetPassConfig is not useful for targets that do not use the CodeGen library, so we may just as well store a pointer to an LLVMTargetMachine instead of just to a TargetMachine. While at it, also change the constructor to take a reference instead of a pointer as the TM must not be nullptr. llvm-svn: 304247	2017-05-30 21:36:41 +00:00
Stanislav Mekhanoshin	56ea488d8b	[AMDGPU] Allow SDWA in instructions with immediates and SGPRs An encoding does not allow to use SDWA in an instruction with scalar operands, either literals or SGPRs. That is however possible to copy these operands into a VGPR first. Several copies of the value are produced if multiple SDWA conversions were done. To cleanup MachineLICM (to hoist copies out of loops), MachineCSE (to remove duplicate copies) and SIFoldOperands (to replace SGPR to VGPR copy with immediate copy right to the VGPR) runs are added after the SDWA pass. Differential Revision: https://reviews.llvm.org/D33583 llvm-svn: 304219	2017-05-30 16:49:24 +00:00
Mark Searles	00ce96f6ee	[AMDGPU] Require waitcnt before barrier for all targets; adjust tests. Differential Revision: https://reviews.llvm.org/D33576 llvm-svn: 304217	2017-05-30 16:22:43 +00:00
Konstantin Zhuravlyov	b2ff8dfea0	Resubmit r303859 with test fixed. [AMDGPU] add intrinsic for s_getpc Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc. Patch by Tim Corringham llvm-svn: 304031	2017-05-26 20:38:26 +00:00
Benjamin Kramer	debb3c35e0	Make helper functions static. NFC. llvm-svn: 304029	2017-05-26 20:09:00 +00:00
Dmitry Preobrazhensky	6a2431df0b	[AMDGPU][MC][GFX9] Corrected encoding of flat_scratch* for SDWA opcodes See bug 33171: https://bugs.llvm.org/show_bug.cgi?id=33171 Reviewers: Sam Kolton Differential Revision: https://reviews.llvm.org/D33553 llvm-svn: 304015	2017-05-26 18:01:29 +00:00
Tom Stellard	dde28a8c92	AMDGPU/GlobalISel: Mark 32-bit float constants as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33212 llvm-svn: 304003	2017-05-26 16:40:03 +00:00
Sam Kolton	363f47a2c7	[AMDGPU] SDWA: add disassembler support for GFX9 Summary: Added decoder methods and tests Reviewers: vpykhtin, artem.tamazov, dp Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D33545 llvm-svn: 303999	2017-05-26 15:52:00 +00:00
Nico Weber	b3d83a092a	Revert r303859, CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll fails on bots. llvm-svn: 303902	2017-05-25 19:19:29 +00:00
Tim Corringham	32d0d38679	[AMDGPU] add intrinsic for s_getpc Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D32862 llvm-svn: 303859	2017-05-25 14:04:14 +00:00
Nirav Dave	d20066cbad	[AMDGPU] Prevent too large store merges in AMDGPU Subtargets. NFCI. Various address spaces on the SI and R600 subtargets have stricter limits on memory access size than other address spaces. Use canMergeStoresTo predicate to prevent the DAGCombiner from creating these stores as they will be split up during legalization. llvm-svn: 303767	2017-05-24 15:59:09 +00:00
Marek Olsak	8973a0a22c	Revert "AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns" This reverts commit e065977c4b5f68ab845400b256f6a3822b1325fa. It doesn't work. S_LOAD_DWORD_IMM_ci and friends aren't selected by any of the patterns, so it was putting 32-bit literals into the 8-bit field. llvm-svn: 303754	2017-05-24 14:53:50 +00:00
Simon Pilgrim	c910a70b21	[AMDGPU] Add INDIRECT_BASE_ADDR to R600_Reg32 class (PR33045) This fixes 17 of the 41 -verify-machineinstrs test failures identified in PR33045 Differential Revision: https://reviews.llvm.org/D33451 llvm-svn: 303691	2017-05-23 21:27:15 +00:00
Changpeng Fang	1dbace195d	AMDGPU/SI: Move the local memory usage related checking after calling convention checking in PromoteAlloca Summary: Promoting Alloca to Vector and Promoting Alloca to LDS are two independent ways of handling Alloca and should not affect each other. As a result, we should not give up promoting to vector if there is not enough LDS. This patch factors out the local memory usage related checking and places it after the calling convention checking. Reviewer: arsenm Differential Revision: http://reviews.llvm.org/D33139 llvm-svn: 303684	2017-05-23 20:25:41 +00:00
Stanislav Mekhanoshin	53a21292f8	[AMDGPU] Combine and (srl) into shl (bfe) Perform DAG combine: and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb Where nb is a number of trailing zeroes in mask. It replaces two instructions with two and BFE is generally a more expensive one. However this is only done if we are selecting a byte or word at an aligned boundary which results in a proper SDWA operand pattern. It is only done if SDWA is supported. TODO: improve SDWA pass to actually convert this pattern. It is not done now because we have an immediate in the instruction, which has to be moved into a VGPR. Differential Revision: https://reviews.llvm.org/D33455 llvm-svn: 303681	2017-05-23 19:54:48 +00:00
Marek Olsak	7dadd86a35	AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns This is just a cleanup. Also, it adds checking that ByteCount is aligned to 4. Reviewers: arsenm, nhaehnle, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D28994 llvm-svn: 303658	2017-05-23 17:14:34 +00:00
Stanislav Mekhanoshin	a96ec3f360	[AMDGPU] Convert shl (add) into add (shl) shl (or\|add x, c2), c1 => or\|add (shl x, c1), (c2 << c1) This allows to fold a constant into an address in some cases as well as to eliminate second shift if the expression is used as an address and second shift is a result of a GEP. Differential Revision: https://reviews.llvm.org/D33432 llvm-svn: 303641	2017-05-23 15:59:58 +00:00
Sam Kolton	f7659d71eb	[AMDGPU] SDWA: Add assembler support for GFX9 Summary: Added separate pseudo and real instructions for GFX9 SDWA instructions. Currently supported only in the assembler. Depends D32493 Reviewers: vpykhtin, artem.tamazov Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D33132 llvm-svn: 303620	2017-05-23 10:08:55 +00:00
Stanislav Mekhanoshin	5fa289f0d8	[AMDGPU] Narrow lshl from 64 to 32 bit if possible Turn expensive 64 bit shift into 32 bit if shift does not overflow int: shl (ext x) => zext (shl x) Differential Revision: https://reviews.llvm.org/D33367 llvm-svn: 303569	2017-05-22 16:58:10 +00:00
Valery Pykhtin	74cb9c8831	[AMDGPU] Fix incorrect register usage tracking in GCNUpwardTracker Differential revision: https://reviews.llvm.org/D33289 llvm-svn: 303548	2017-05-22 13:09:40 +00:00
Dmitry Preobrazhensky	ce941c9c38	[AMDGPU][MC] Corrected disassembler to decode instructions with 2 literals See bug 32922: https://bugs.llvm.org//show_bug.cgi?id=32922 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D32912 llvm-svn: 303428	2017-05-19 14:27:52 +00:00
Dmitry Preobrazhensky	9321e8fcec	[AMDGPU][MC] Fixed bugs in export instruction See Bugs 33019, 33056: https://bugs.llvm.org//show_bug.cgi?id=33019 https://bugs.llvm.org//show_bug.cgi?id=33056 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D33288 llvm-svn: 303423	2017-05-19 13:36:09 +00:00
Francis Visoiu Mistrih	8b61764cbb	[LegacyPassManager] Remove TargetMachine constructors This provides a new way to access the TargetMachine through TargetPassConfig, as a dependency. The patterns replaced here are: * Passes handling a null TargetMachine call `getAnalysisIfAvailable<TargetPassConfig>`. * Passes not handling a null TargetMachine `addRequired<TargetPassConfig>` and call `getAnalysis<TargetPassConfig>`. * MachineFunctionPasses now use MF.getTarget(). * Remove all the TargetMachine constructors. * Remove INITIALIZE_TM_PASS. This fixes a crash when running `llc -start-before prologepilog`. PEI needs StackProtector, which gets constructed without a TargetMachine by the pass manager. The StackProtector pass doesn't handle the case where there is no TargetMachine, so it segfaults. Related to PR30324. Differential Revision: https://reviews.llvm.org/D33222 llvm-svn: 303360	2017-05-18 17:21:13 +00:00
Sam Kolton	ebfdaf7394	[AMDGPU] SDWA operands should not intersect with potential MIs Summary: There should be no intersection between SDWA operands and potential MIs. E.g.: ``` v_and_b32 v0, 0xff, v1 -> src:v1 sel:BYTE_0 v_and_b32 v2, 0xff, v0 -> src:v0 sel:BYTE_0 v_add_u32 v3, v4, v2 ``` In that example it is possible that we would fold 2nd instruction into 3rd (v_add_u32_sdwa) and then try to fold 1st instruction into 2nd (that was already destroyed). So if SDWAOperand is also a potential MI then do not apply it. Reviewers: vpykhtin, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D32804 llvm-svn: 303347	2017-05-18 12:12:03 +00:00
Matt Arsenault	2b1f9aa577	AMDGPU: Start defining a calling convention Partially implement callee-side for arguments and return values. byval doesn't work properly, and most likely neither do sret or other on-stack return values. llvm-svn: 303308	2017-05-17 21:56:25 +00:00
Matt Arsenault	2525e4e4c2	AMDGPU: Expand frame indexes to be relative to scratch wave offset In order for an arbitrary callee to access an object in a caller's stack frame, the 32-bit offset used as the private pointer needs to be relative to the kernel's scratch wave offset register. Convert to this by finding the difference from the current stack frame and scaling by the wavefront size. llvm-svn: 303303	2017-05-17 21:23:14 +00:00
Matt Arsenault	156d3ae0b6	AMDGPU: Change mubuf soffset register when SP relative Check the MachinePointerInfo for whether the access is supposed to be relative to the stack pointer. No tests because this is used in later commits implementing calls. llvm-svn: 303301	2017-05-17 21:02:58 +00:00
Matt Arsenault	98f2946ab3	AMDGPU: Make better use of op_sel with high components Handle more general swizzles. llvm-svn: 303296	2017-05-17 20:30:58 +00:00
Matt Arsenault	786eeea23e	AMDGPU: Try to use op_sel when selecting packed instructions Avoids instructions to pack a vector when the source is really a scalar being broadcast. Also be smarter and look for per-component fneg. Doesn't yet handle scalar from upper half of register or other swizzles. llvm-svn: 303291	2017-05-17 20:00:00 +00:00
Matt Arsenault	ea8a4ed588	AMDGPU: Use appropriate soffset for spilling This needs to be the frame offset register, and not the global scratch wave offset register. For kernels, these are the same. llvm-svn: 303287	2017-05-17 19:37:57 +00:00
Matt Arsenault	ee324ffc1f	AMDGPU: Fix min3/max3 combines for f16/i16 Fix missing instruction definitions for min3/max3. llvm-svn: 303284	2017-05-17 19:25:06 +00:00
Stanislav Mekhanoshin	acca0f5c02	[AMDGPU] Use GCNRPTracker dumper methods in scheduler Differential Revision: https://reviews.llvm.org/D33244 llvm-svn: 303186	2017-05-16 16:31:45 +00:00
Stanislav Mekhanoshin	b10860788f	[AMDGPU] Cache live-ins and register pressure in scheduler Using LIS can be quite expensive, so caching of calculated region live-ins and pressure is implemented. It does two things: 1. Caches the info for the second stage when we schedule with decreased target occupancy. 2. Tracks the basic block from top to bottom thus eliminating the need to scan whole register file liveness at every region split in the middle of the block. The scheduling is now done in 3 stages instead of two, with the first one being really a no-op and only used to collect scheduling regions as sent by the scheduler driver. There is no functional change to the current behavior, only compilation speed is affected. In general computeBlockPressure() could be simplified if we switch to backward RP tracker, because scheduler sends regions within a block starting from the last upward. We could use a natural order of upward tracker to seamlessly change between regions of the same block, since live reg set of a previous tracked region would become a live-out of the next region. That however requires fixing upward tracker to properly account defs and uses of the same instruction as both are contributing to the current pressure. When we converge on the produced pressure we should be able to switch between them back and forth. In addition, backward tracker is less expensive as it uses LIS in recede less often than forward uses it in advance. At the moment the worst known case compilation time has improved from 26 minutes to 8.5. Differential Revision: https://reviews.llvm.org/D33117 llvm-svn: 303184	2017-05-16 16:11:26 +00:00
Stanislav Mekhanoshin	464cecf81e	[AMDGPU] Turn register pressure estimation into forward tracker This factors register pressure estimation mechanism from the GCNSchedStrategy into the forward tracker to unify interface with other strategies and expose it to other interested phases. Differential Revision: https://reviews.llvm.org/D33105 llvm-svn: 303179	2017-05-16 15:43:52 +00:00
NAKAMURA Takumi	994a43d27a	AMDGPUCodeGen: Fix warnings in r303111. [-Wunused-variable] llvm-svn: 303137	2017-05-16 04:01:23 +00:00
Davide Italiano	60d36c7506	[AMDGPU] Kill now unused phiInfoElementGetDebugLoc(). NFCI. llvm-svn: 303122	2017-05-15 22:10:15 +00:00
Jan Sjodin	a06bfe054e	Re-submit AMDGPUMachineCFGStructurizer. Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303111	2017-05-15 20:18:37 +00:00
Jan Sjodin	0e289822fa	Revert 303091. llvm-svn: 303098	2017-05-15 18:39:47 +00:00
Jan Sjodin	e9d2ddc9dd	Add AMDGPUMachineCFGStructurizer. Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303091	2017-05-15 18:13:56 +00:00
Dmitry Preobrazhensky	167f8b69e3	[AMDGPU][MC] Corrected several VI opcodes to avoid printing _e64 See bug 32936: https://bugs.llvm.org//show_bug.cgi?id=32936 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D33123 llvm-svn: 303070	2017-05-15 14:28:23 +00:00
Dmitry Preobrazhensky	03852a9dca	[AMDGPU][MC] Removed V_MQSAD_U16_U8 This instruction does not really exist See Bug 33018: https://bugs.llvm.org//show_bug.cgi?id=33018 Reviewers: vpykhtin, artem.tamazov Differential Revision: https://reviews.llvm.org/D33126 llvm-svn: 303055	2017-05-15 12:37:03 +00:00
Changpeng Fang	161e8c39af	AMDGPU/SI: Don't promote to vector if the load/store is volatile. Summary: We should not change volatile loads/stores in promoting alloca to vector. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D33107 llvm-svn: 302943	2017-05-12 20:31:12 +00:00
Craig Topper	8df66c602a	[KnownBits] Add bit counting methods to KnownBits struct and use them where possible This patch adds min/max population count, leading/trailing zero/one bit counting methods. The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give. Differential Revision: https://reviews.llvm.org/D32931 llvm-svn: 302925	2017-05-12 17:20:30 +00:00
Tom Stellard	a0d67c748a	AMDGPU/GlobalISel: Mark 32-bit integer constants as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33115 llvm-svn: 302919	2017-05-12 16:46:46 +00:00
Davide Italiano	0dcc015a81	[AMDGPU] Placate unused variable warning in release builds. llvm-svn: 302821	2017-05-11 19:58:52 +00:00
Matt Arsenault	47ccafe787	AMDGPU: Remove tfe bit from flat instruction definitions We don't use it and it was removed in gfx9, and the encoding bit repurposed. Additionally actually using it requires changing the output register class, which wasn't done anyway. llvm-svn: 302814	2017-05-11 17:38:33 +00:00
Matt Arsenault	bf5482e4bb	AMDGPU: Pull fneg out of extract_vector_elt This allows folding source modifiers in more f16 cases. Makes it easier to select per-component packed neg modifiers. llvm-svn: 302813	2017-05-11 17:26:25 +00:00
Stanislav Mekhanoshin	33a97ec4ed	[AMDGPU] Fix incorrect register pressure calculation Earlier fix D32572 introduced a bug where live-ins were calculated for basic block instead of scheduling region. This change fixes it. Differential Revision: https://reviews.llvm.org/D33086 llvm-svn: 302812	2017-05-11 17:16:55 +00:00
Serge Guelton	1b421c259f	Remove now useless trailing nullptr in StructType::get llvm-svn: 302779	2017-05-11 08:46:02 +00:00
Matt Arsenault	3c5e4237c6	AMDGPU: Make some packed shuffles free VOP3P instructions can encode access to either half of the register. llvm-svn: 302730	2017-05-10 21:29:33 +00:00
Matt Arsenault	acdc7659cc	AMDGPU: Add new subtarget features for gfx9 flat instructions Flat instructions gain an immediate offset, and 2 new sets of segment specific flat instructions are added. llvm-svn: 302729	2017-05-10 21:19:05 +00:00
Dmitry Preobrazhensky	da61a7f9ef	[AMDGPU][MC] Corrected v_madak/madmk to avoid printing "_e32" in disassembler output See bug 32927: https://bugs.llvm.org//show_bug.cgi?id=32927 Reviewers: vpykhtin, artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D32913 llvm-svn: 302648	2017-05-10 13:00:28 +00:00
Stanislav Mekhanoshin	7e3794d5c3	[AMDGPU] Fixed typo in GCNRegPressure, NFC VGRP -> VGPR, SGRP -> SGPR llvm-svn: 302586	2017-05-09 20:50:04 +00:00
Quentin Colombet	245994d968	[RegisterBankInfo] Uniquely allocate instruction mapping. This is a step toward having statically allocated instruction mapping. We are going to tablegen them eventually, so let us reflect that in the API. NFC. llvm-svn: 302316	2017-05-05 22:48:22 +00:00
Kannan Narayanan	5e73b04b84	[AMDGPU] In the new waitcnt insertion pass, use getHeader instead of getTopBlock to find the loop header. Differential Revision: https://reviews.llvm.org/D32831 llvm-svn: 302290	2017-05-05 21:10:17 +00:00
Konstantin Zhuravlyov	6ccb076aeb	AMDGPU/AMDHSA: Set COMPUTE_PGM_RSRC2:LDS_SIZE to 0 This field is populated by the CP Differential Revision: https://reviews.llvm.org/D32619 llvm-svn: 302277	2017-05-05 20:13:55 +00:00
Craig Topper	f0aeee01c3	[KnownBits] Add wrapper methods for setting and clearing all bits in the underlying APInts in KnownBits. This adds routines for resetting KnownBits to unknown, making the value all zeros or all ones. It also adds methods for querying if the value is zero, all ones or unknown. Differential Revision: https://reviews.llvm.org/D32637 llvm-svn: 302262	2017-05-05 17:36:09 +00:00
Marek Olsak	584d2c05d4	AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5 Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D32645 llvm-svn: 302200	2017-05-04 22:25:20 +00:00
Matt Arsenault	5c80618fb7	AMDGPU: Don't promote alloca to LDS for leaf functions LDS use in leaf functions not currently handled. llvm-svn: 301958	2017-05-02 18:33:18 +00:00
Matt Arsenault	b03dd8daae	AMDGPU: Refactor AsmPrinter Avoid analyzing functions multiple times. This allows asserting that each function is only analyzed once. llvm-svn: 301938	2017-05-02 17:14:00 +00:00
Matt Arsenault	7b82b4bddb	AMDGPU: Make intrinsics speculatable llvm-svn: 301937	2017-05-02 16:57:44 +00:00
Marek Olsak	a302a736ec	AMDGPU: Add AMDGPU_HS calling convention Reviewers: arsenm, nhaehnle Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D32644 llvm-svn: 301930	2017-05-02 15:41:10 +00:00
Sanjoy Das	e6bca0eecb	Rename WeakVH to WeakTrackingVH; NFC This relands r301424. llvm-svn: 301812	2017-05-01 17:07:49 +00:00
Amara Emerson	d28f0cd448	Generalize the specialized flag-carrying SDNodes by moving flags into SDNode. This removes BinaryWithFlagsSDNode, and flags are now all passed by value. Differential Revision: https://reviews.llvm.org/D32527 llvm-svn: 301803	2017-05-01 15:17:51 +00:00
Matt Arsenault	2a80369ae4	AMDGPU: Fix copies from physical registers in SIFixSGPRCopies This would assert when there were multiple defs of a physical register. We just need to move all of the users of it. llvm-svn: 301730	2017-04-29 01:26:34 +00:00
Marek Olsak	2d82590f64	AMDGPU: Add new amdgcn.init.exec intrinsics v2: More tests, bug fixes, cosmetic changes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D31762 llvm-svn: 301677	2017-04-28 20:21:58 +00:00
Craig Topper	d0af7e8ab8	[SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently. This is largely a mechanical transformation from KnownZero to Known.Zero. Differential Revision: https://reviews.llvm.org/D32569 llvm-svn: 301620	2017-04-28 05:31:46 +00:00
Sam Kolton	5d99386b4d	[AMDGPU] DPP: add support for GFX9 Reviewers: artem.tamazov Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D32588 llvm-svn: 301551	2017-04-27 15:42:38 +00:00
Konstantin Zhuravlyov	97a663b6a2	AMDGPU: Fix assert in scheduler Assert is triggered if DBG_VALUE is first instruction in BB Differential Revision: https://reviews.llvm.org/D32572 llvm-svn: 301511	2017-04-27 03:22:44 +00:00
Dmitry Preobrazhensky	43d297eb45	[AMDGPU][MC] Added arg checks for vmcnt, expcnt, lgkmcnt helpers Summary of changes: - corrected vmcnt, expcnt, lgkmcnt helpers to check their argument for truncation; - added saturated versions of these helpers. See bug 32711 for details: https://bugs.llvm.org//show_bug.cgi?id=32711 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D32546 llvm-svn: 301439	2017-04-26 17:55:50 +00:00
Sanjoy Das	2cbeb00f38	Reverts commit r301424, r301425 and r301426 Commits were: "Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts" "Add a new WeakVH value handle; NFC" "Rename WeakVH to WeakTrackingVH; NFC" The changes assumed pointers are 8 byte aligned on all architectures. llvm-svn: 301429	2017-04-26 16:37:05 +00:00
Sanjoy Das	01de557738	Rename WeakVH to WeakTrackingVH; NFC Summary: I plan to use WeakVH to mean "nulls itself out on deletion, but does not track RAUW" in a subsequent commit. Reviewers: dblaikie, davide Reviewed By: davide Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D32266 llvm-svn: 301424	2017-04-26 16:20:52 +00:00
Dmitry Preobrazhensky	c7d35a0d6a	[AMDGPU][MC] Added check for truncation of SOPK imm operand See bug 30827: https://bugs.llvm.org//show_bug.cgi?id=30827 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D32535 llvm-svn: 301418	2017-04-26 15:34:19 +00:00
Davide Italiano	0316f7ae7b	[AMDGPU] Garbage collect dead code. NFCI. llvm-svn: 301375	2017-04-26 01:00:52 +00:00
Matt Arsenault	36c3122ecd	AMDGPU: Shift down reserved SP register like scratch wave offset llvm-svn: 301367	2017-04-25 23:40:57 +00:00
Matt Arsenault	df58e825ad	AMDGPU: Clean up VOP3NoMods pattern There is no need to copy the operands or inspect the sources. Also remove some unnecessary clamp/omod usage. llvm-svn: 301363	2017-04-25 21:17:38 +00:00
Konstantin Zhuravlyov	54ba4312a3	AMDGPU: Fix ValueKind code object metadata for images Differential Revision: https://reviews.llvm.org/D32504 llvm-svn: 301360	2017-04-25 20:38:26 +00:00
Matt Arsenault	e22184940b	AMDGPU: Slightly simplify prolog reserved register handling Rely on MachineRegisterInfo's knowledge of used physical registers. Move flat_scratch initialization earlier, so the uses are visible when making these decisions. This will make it easier to add another reserved register at the end for the stack pointer rather than handling another special case. llvm-svn: 301254	2017-04-24 21:08:32 +00:00
Matt Arsenault	0774ea267a	AMDGPU: Select scratch mubuf offsets when pointer is a constant In call sequence setups, there may not be a frame index base and the pointer is a constant offset from the frame pointer / scratch wave offset register. llvm-svn: 301230	2017-04-24 19:40:59 +00:00
Matt Arsenault	df6539f44b	AMDGPU: Set StackGrowsUp in MCAsmInfo Not sure what this does though. llvm-svn: 301229	2017-04-24 19:40:51 +00:00
Stanislav Mekhanoshin	bd5394be3d	[AMDGPU] Merge M0 initializations Merges equivalent initializations of M0 and hoists them into a common dominator block. Technically the same code can be used with any register, physical or virtual. Differential Revision: https://reviews.llvm.org/D32279 llvm-svn: 301228	2017-04-24 19:37:54 +00:00
Krzysztof Parzyszek	44e25f37ae	Move size and alignment information of regclass to TargetRegisterInfo 1. RegisterClass::getSize() is split into two functions: - TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const; - TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const; 2. RegisterClass::getAlignment() is replaced by: - TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const; This will allow making those values depend on subtarget features in the future. Differential Revision: https://reviews.llvm.org/D31783 llvm-svn: 301221	2017-04-24 18:55:33 +00:00
Yaxun Liu	fd23a0c095	CodeGen: Add a hook for getFenceOperandTy Currently the operand type for ATOMIC_FENCE assumes value type of a pointer in address space 0. This is fine for most targets. However for amdgcn target, the size of pointer in address space 0 depends on triple environment. For amdgiz environment, it is 64 bit but for other environments it is 32 bit. On the other hand, amdgcn target expects 32 bit fence operands independent of the target triple environment. Therefore a hook is needed in target lowering for getting the fence operand type. This patch has no effect on targets other than amdgcn. Differential Revision: https://reviews.llvm.org/D32186 llvm-svn: 301215	2017-04-24 18:26:27 +00:00
Matt Arsenault	1c0ae3972f	AMDGPU: Add StackPtr and FramePtr registers to MFI These will be necessary for setting up call sequences. llvm-svn: 301208	2017-04-24 18:05:16 +00:00
Matt Arsenault	3e02538a02	AMDGPU: Move trap lowering to DAG Fixes traps in any block besides the entry block, and fixes depending on a live-in physical register by using a virtual register copy. Also happens to stop emitting a nop in the case debug trap is not supported. llvm-svn: 301206	2017-04-24 17:49:13 +00:00
Nicolai Haehnle	5dea645138	AMDGPU: Move v_readlane lane select from VGPR to SGPR Summary: Fix a compiler bug when the lane select happens to end up in a VGPR. Clarify the semantic of the corresponding intrinsic to be that of the corresponding GLSL: the lane select must be uniform across a wave front, otherwise results are undefined. Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D32343 llvm-svn: 301197	2017-04-24 17:17:36 +00:00
Nicolai Haehnle	ef449787d8	AMDGPU: Fix crash when scheduling non-memory SMRD instructions Summary: Fixes piglit spec/arb_shader_clock/execution/* Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D32345 llvm-svn: 301191	2017-04-24 16:53:52 +00:00
Konstantin Zhuravlyov	f628406bbd	AMDGPU/GFX9: Enable FastFMAF32 Differential Revision: https://reviews.llvm.org/D32363 llvm-svn: 301029	2017-04-21 19:57:53 +00:00
Konstantin Zhuravlyov	3d1cc88c68	AMDGPU: Temporarily disable packed inlinable literals (v2f16, v2i16) Differential Revision: https://reviews.llvm.org/D32361 llvm-svn: 301028	2017-04-21 19:45:22 +00:00
Konstantin Zhuravlyov	88938d4e67	AMDGPU: Fix S_PACK_HH_B32_B16 - We really ought to zero out lower 16 bits Differential Revision: https://reviews.llvm.org/D32356 llvm-svn: 301026	2017-04-21 19:35:05 +00:00
Yaxun Liu	15a96b1dc8	[AMDGPU] Handle SI_MASKED_UNREACHABLE in instruction emitter SI_MASKED_UNREACHABLE does not have machine instruction encoding. It needs special handling in AMDGPUAsmPrinter::EmitInstruction like some other pseudo instructions. This patch fixes compilation failure of RadeonRays. Differential Revision: https://reviews.llvm.org/D32364 llvm-svn: 301025	2017-04-21 19:32:02 +00:00
Konstantin Zhuravlyov	c4b18e7099	AMDGPU: Do not lower fast unsafe div for safe, f32, with fp32 denormals Differential Revision: https://reviews.llvm.org/D32085 llvm-svn: 301023	2017-04-21 19:25:33 +00:00
Akira Hatanaka	22e839f4b2	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. This recommits r300932 and r300930, which was causing dag-combine to loop forever. The problem was that optimizeLogicalImm was returning true even when there was no change to the immediate node (which happened when the immediate was all zeros or ones), which caused dag-combine to push and pop the same node to the work list over and over again without making any progress. This commit fixes the bug by returning false early in optimizeLogicalImm if the immediate is all zeros or ones. Also, it changes the code to compare the immediate with 0 or Mask rather than calling countPopulation. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 301019	2017-04-21 18:53:12 +00:00
Akira Hatanaka	78ccba6a20	Revert r300932 and r300930. It seems that r300930 was creating an infinite loop in dag-combine when compiling the following file: MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c llvm-svn: 300940	2017-04-21 01:31:50 +00:00
Akira Hatanaka	19077aaee0	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. This recommits r300913, which broke bots because I didn't fix a call to ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of TargetLoweringOpt and TargetLowering. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 300930	2017-04-21 00:05:16 +00:00
Akira Hatanaka	7b06cebe73	Revert "[AArch64] Improve code generation for logical instructions taking" This reverts r300913. This broke bots. llvm-svn: 300916	2017-04-20 23:03:30 +00:00
Akira Hatanaka	e327f09832	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 300913	2017-04-20 22:47:56 +00:00
Kannan Narayanan	2fb5960121	Revert earlier change. ds permute operations affect lgkm counter. Differential Revision: https://reviews.llvm.org/D32254 llvm-svn: 300791	2017-04-19 23:39:19 +00:00
Matt Arsenault	4a48623e4f	AMDGPU: Custom lower illegal small select types Promote them to i32 vectors to avoid unpacking and re-packing the vectors. llvm-svn: 300754	2017-04-19 20:53:07 +00:00
Matt Arsenault	021a218dd2	AMDGPU: Don't emit amd_kernel_code_t for callable functions This is inserted directly in the text section. The relocation for the function ends up resolving to the beginning of the amd_kernel_code_t header rather than the actual function entry point. Also skip some of the comments for initialization that only makes sense for kernels. llvm-svn: 300736	2017-04-19 19:38:10 +00:00
Matt Arsenault	6cb7b8a42f	AMDGPU: Don't align callable functions to 256 llvm-svn: 300720	2017-04-19 17:42:39 +00:00
Matt Arsenault	4c1ecded63	AMDGPU: Change DivergenceAnalysis for function arguments Stop assuming all functions are kernels. llvm-svn: 300719	2017-04-19 17:42:34 +00:00
Matt Arsenault	aa31dce3c5	Fix typo llvm-svn: 300597	2017-04-18 20:59:46 +00:00
Matt Arsenault	161e2b4223	AMDGPU: Make MFI fields private llvm-svn: 300596	2017-04-18 20:59:40 +00:00
Matt Arsenault	a3566f2149	AMDGPU: Use MachineRegisterInfo to find max used register Avoid looping through program to determine register counts. This avoids needing to look at regmask operands. Also fixes some counting errors with flat_scr when there are no stack objects. llvm-svn: 300482	2017-04-17 19:48:30 +00:00
Matt Arsenault	869fec278c	AMDGPU: Change stack alignment While the incoming stack for a kernel is 256-byte aligned, this refers to the base address of the entire wave. This isn't useful information for most of codegen. Fixes unnecessarily aligning stack objects in callees. llvm-svn: 300481	2017-04-17 19:48:24 +00:00
Konstantin Zhuravlyov	12096848fd	AMDGPU: Set CodePointerSize to 8 for amdgcn llvm-svn: 300470	2017-04-17 18:02:09 +00:00
Stanislav Mekhanoshin	eff0bc7839	[AMDGPU] set read_only access qualifier for pointers If a kernel's pointer argument is known to be readonly set access qualifier accordingly. This allows RT not to flush caches before dispatches. Differential Revision: https://reviews.llvm.org/D32091 llvm-svn: 300362	2017-04-14 19:11:40 +00:00
Dmitry Preobrazhensky	e6ef099dcd	[AMDGPU][MC] Corrected ds_write_src2_* to require one offset instead of two. Fixed bug 32551: https://bugs.llvm.org//show_bug.cgi?id=32551 Reviewers: vpykhtin Differential Revision: https://reviews.llvm.org/D31809 llvm-svn: 300319	2017-04-14 12:28:07 +00:00
Dmitry Preobrazhensky	5714860ee4	[AMDGPU][MC] Enabled constants for src operands of s_cbranch_g_fork Fixed bug 32619: https://bugs.llvm.org//show_bug.cgi?id=32619 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D31973 llvm-svn: 300318	2017-04-14 11:52:26 +00:00
Stanislav Mekhanoshin	86b0a5465b	[AMDGPU] added SIInstrInfo::getAddNoCarry() helper Addressed rest of post submit comments from D31993. Differential Revision: https://reviews.llvm.org/D32057 llvm-svn: 300288	2017-04-14 00:33:44 +00:00
Konstantin Zhuravlyov	d24aeb20fc	AMDGPU/GFX9: Do not use v_pack_b32_f16 when packing Differential Revision: https://reviews.llvm.org/D31819 llvm-svn: 300275	2017-04-13 23:17:00 +00:00
Reid Kleckner	f021fab2af	[IR] Make getParamAttributes take argument numbers, not ArgNo+1 Add hasParamAttribute() and use it instead of hasAttribute(ArgNo+1, Kind) everywhere. The fact that the AttributeList index for an argument is ArgNo+1 should be a hidden implementation detail. NFC llvm-svn: 300272	2017-04-13 23:12:13 +00:00
Reid Kleckner	dbc9ba3061	Fix -Wunused-value warning llvm-svn: 300254	2017-04-13 20:32:58 +00:00
Stanislav Mekhanoshin	d026f79bd3	[AMDGPU] Combine DS operations with offsets bigger than byte In many cases ds operations can be combined even if offsets do not fit into 8 bit encoding. What it takes is to adjust base address. Differential Revision: https://reviews.llvm.org/D31993 llvm-svn: 300227	2017-04-13 17:53:07 +00:00
Wei Ding	74da350b85	AMDGPU : Fix common dominator of two incoming blocks terminates with uniform branch issue. Differential Revision: http://reviews.llvm.org/D31350 llvm-svn: 300142	2017-04-12 23:51:47 +00:00
Matt Arsenault	0d0d6c2f25	AMDGPU: Fix invalid copies when copying i1 to phys reg Insert a VReg_1 virtual register so the i1 workaround pass can handle it. llvm-svn: 300113	2017-04-12 21:58:23 +00:00
Stanislav Mekhanoshin	c90347d760	[AMDGPU] Generate range metadata for workitem id If workgroup size is known inform llvm about range returned by local id and local size queries. Differential Revision: https://reviews.llvm.org/D31804 llvm-svn: 300102	2017-04-12 20:48:56 +00:00
Dmitry Preobrazhensky	14104e0d0f	[AMDGPU][MC] Added support for several VI-specific opcodes (s_wakeup, etc) Added support for VI: - s_endpgm_saved - s_wakeup - s_rfe_restore_b64 - v_perm_b32 Enabled for VI: - v_mov_fed_b32 - v_mov_fed_b32_e64 See bug 32593: https://bugs.llvm.org//show_bug.cgi?id=32593 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D31931 llvm-svn: 300076	2017-04-12 17:10:07 +00:00
Dmitry Preobrazhensky	5ac9fd64a3	[AMDGPU][MC] Corrected parsing of v_cmp_class* and v_cmpx_class* Fixed bug 32565: https://bugs.llvm.org//show_bug.cgi?id=32565 Reviewers: vpykhtin Differential Revision: https://reviews.llvm.org/D31820 llvm-svn: 300073	2017-04-12 16:31:18 +00:00
Dmitry Preobrazhensky	3bff0c8c59	[AMDGPU][MC] Corrected encoding of V_MQSAD_U32_U8 for CI See bug 32552: https://bugs.llvm.org//show_bug.cgi?id=32552 Reviewers: vpykhtin Differential Revision: https://reviews.llvm.org/D31810 llvm-svn: 300070	2017-04-12 15:36:09 +00:00
Dmitry Preobrazhensky	7184c44d66	[AMDGPU][MC] Corrected ds_wrxchg2* to support two offsets Fixed bug 28227: https://bugs.llvm.org//show_bug.cgi?id=28227 Reviewers: vpykhtin Differential Revision: https://reviews.llvm.org/D31808 llvm-svn: 300066	2017-04-12 14:29:45 +00:00
Dmitry Preobrazhensky	12194e9bec	[AMDGPU][MC] Corrected src0 size for s_cbranch_join Fix for bug 28159: https://bugs.llvm.org//show_bug.cgi?id=28159 Reviewers: vpykhtin, arsenm Differential Revision: https://reviews.llvm.org/D31595 llvm-svn: 300055	2017-04-12 12:40:19 +00:00
Sam Kolton	aff8341da2	[AMDGPU] SDWA: make pass global Summary: Remove checks for basic blocks. Reviewers: vpykhtin, rampitec, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31935 llvm-svn: 300040	2017-04-12 09:36:05 +00:00
Kannan Narayanan	acb089e12a	[AMDGPU] Add a new pass to insert waitcnts. Leave under an option for testing. Based on comments in https://reviews.llvm.org/D31161. llvm-svn: 300023	2017-04-12 03:25:12 +00:00
Matt Arsenault	9ac40026dd	AMDGPU: Insert wait at start of callee functions llvm-svn: 300000	2017-04-11 22:29:31 +00:00
Matt Arsenault	efa9f4b210	AMDGPU: Refactor SIMachineFunctionInfo slightly Prepare for handling non-entry functions. llvm-svn: 299999	2017-04-11 22:29:28 +00:00
Matt Arsenault	e622dc3803	AMDGPU: Refactor argument lowering Split into smaller functions and prepare for handling non-entry functions. llvm-svn: 299998	2017-04-11 22:29:24 +00:00
Matt Arsenault	fe78ffba92	AMDGPU: Fix folding reg_sequence into copy to phys reg This was producing an illegal reg_sequence defining a physical register with virtual register inputs. llvm-svn: 299997	2017-04-11 22:29:19 +00:00
Matt Arsenault	978b1667d2	AMDGPU: Prune unnecessary include llvm-svn: 299996	2017-04-11 22:29:16 +00:00
Yaxun Liu	e95df719e1	[AMDGPU] Add A5 to data layout for amdgiz environment Differential Revision: https://reviews.llvm.org/D31589 llvm-svn: 299964	2017-04-11 17:18:13 +00:00
Vassil Vassilev	e1f12fadc0	Remove unused functions. Remove static qualifier from functions in header files. NFC. llvm-svn: 299947	2017-04-11 14:55:32 +00:00
Matt Arsenault	678e111e11	AMDGPU: Fix crash when disassembling VOP3 mac The unused dummy src2_modifiers is missing, so it crashes when trying to print it. I tried to fully remove src2_modifiers, but there are some irritations in the places where it is converted to mad since it starts to require modifying use lists while iterating over them. llvm-svn: 299861	2017-04-10 17:58:06 +00:00
Matt Arsenault	dd8fd9dcfd	AMDGPU: Actually write nops for writeNopData Before this was just writing 0s, which ends up looking like a v_cndmask_b32 v0, s0, v0, vcc. Write out an encoded s_nop instead. llvm-svn: 299816	2017-04-08 21:28:38 +00:00
Stanislav Mekhanoshin	478b81982f	[AMDGPU] Unroll more to eliminate phis and conditions Increase threshold to unroll a loop which contains an "if" statement whose condition defined by a PHI belonging to the loop. This may help to eliminate if region and potentially even PHI itself, saving on both divergence and registers used for the PHI. Add a small bonus for each of such "if" statements. Differential Revision: https://reviews.llvm.org/D31693 llvm-svn: 299779	2017-04-07 16:26:28 +00:00
Dmitry Preobrazhensky	e5147247b8	[AMDGPU][MC] Fix for Bug 28211 + LIT tests - corrected DS_GWS_* opcodes (see VI_Shader_Programming#16.pdf for detailed description) - address operand is not used - several opcodes have data operand - all opcodes have offset modifier - DS_AND_SRC2_B32: corrected typo in mnemo - DS_WRAP_RTN_F32 replaced with DS_WRAP_RTN_B32 - added CI/VI opcodes: - DS_CONDXCHG32_RTN_B64 - DS_GWS_SEMA_RELEASE_ALL - added VI opcodes: - DS_CONSUME - DS_APPEND - DS_ORDERED_COUNT Differential Revision: https://reviews.llvm.org/D31707 llvm-svn: 299767	2017-04-07 13:07:13 +00:00
Sam Kolton	6e79529db4	[AMDGPU] Move SiShrinkInstruction and SDWAPeephole to SSAOptimization passes Summary: The difference between PreRegAlloc() and MachineSSAOptimization() is that the former runs even at the -O0 optimization level. In my understanding, SiShrinkInstructions and SDWAPeephole shouldn't run when optimizations are disabled. With this change the order of passes will not change. Reviewers: arsenm, vpykhtin, rampitec Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31705 llvm-svn: 299757	2017-04-07 10:53:12 +00:00
Konstantin Zhuravlyov	4b3847e865	AMDGPU/GFX9: Fix shared and private aperture queries Differential Revision: https://reviews.llvm.org/D31786 llvm-svn: 299727	2017-04-06 23:02:33 +00:00
Matt Arsenault	21a438255d	AMDGPU: Diagnose illegal SGPR to VGPR copies This is possible in ways that are not compiler bugs, so stop asserting on them. This emits an extra error when emitting objects when it can't encode the new pseudo, but I'm not sure that matters. llvm-svn: 299712	2017-04-06 21:09:53 +00:00
Matt Arsenault	5cf4271883	AMDGPU: Replace fp16SrcZerosHighBits with a whitelist FCOPYSIGN is lowered to bit operations which don't clear the high bits. llvm-svn: 299708	2017-04-06 20:58:30 +00:00
Yaxun Liu	76ae47cb35	[AMDGPU] Temporarily change constant address space from 4 to 2 Our final address space mapping is to let constant address space to be 4 to match nvptx. However for now we will make it 2 to avoid unnecessary work in FE/BE/devlib about intrinsics returning constant pointers. Differential Revision: https://reviews.llvm.org/D31770 llvm-svn: 299690	2017-04-06 19:17:32 +00:00
Matt Arsenault	dd10884e9d	AMDGPU: Stop using CCAssignToRegWithShadow This does not do what it is attempting to use it for and requires working around in LowerFormalArguments. llvm-svn: 299667	2017-04-06 17:37:27 +00:00
Stanislav Mekhanoshin	ea57c38521	[AMDGPU] Eliminate barrier if workgroup size is not greater than wavefront size If a workgroup size is known to be not greater than the wavefront size, the s_barrier instruction is not needed since all threads are guaranteed to come to the same point at the same time. Differential Revision: https://reviews.llvm.org/D31731 llvm-svn: 299659	2017-04-06 16:48:30 +00:00
Sam Kolton	9fa169601f	[AMDGPU] Resubmit SDWA peephole: enable by default Reviewers: vpykhtin, rampitec, arsenm Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31671 llvm-svn: 299654	2017-04-06 15:03:28 +00:00
Ivan Krasin	d4f70c70b9	Revert r299536. [AMDGPU] SDWA peephole: enable by default. Reason: breaks multiple bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/3988 http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1173 Original Review URL: https://reviews.llvm.org/D31671 llvm-svn: 299583	2017-04-05 19:58:12 +00:00
Dmitry Preobrazhensky	3ac6311a8d	[AMDGPU][MC] Fix for Bug 28158 + LIT tests Added support of the following instructions: - s_cbranch_cdbgsys - s_cbranch_cdbgsys_and_user - s_cbranch_cdbgsys_or_user - s_cbranch_cdbguser - s_setkill Reviewers: vpykhtin Differential Revision: https://reviews.llvm.org/D31469 llvm-svn: 299567	2017-04-05 17:26:45 +00:00
Dmitry Preobrazhensky	45db65037f	[AMDGPU][MC] Fix for Bug 28167 + LIT tests Corrected src0 for v_writelane_b32: - Enabled inline constants and literals for SI/CI (VOP2) - Enabled inline constants for VI (VOP3) Reviewers: vpykhtin, arsenm https://reviews.llvm.org/D31463 llvm-svn: 299555	2017-04-05 16:08:21 +00:00
Sam Kolton	34e29784fb	[AMDGPU] SDWA peephole: enable by default Reviewers: vpykhtin, rampitec, arsenm Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31671 llvm-svn: 299536	2017-04-05 12:00:45 +00:00
Alex Bradbury	866113c2ea	Add MCContext argument to MCAsmBackend::applyFixup for error reporting A number of backends (AArch64, MIPS, ARM) have been using MCContext::reportError to report issues such as out-of-range fixup values in their TgtAsmBackend. This is great, but because MCContext couldn't easily be threaded through to the adjustFixupValue helper function from its usual callsite (applyFixup), these backends ended up adding an MCContext* argument and adding another call to applyFixup to processFixupValue. Adding an MCContext parameter to applyFixup makes this unnecessary, and even better - applyFixup can take a reference to MCContext rather than a potentially null pointer. Differential Revision: https://reviews.llvm.org/D30264 llvm-svn: 299529	2017-04-05 10:16:14 +00:00
Matt Arsenault	3e90f84806	AMDGPU: Remove legacy export intrinsic llvm-svn: 299444	2017-04-04 16:34:39 +00:00
Matt Arsenault	236da200f1	AMDGPU: Remove legacy image intrinsics llvm-svn: 299443	2017-04-04 16:34:35 +00:00
Matt Arsenault	b600e138cc	AMDGPU: Remove llvm.SI.vs.load.input llvm-svn: 299391	2017-04-03 21:45:13 +00:00
Matt Arsenault	754dd3eaef	AMDGPU: Remove legacy bfe intrinsics llvm-svn: 299372	2017-04-03 18:08:08 +00:00
Davide Italiano	c88169e61b	[AMDGPU] Garbage collect now unused dead code. NFCI. llvm-svn: 299310	2017-04-01 19:30:17 +00:00
Stanislav Mekhanoshin	12aa5b733e	[AMDGPU] Remove assumption that vector and scalar types do not alias Differential Revision: https://reviews.llvm.org/D31547 llvm-svn: 299250	2017-03-31 20:16:54 +00:00
Matt Arsenault	8edfaee7be	AMDGPU: Remove unnecessary ands when f16 is legal Add a new node to act as a fancy bitcast from f16 operations to i32 that implicitly zero the high 16-bits of the result. Alternatively could try making v2f16 legal and canonicalizing on build_vectors. llvm-svn: 299246	2017-03-31 19:53:03 +00:00
Jan Vesely	3c99441ef4	AMDGPU/R600: Fix amdgpu alias analysis pass. R600 uses higher AS number to access kernel parameters Fixes: r298846 Differential Revision: https://reviews.llvm.org/D31520 llvm-svn: 299245	2017-03-31 19:26:23 +00:00
Simon Pilgrim	3c81c34d8d	[DAGCombiner] Add vector demanded elements support to ComputeNumSignBits Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements. This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1. I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course. Followup to D25691. Differential Revision: https://reviews.llvm.org/D31311 llvm-svn: 299219	2017-03-31 13:54:09 +00:00
Sam Kolton	27e0f8bc72	[AMDGPU] SDWA Peephole: improve search for immediates in SDWA patterns Previously compiler often extracted common immediates into specific register, e.g.: ``` %vreg0 = S_MOV_B32 0xff; %vreg2 = V_AND_B32_e32 %vreg0, %vreg1 %vreg4 = V_AND_B32_e32 %vreg0, %vreg3 ``` Because of this SDWA peephole failed to find SDWA convertible pattern. E.g. in previous example this could be converted into 2 SDWA src operands: ``` SDWA src: %vreg2 src_sel:BYTE_0 SDWA src: %vreg4 src_sel:BYTE_0 ``` With this change peephole check if operand is either immediate or register that is copy of immediate. llvm-svn: 299202	2017-03-31 11:42:43 +00:00
Simon Pilgrim	37b536e4b3	[DAGCombiner] Add vector demanded elements support to computeKnownBitsForTargetNode Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes. Differential Revision: https://reviews.llvm.org/D31249 llvm-svn: 299201	2017-03-31 11:24:16 +00:00
Matt Arsenault	1074cb5420	AMDGPU: Rename isKernel What we really want to do is distinguish functions that may be called by other functions, and graphics shaders are not called kernels. llvm-svn: 299140	2017-03-30 23:58:04 +00:00
Matt Arsenault	79f837c254	AMDGPU: Add all atomicrmw fields to atomic.inc/dec Add scope, order, isVolatile llvm-svn: 299122	2017-03-30 22:21:40 +00:00
Stanislav Mekhanoshin	89653dfd2a	[AMDGPU] Add GlobalOpt parameter to Always Inliner pass If set to false it does not remove global aliases. With this parameter set to false it should be safe to run the pass before link. Differential Revision: https://reviews.llvm.org/D31489 llvm-svn: 299108	2017-03-30 20:16:02 +00:00
Simon Pilgrim	b670ba4e87	[AMDGPU] Tidy up computeKnownBitsForTargetNode/ComputeNumSignBitsForTargetNode arguments. NFCI. Based on comment in D31249. llvm-svn: 298991	2017-03-29 12:09:25 +00:00
Stanislav Mekhanoshin	baf31ac7c8	[AMDGPU] Boost unroll threshold for loops reading local memory This is less important than increase threshold for private memory, but still brings performance improvements in a wide range of tests. Unrolling more for local memory serves three purposes: it allows to combine ds operations if offset becomes static, saves registers used for offsets in case of static offsets, and allows better lds latency hiding. Differential Revision: https://reviews.llvm.org/D31412 llvm-svn: 298948	2017-03-28 22:13:51 +00:00
Stanislav Mekhanoshin	b933c3f554	[AMDGPU] Fix recorded region boundaries in max-occupancy scheduler It is incorrect to record region boundaries before scheduling, because they may change after scheduling. As a result the second pass may see fewer instructions to schedule than it should. Differential Revision: https://reviews.llvm.org/D31434 llvm-svn: 298945	2017-03-28 21:48:54 +00:00
Stanislav Mekhanoshin	9053f22eeb	[AMDGPU] Split -amdgpu-early-inline-all option Previously it was covered by the internalization. It turns out we cannot run the internalizer in FE, as it breaks separate compilation tests. Thus the early inliner gets its own option. Differential Revision: https://reviews.llvm.org/D31429 llvm-svn: 298935	2017-03-28 18:23:24 +00:00
Valery Pykhtin	9f3eca96eb	[AMDGPU] Update SI scheduler colorHighLatenciesGroups Depends on rL298896: MachineScheduler/ScheduleDAG: Add support for GetSubGraph Patch by Axel Davy (axel.davy@normalesup.org) Differential revision: https://reviews.llvm.org/D30152 llvm-svn: 298902	2017-03-28 07:19:48 +00:00
Valery Pykhtin	fb9905545c	[AMDGPU] SISched: Detect dependency types between blocks Patch by Axel Davy (axel.davy@normalesup.org) Differential revision: https://reviews.llvm.org/D30153 llvm-svn: 298872	2017-03-27 18:22:39 +00:00
Valery Pykhtin	ba3a4def29	[AMDGPU] SISched: Update colorEndsAccordingToDependencies Patch by Axel Davy (axel.davy@normalesup.org) Differential revision: https://reviews.llvm.org/D30150 llvm-svn: 298861	2017-03-27 17:26:40 +00:00
Valery Pykhtin	f70f683670	[AMDGPU] Fix SI scheduler LiveOut Refcount issue Patch by Axel Davy (axel.davy@normalesup.org) Differential revision: https://reviews.llvm.org/D30145 llvm-svn: 298857	2017-03-27 17:06:36 +00:00
Dmitry Preobrazhensky	c512d44845	[AMDGPU][MC] Fix for Bug 28207 + LIT tests Enabled clamp and omod for v_cvt_* opcodes which have src0 of an integer type Reviewers: vpykhtin, arsenm Differential Revision: https://reviews.llvm.org/D31327 llvm-svn: 298852	2017-03-27 15:57:17 +00:00
Yaxun Liu	1a14bfa022	[AMDGPU] Get address space mapping by target triple environment As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple. The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which do not depend on the target triple, use static const members, so that they don't occupy extra memory space and are equivalent to compile time constants. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as a member to a pass and created at the beginning of the run* function. Differential Revision: https://reviews.llvm.org/D31284 llvm-svn: 298846	2017-03-27 14:04:01 +00:00
Yaxun Liu	14834c3e3d	[AMDGPU] Switch data layout by triple environment amdgiz Switch data layout by target triple environment amdgiz and amdgizcl indicating using of an address space mapping in which generic address space is 0. amdgiz is for non-OpenCL environment where generic address space is 0. amdgizcl is for OpenCL environment where generic address space is 0. Differential Revision: https://reviews.llvm.org/D31211 llvm-svn: 298758	2017-03-25 02:05:44 +00:00
Matt Arsenault	0607a4427b	AMDGPU: Fix annotating loops with nested loop conditions If the branch condition for a loop was a phi which itself was fed from a phi from a loop, it isn't safe to try to delete the phi until after the loop is handled. llvm-svn: 298737	2017-03-24 20:57:10 +00:00
Matt Arsenault	b5d23271e2	AMDGPU: Implement f16 fround llvm-svn: 298730	2017-03-24 20:04:18 +00:00
Matt Arsenault	b8f8dbc227	AMDGPU: Unify divergent function exits. StructurizeCFG can't handle cases with multiple returns creating regions with multiple exits. Create a copy of UnifyFunctionExitNodes that only unifies exit nodes that skips exit nodes with uniform branch sources. llvm-svn: 298729	2017-03-24 19:52:05 +00:00
Stanislav Mekhanoshin	70603dcef2	[AMDGPU] Fold V_CNDMASK with identical source operands Such instructions sometimes appear after lowering and folding. Differential Revision: https://reviews.llvm.org/D31318 llvm-svn: 298723	2017-03-24 18:55:20 +00:00
Konstantin Zhuravlyov	4986d9fb45	[AMDGPU] Rename Kind to ValueKind in metadata to be consistent llvm-svn: 298722	2017-03-24 18:43:15 +00:00
Stanislav Mekhanoshin	a27b2cac03	[AMDGPU] Add AMDGPUAliasAnalysis to opt pipeline Previously it was added only to the BE. Differential Revision: https://reviews.llvm.org/D31323 llvm-svn: 298721	2017-03-24 18:01:14 +00:00
Benjamin Kramer	80e3d5bb24	[AMDGPU] Don't enforce constexpr, there are still old standard libraries around that don't have a constexpr std::pair. llvm-svn: 298719	2017-03-24 17:53:06 +00:00
Valery Pykhtin	e2419dc907	[AMDGPU] Remove double map lookups in SI scheduler Patch by Axel Davy (axel.davy@normalesup.org) Differential revision: https://reviews.llvm.org/D30382 llvm-svn: 298718	2017-03-24 17:49:05 +00:00
Valery Pykhtin	f7d1023a73	[AMDGPU] Fix SGPR usage count in SI scheduler Patch by Axel Davy (axel.davy@normalesup.org) Differential revision: https://reviews.llvm.org/D30149 llvm-svn: 298710	2017-03-24 16:45:50 +00:00
Valery Pykhtin	57ab699933	[AMDGPU] Add a new line after a debug message Patch by Axel Davy (axel.davy@normalesup.org) Differential revision: https://reviews.llvm.org/D30146 llvm-svn: 298708	2017-03-24 16:37:48 +00:00
Benjamin Kramer	c06d672a7a	Don't build up std::vectors with constant sizes when an array suffices. NFC. llvm-svn: 298701	2017-03-24 14:11:47 +00:00
Konstantin Zhuravlyov	4cbb68959b	[AMDGPU] Do not emit isa info as code object metadata - It was decided to expose this information through other means (rocr) Differential Revision: https://reviews.llvm.org/D30970 llvm-svn: 298560	2017-03-22 23:27:09 +00:00
Konstantin Zhuravlyov	a780ffaac2	[AMDGPU] Emit kernel debug properties as code object metadata Differential Revision: https://reviews.llvm.org/D30969 llvm-svn: 298558	2017-03-22 23:10:46 +00:00
Konstantin Zhuravlyov	ca0e7f6472	[AMDGPU] Emit kernel code properties as code object metadata - These are not required for low level runtime Differential Revision: https://reviews.llvm.org/D29949 llvm-svn: 298556	2017-03-22 22:54:39 +00:00
Konstantin Zhuravlyov	7498cd61fb	[AMDGPU] Restructure code object metadata creation - Rename runtime metadata -> code object metadata - Make metadata not flow - Switch enums to use ScalarEnumerationTraits - Cleanup and move AMDGPUCodeObjectMetadata.h to AMDGPU/MCTargetDesc - Introduce in-memory representation for attributes - Code object metadata streamer - Create metadata for isa and printf during EmitStartOfAsmFile - Create metadata for kernel during EmitFunctionBodyStart - Finalize and emit metadata to .note during EmitEndOfAsmFile - Other minor improvements/bug fixes Differential Revision: https://reviews.llvm.org/D29948 llvm-svn: 298552	2017-03-22 22:32:22 +00:00
Konstantin Zhuravlyov	eb685e5f27	[AMDGPU] Fix bug 31610 Differential Revision: https://reviews.llvm.org/D31258 llvm-svn: 298551	2017-03-22 21:48:18 +00:00
Dmitry Preobrazhensky	895d377dc7	[AMDGPU][MC] Fix for Bug 28204 + LIT tests Fixed v_mad_i64_i32/u64_u32 encoding Reviewers: artem.tamazov Differential Revision: https://reviews.llvm.org/D30828 llvm-svn: 298502	2017-03-22 13:31:01 +00:00
Matt Arsenault	513cb7a87d	AMDGPU: Remove hasSideEffects from SI_RETURN_TO_EPILOG llvm-svn: 298454	2017-03-21 22:28:48 +00:00
Matt Arsenault	5b20fbb748	AMDGPU: Rename SI_RETURN This is used for a specific type of return to a shader part's epilog code. Rename to try avoiding confusion from a true call's return. llvm-svn: 298452	2017-03-21 22:18:10 +00:00
George Burgess IV	56c7e88c2c	Let llvm.objectsize be conservative with null pointers This adds a parameter to @llvm.objectsize that makes it return conservative values if it's given null. This fixes PR23277. Differential Revision: https://reviews.llvm.org/D28494 llvm-svn: 298430	2017-03-21 20:08:59 +00:00
Marek Olsak	5c7a61d221	AMDGPU: Buffer descriptor changes for GFX9 Reviewers: arsenm Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr Differential Revision: https://reviews.llvm.org/D31158 llvm-svn: 298397	2017-03-21 17:00:39 +00:00
Marek Olsak	e22fdb9cac	AMDGPU: Always use VGPR indexing on GFX9 Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr Differential Revision: https://reviews.llvm.org/D31157 llvm-svn: 298396	2017-03-21 17:00:32 +00:00
Reid Kleckner	b518054b87	Rename AttributeSet to AttributeList Summary: This class is a list of AttributeSetNodes corresponding the function prototype of a call or function declaration. This class used to be called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is typically accessed by parameter and return value index, so "AttributeList" seems like a more intuitive name. Rename AttributeSetImpl to AttributeListImpl to follow suit. It's useful to rename this class so that we can rename AttributeSetNode to AttributeSet later. AttributeSet is the set of attributes that apply to a single function, argument, or return value. Reviewers: sanjoy, javed.absar, chandlerc, pete Reviewed By: pete Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits Differential Revision: https://reviews.llvm.org/D31102 llvm-svn: 298393	2017-03-21 16:57:19 +00:00
Matt Arsenault	5af82a7ae1	AMDGPU: Fix not including v2i16/v2f16 in register class llvm-svn: 298390	2017-03-21 16:42:50 +00:00
Matt Arsenault	f8fb605a68	AMDGPU: Fix asserting on 0 dmask for image intrinsics Fold these to undef during lowering so users get eliminated. llvm-svn: 298387	2017-03-21 16:32:17 +00:00
Valery Pykhtin	fd4c410f4d	[AMDGPU] Iterative scheduling infrastructure + minimal registry scheduler Differential revision: https://reviews.llvm.org/D31046 llvm-svn: 298368	2017-03-21 13:15:46 +00:00
Sam Kolton	f60ad58dad	[AMDGPU] SDWA peephole optimization pass. Summary: First iteration of SDWA peephole. This pass tries to combine several instructions into one SDWA instruction. E.g. it converts: ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1 V_ADD_I32_e32 %vreg2, %vreg0, %vreg3 V_LSHLREV_B32_e32 %vreg4, 16, %vreg2 ''' Into: ''' V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD ''' Pass structure: 1. Iterate over machine instructions in a basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match a machine instruction into either a source or a destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to source SDWA operand '''%vreg1 src_sel:WORD_1'''. 2. Iterate over the found SDWA operands and find instructions that could potentially be converted into SDWA. E.g. for a source SDWA operand the potential instructions are all instructions in this basic block that use '''%vreg0''' 3. Iterate over all potential instructions and check if they can be converted into SDWA. 4. Convert instructions to SDWA. This review contains a basic implementation of the SDWA peephole pass. This pass requires additional testing for both correctness and performance (no performance testing done). There are several ways this pass can be improved: 1. Make this pass work on the whole function, not only a basic block. As far as I can see this can be done right now without changes to the pass. 2. Introduce more SDWA patterns 3. Introduce mnemonics to limit when SDWA patterns should apply Reviewers: vpykhtin, alex-t, arsenm, rampitec Subscribers: wdng, nhaehnle, mgorny Differential Revision: https://reviews.llvm.org/D30038 llvm-svn: 298365	2017-03-21 12:51:34 +00:00
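The example conversion in the commit above can be checked arithmetically. The following is a simplified model, not the pass itself: the function names are hypothetical, registers are modeled as 32-bit unsigned integers, and `dst_unused:UNUSED_PAD` is assumed to zero the unselected half of the destination. Under those assumptions the three-instruction sequence and the single SDWA add produce the same value.

```python
MASK32 = 0xFFFFFFFF  # registers modeled as 32-bit unsigned values


def three_instruction_sequence(vreg1, vreg3):
    """Model of the original shift / add / shift sequence."""
    vreg0 = (vreg1 >> 16) & MASK32     # V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
    vreg2 = (vreg0 + vreg3) & MASK32   # V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
    return (vreg2 << 16) & MASK32      # V_LSHLREV_B32_e32 %vreg4, 16, %vreg2


def sdwa_add(vreg1, vreg3):
    """Model of V_ADD_I32_sdwa with the selectors from the commit message."""
    src0 = (vreg1 >> 16) & 0xFFFF      # src0_sel:WORD_1 reads the upper 16 bits
    src1 = vreg3 & MASK32              # src1_sel:DWORD reads the full register
    result = (src0 + src1) & MASK32
    # dst_sel:WORD_1 places the low 16 result bits in the upper half;
    # UNUSED_PAD is assumed here to fill the lower half with zeros.
    return (result & 0xFFFF) << 16


if __name__ == "__main__":
    for a, b in [(0x12345678, 0x0000FFFF), (0xFFFF0000, 1), (0, 0)]:
        print(hex(three_instruction_sequence(a, b)), hex(sdwa_add(a, b)))
```

The equivalence holds because shifting the sum left by 16 discards everything above its low 16 bits, which is exactly what the destination selector keeps.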
Konstantin Zhuravlyov	2534bc07f4	[AMDGPU] Run always inliner early in opt Differential Revision: https://reviews.llvm.org/D31141 llvm-svn: 298281	2017-03-20 18:06:45 +00:00
Dmitry Preobrazhensky	1e124e1825	[AMDGPU][MC] Fix for Bugs 28201, 28199, 28170 + LIT tests This fix enables sp3 abs modifier with constants Reviewers: artem.tamazov Differential Revision: https://reviews.llvm.org/D30825 llvm-svn: 298265	2017-03-20 16:33:20 +00:00
Dmitry Preobrazhensky	40af9c35d3	[AMDGPU][MC] Fix for Bugs 28200, 28202 + LIT tests Fixed several related issues with VOP3 fp modifiers. Reviewers: artem.tamazov Differential Revision: https://reviews.llvm.org/D30821 llvm-svn: 298255	2017-03-20 14:50:35 +00:00
Konstantin Zhuravlyov	8a67eb144f	Revert "[AMDGPU] Run always inliner early in opt" This reverts commit r297958, it breaks device-libs build. llvm-svn: 298239	2017-03-20 09:26:08 +00:00
Simon Pilgrim	5fa1b9a12f	Fix MSVC warning: "switch statement contains 'default' but no 'case' labels". NFCI. llvm-svn: 298225	2017-03-19 16:39:04 +00:00
Stanislav Mekhanoshin	8e45acfc38	[AMDGPU] Add address space based alias analysis pass This is direct port of HSAILAliasAnalysis pass, just cleaned for style and renamed. Differential Revision: https://reviews.llvm.org/D31103 llvm-svn: 298172	2017-03-17 23:56:58 +00:00
Matt Arsenault	59ece95f6c	AMDGPU: Fix broken condition in hazard recognizer Fixes bug 32248. llvm-svn: 298125	2017-03-17 21:36:28 +00:00
Matt Arsenault	e70d5dcf3e	AMDGPU: Fix handling of constant phi input loop conditions If the loop condition was an i1 phi with a constantexpr input, this would add a loop intrinsic fed by a phi dependent on a call to if.break in the same block. Insert the call in the loop header. llvm-svn: 298121	2017-03-17 20:52:21 +00:00
Matt Arsenault	c5b641ac02	AMDGPU: Cleanup control flow intrinsics Move backend internal intrinsics along with the rest of the normal intrinsics, and use the Intrinsic::getDeclaration API instead of manually constructing the type list. It's surprising this was working before. fdiv.fast had the wrong number of parameters. The control flow intrinsic declaration attributes were not being applied, and their types were inconsistent. The actual IR use types did not match the declaration, and were closer to the types used for the patterns. The brcond lowering was changing the types, so introduce new nodes for those. llvm-svn: 298119	2017-03-17 20:41:45 +00:00
Stanislav Mekhanoshin	ee2dd785f6	Only unswitch loops with uniform conditions Loop unswitching can be extremely harmful for a SIMT target. If the hoisted condition is not uniform, a SIMT machine will execute both clones of a loop sequentially. Therefore LoopUnswitch checks if the condition is non-divergent. Since DivergenceAnalysis adds an expensive PostDominatorTree analysis not needed for non-SIMT targets a new option is added to avoid unneeded analysis initialization. The method getAnalysisUsage is called when TargetTransformInfo is not yet available and we cannot use it here. For that reason a new field DivergentTarget is added to PassManagerBuilder to control the behavior and set this field from a target. Differential Revision: https://reviews.llvm.org/D30796 llvm-svn: 298104	2017-03-17 17:13:41 +00:00
Stanislav Mekhanoshin	f80507979d	[AMDGPU] Run always inliner early in opt We can mark functions to always inline early in the opt. Since we do not have call support this early inlining creates opportunities for inter-procedural optimizations which would not occur otherwise. Differential Revision: https://reviews.llvm.org/D31016 llvm-svn: 297958	2017-03-16 16:11:46 +00:00
Matt Arsenault	7dc01c96ae	AMDGPU: Allow sinking of addressing modes for atomic_inc/dec llvm-svn: 297913	2017-03-15 23:15:12 +00:00
Matt Arsenault	86e02ce2dc	AMDGPU: Fix unnecessary ands when packing f16 vectors computeKnownBits didn't handle fp_to_fp16 to report the high bits as 0. ARM maps the generic node to an instruction that does not modify the high bits of the register, so introduce a target node where the high bits are known 0. llvm-svn: 297873	2017-03-15 19:04:26 +00:00
Matt Arsenault	0e6e018054	AMDGPU: Minor SIAnnotateControlFlow cleanups Newline fixes, early return, range loops. llvm-svn: 297865	2017-03-15 18:00:12 +00:00
Sanjay Patel	fa929a2134	Cyle -> Cycle; NFCI llvm-svn: 297846	2017-03-15 15:37:42 +00:00
Simon Pilgrim	6778b8f715	Reverted unintended commit llvm-svn: 297841	2017-03-15 14:47:30 +00:00
Simon Pilgrim	3804a12fc3	Fix Wint-in-bool-context warning (PR32248) llvm-svn: 297840	2017-03-15 14:38:19 +00:00
Matt Arsenault	747bf8afa8	AMDGPU: Re-use TM.getNullPointerValue llvm-svn: 297662	2017-03-13 20:18:14 +00:00
Matt Arsenault	971c85ebb4	AMDGPU: Treat 0 as private null pointer in addrspacecast lowering llvm-svn: 297658	2017-03-13 19:47:31 +00:00
Matt Arsenault	dd905b0e9b	AMDGPU: Remove packf16 intrinsic llvm-svn: 297557	2017-03-11 05:51:16 +00:00
Matt Arsenault	3cb9ff8863	AMDGPU: Keep track of modifiers when converting v_mac to v_mad Since v_max_f32_e64/v_max_f16_e64 can be folded if the target instruction supports the clamp bit, we also need to maintain modifiers when converting v_mac to v_mad. This fixes a rendering issue with Dirt Rally because a v_mac instruction with the clamp bit set was converted to a v_mad but that bit was lost during the conversion. Fixes: e184e01dd79 ("AMDGPU: Fold FP clamp as modifier bit") Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com> llvm-svn: 297556	2017-03-11 05:40:40 +00:00
Stanislav Mekhanoshin	79da2a7698	[AMDGPU] Remove getBidirectionalReasonRank This method inverts the Reason field of a scheduling candidate. It does right comparison between RegCritical and RegExcess, but everything else is broken. In fact it can prefer less strong reason such as Weak over RegCritical because Weak > -RegCritical. The CandReason enum is properly sorted, so just remove artificial ranking. Differential Revision: https://reviews.llvm.org/D30557 llvm-svn: 297536	2017-03-11 00:29:27 +00:00
Konstantin Zhuravlyov	ffdb00eda9	[AMDGPU] Split R600/SI getFrameIndexReference and emit stack object offsets for SI Differential Revision: https://reviews.llvm.org/D29674 llvm-svn: 297499	2017-03-10 19:39:07 +00:00
Yaxun Liu	874d26a89d	Rename PT_NOTE namespace name used in AMDGPUPTNote.h Patch by Guansong Zhang. Differential Revision: https://reviews.llvm.org/D30750 llvm-svn: 297498	2017-03-10 19:35:43 +00:00
Changpeng Fang	1be9b9f816	AMDGPU/SI: Disable unrolling in the loop vectorizer if the loop is not vectorized. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D30719 llvm-svn: 297328	2017-03-09 00:07:00 +00:00
Matt Arsenault	52d1b62a28	AMDGPU: Don't wait at end of block with a trivial successor If there is only one successor, and that successor only has one predecessor the wait can obviously be delayed until uses or the end of the next block. This avoids code quality regressions when there are trivial fallthrough blocks inserted for structurization. llvm-svn: 297251	2017-03-08 01:06:58 +00:00
Matt Arsenault	d8ed207a20	AMDGPU: Constant fold rcp node When doing arcp optimization with a constant denominator, this was leaving behind rcps with constant inputs. llvm-svn: 297248	2017-03-08 00:48:46 +00:00
Changpeng Fang	6b49fa4ca7	AMDGPU/SI: Do not insert EndCf in an unreachable block Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D22025 llvm-svn: 297243	2017-03-07 23:29:36 +00:00
Daniel Sanders	52b4ce727a	Recommit: [globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it. Summary: This will allow future patches to inspect the details of the LLT. The implementation is now split between the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns. Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem. The problem with the previous commit appears to have been that TableGen was including CodeGen/LowLevelType.h instead of Support/LowLevelTypeImpl.h. Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30046 llvm-svn: 297241	2017-03-07 23:20:35 +00:00
Daniel Sanders	8ebec37d26	Revert r297177: Change LLT constructor string into an LLT-based object ... More module problems. This time it only showed up in the stage 2 compile of clang-x86_64-linux-selfhost-modules-2 but not the stage 1 compile. Somehow, this change causes the build to need Attributes.gen before it's been generated. llvm-svn: 297188	2017-03-07 19:21:23 +00:00
Daniel Sanders	8612326a08	[globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it. Summary: This will allow future patches to inspect the details of the LLT. The implementation is now split between the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns. Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem. Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30046 llvm-svn: 297177	2017-03-07 18:32:25 +00:00
Konstantin Zhuravlyov	e8aaab8abe	Revert "AMDGPU: Set MCAsmInfo::PointerSize" It breaks line tables because the patch is not complete, working on a complete one at the moment This reverts commit r294031. llvm-svn: 297118	2017-03-07 04:44:33 +00:00
Jan Vesely	3ea1704434	AMDGPU/R600: Fix ALU clause markers use detection also exit early on kill instead of redefinition. Differential Revision: https://reviews.llvm.org/D30230 llvm-svn: 297060	2017-03-06 20:10:05 +00:00
Krzysztof Parzyszek	cc31871dc4	Make TargetInstrInfo::isPredicable take a const reference, NFC llvm-svn: 296901	2017-03-03 18:30:54 +00:00
Dmitry Preobrazhensky	03880f8d24	[AMDGPU][MC] Fix for Bug 30829 + LIT tests Added code to check constant bus restrictions for VOP formats (only one SGPR value or literal-constant may be used by the instruction). Note that the same checks are performed by SIInstrInfo::verifyInstruction (used by lowering code). Added LIT tests. llvm-svn: 296873	2017-03-03 14:31:06 +00:00
Matt Arsenault	31a58c6ac0	AMDGPU: Fix missing dominator tree dependency llvm-svn: 296842	2017-03-02 23:50:51 +00:00
Matt Arsenault	8f016df1ed	AMDGPU: Fix types for VOP_I16_I16_I16 llvm-svn: 296523	2017-02-28 21:31:45 +00:00
Matt Arsenault	4d263f6f18	AMDGPU: Add definition for v_swap_b32 This is somewhat tricky because there are two pairs of tied operands, and it isn't allowed to be VOP3 encoded. llvm-svn: 296519	2017-02-28 21:09:04 +00:00
Matt Arsenault	03612631cb	AMDGPU: Add definition for v_xad_u32 llvm-svn: 296515	2017-02-28 20:27:30 +00:00
Matt Arsenault	781249833b	AMDGPU: Add ds_nop to assembler llvm-svn: 296513	2017-02-28 20:15:46 +00:00
Matt Arsenault	dedc544ac7	AMDGPU: Add definitions for ds_{read\|write}_b{96\|128} It's not clear to me if this is always better than doing ds_write2_b64 This adds the constraint of a 128-bit register input instead of a pair of 64-bit. llvm-svn: 296512	2017-02-28 20:15:43 +00:00
Stanislav Mekhanoshin	357d3db0a4	[AMDGPU] Add second pass of the scheduler If during scheduling we have identified that we cannot keep optimistic occupancy increase critical register pressure limit and try scheduling of the whole function again. In this case blocks with smaller pressure will have a chance for better scheduling. Differential Revision: https://reviews.llvm.org/D30442 llvm-svn: 296506	2017-02-28 19:20:33 +00:00
Stanislav Mekhanoshin	282e8e4a72	[AMDGPU] New method to estimate register pressure This change introduces a new method to estimate register pressure in GCNScheduler. Standard RPTracker gives huge error due to the following reasons: 1. It does not account for live-ins or live-outs if value is not used in the region itself. That creates a huge error in a very common case if there are a lot of live-through registers. 2. It does not properly count subregs. 3. It assumes a register used as an input operand can be reused as an output. This is not always possible by itself, this is not what RA will finally do in many cases for various reasons not limited to RA's inability to do so, and this is not so if the value is actually a live-through. In addition we can now see clear separation between live-in pressure which we cannot change with the scheduling and tentative pressure which we can change. Differential Revision: https://reviews.llvm.org/D30439 llvm-svn: 296491	2017-02-28 17:22:39 +00:00
Konstantin Zhuravlyov	182e9cc6d5	[AMDGPU] Change amd_kernel_code_t's minor version to 1 - We do emit amd_kernel_code_t v1.1 Differential Revision: https://reviews.llvm.org/D30433 llvm-svn: 296489	2017-02-28 17:17:52 +00:00
Stanislav Mekhanoshin	080889cad7	[AMDGPU] Fix read-undef flags when schedule is reverted If two subregs of the same register are defined and we need to revert schedule changing def order, we will end up with both instructions having def,read-undef flags because adjustLaneLiveness() will only set this flag but will not remove it. Fix this by removing read-undef flags before calling adjustLaneLiveness. Differential Revision: https://reviews.llvm.org/D30428 llvm-svn: 296484	2017-02-28 16:26:27 +00:00
Daniel Sanders	983c9b98e9	Revert r296474 - [globalisel] Change LLT constructor string into an LLT subclass that knows how to generate it. There's a circular dependency that's only revealed when LLVM_ENABLE_MODULES=1. llvm-svn: 296478	2017-02-28 15:00:27 +00:00
Daniel Sanders	a5afdefec6	[globalisel] Change LLT constructor string into an LLT subclass that knows how to generate it. Summary: This will allow future patches to inspect the details of the LLT. The implementation is now split between the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns. Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem. Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30046 llvm-svn: 296474	2017-02-28 14:21:31 +00:00
Matt Arsenault	10268f93e8	AMDGPU: Use v_med3_{f16\|i16\|u16} llvm-svn: 296401	2017-02-27 22:40:39 +00:00
Matt Arsenault	eb522e68bc	AMDGPU: Support v2i16/v2f16 packed operations llvm-svn: 296396	2017-02-27 22:15:25 +00:00
Matt Arsenault	c9f2517e96	AMDGPU: Add some of the new gfx9 VOP3 instructions llvm-svn: 296382	2017-02-27 21:04:41 +00:00
Matt Arsenault	7596f13d15	AMDGPU: Support inlineasm for packed instructions Add packed types as legal so they may be used with inlineasm. Keep all operations expanded for now. llvm-svn: 296379	2017-02-27 20:52:10 +00:00
Matt Arsenault	2ed2193218	AMDGPU: Don't fold immediate if clamp/omod are set Doesn't fix any practical problems because clamp/omod are currently folded after peephole optimizer. llvm-svn: 296375	2017-02-27 20:21:31 +00:00
Matt Arsenault	3cb390498e	AMDGPU: Fold omod into instructions llvm-svn: 296372	2017-02-27 19:35:42 +00:00
Matt Arsenault	e2d1d3a940	AMDGPU: Add f16 to shader calling conventions Mostly useful for writing tests for f16 features. llvm-svn: 296370	2017-02-27 19:24:47 +00:00
Matt Arsenault	9be7b0d485	AMDGPU: Add VOP3P instruction format Add a few non-VOP3P but instructions related to packed. Includes hack with dummy operands for the benefit of the assembler llvm-svn: 296368	2017-02-27 18:49:11 +00:00
Konstantin Zhuravlyov	972948b36e	[AMDGPU] Runtime metadata fixes: - Verify that runtime metadata is actually valid runtime metadata when assembling, otherwise we could accept the following when assembling, but ocl runtime will reject it: .amdgpu_runtime_metadata { amd.MDVersion: [ 2, 1 ], amd.RandomUnknownKey, amd.IsaInfo: ... - Make IsaInfo optional, and always emit it. Differential Revision: https://reviews.llvm.org/D30349 llvm-svn: 296324	2017-02-27 07:55:17 +00:00
Wei Ding	4d3d4ca1b3	AMDGPU : Replace FMAD with FMA when denormals are enabled. Differential Revision: http://reviews.llvm.org/D29958 llvm-svn: 296186	2017-02-24 23:00:29 +00:00
Stanislav Mekhanoshin	42259cf35e	Revert "Correct register pressure calculation in presence of subregs" This reverts commit r296009. It broke one out of tree target and also does not account for all partial lines added or removed when calculating PressureDiff. llvm-svn: 296182	2017-02-24 21:56:16 +00:00
Stanislav Mekhanoshin	78468e48cf	[AMDGPU] Shut the warning "getRegUnitWeight hides overload...". NFC. Clang issues warning about hidden overload. That was intended, so add "using AMDGPUGenRegisterInfo::getRegUnitWeight;" to mute it. llvm-svn: 296021	2017-02-23 21:51:28 +00:00
Stanislav Mekhanoshin	ce3ddd2de4	Correct register pressure calculation in presence of subregs If a subreg is used in an instruction it counts as a whole superreg for the purpose of register pressure calculation. This patch corrects improper register pressure calculation by examining operand's lane mask. Differential Revision: https://reviews.llvm.org/D29835 llvm-svn: 296009	2017-02-23 20:19:44 +00:00
Jan Vesely	70293a045b	AMDGPU/SI: Fix trunc i16 pattern Hit on ASICs that support 16bit instructions. Differential Revision: https://reviews.llvm.org/D30281 llvm-svn: 295990	2017-02-23 16:12:21 +00:00
Matt Arsenault	f0a88dbaab	LoadStoreVectorizer: Split even sized illegal chains properly Implement isLegalToVectorizeLoadChain for AMDGPU to avoid producing private address spaces accesses that will need to be split up later. This was doing the wrong thing in the case where the queried chain was an even number of elements. A possible <4 x i32> store was being split into store <2 x i32> store i32 store i32 rather than store <2 x i32> store <2 x i32> when legal. llvm-svn: 295933	2017-02-23 03:58:53 +00:00
Matt Arsenault	a9e16e6597	AMDGPU: Add another BFE pattern This is the pattern that falls out of the instruction's definition if offset == 0. llvm-svn: 295912	2017-02-23 00:23:43 +00:00
Matt Arsenault	79a45db7f5	AMDGPU: Use clamp with f64 llvm-svn: 295908	2017-02-22 23:53:37 +00:00
Matt Arsenault	d5c6515b68	AMDGPU: Fold FP clamp as modifier bit The manual is unclear on the details of this. It's not clear to me if denormals are not allowed with clamp, or if that is only omod. Not allowing denorms for fp16 or fp64 isn't useful so I also question if that is really a restriction. Same with whether this is valid without IEEE mode enabled. llvm-svn: 295905	2017-02-22 23:27:53 +00:00
Wei Ding	f2cce02eb2	AMDGPU : Update TrapCode based on Trap Handler ABI. Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295904	2017-02-22 23:22:19 +00:00
Matt Arsenault	f5262256a1	AMDGPU: Add replacement bfe intrinsics llvm-svn: 295899	2017-02-22 23:04:58 +00:00
Matt Arsenault	7b6c5d28f5	AMDGPU: Don't add emergency stack slot if all spills are SGPR->VGPR This should avoid reporting any stack needs to be allocated in the case where no stack is truly used. An unused stack slot is still left around in other cases where there are real stack objects but no spilling occurs. llvm-svn: 295891	2017-02-22 22:23:32 +00:00
Matt Arsenault	93e65ea733	AMDGPU: Don't look at chain users when adjusting writemask Fixes not adjusting using new intrinsics with chains. llvm-svn: 295878	2017-02-22 21:16:41 +00:00
Matt Arsenault	707780b420	AMDGPU: Always allocate emergency stack slot at offset 0 This allows us to ensure that 0 is never a valid pointer to a user object, and ensures that the offset is always legal without needing a register to access it. This comes at the cost of usable offsets and wasted stack space. llvm-svn: 295877	2017-02-22 21:05:25 +00:00
Matt Arsenault	61ec6a03ca	AMDGPU: Change exp with compr bit printing llvm-svn: 295873	2017-02-22 20:37:12 +00:00
Wei Ding	6ade56e0a0	Revert "AMDGPU : Update TrapCode based on Trap Handler ABI." This reverts commit r295867. llvm-svn: 295871	2017-02-22 20:29:22 +00:00
Wei Ding	4991d3570f	AMDGPU : Update TrapCode based on Trap Handler ABI. Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295867	2017-02-22 20:05:06 +00:00
Matt Arsenault	1f17c66890	AMDGPU: Add cvt.pkrtz intrinsic Convert llvm.SI.packf16 test uses llvm-svn: 295797	2017-02-22 00:27:34 +00:00
Matt Arsenault	9417505f7d	AMDGPU: Remove llvm.AMDGPU.clamp intrinsic llvm-svn: 295789	2017-02-21 23:46:04 +00:00
Matt Arsenault	2fdf2a1a18	AMDGPU: Redefine clamp node as clamp 0.0-1.0 Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use it whether or not they are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp. llvm-svn: 295788	2017-02-21 23:35:48 +00:00
Matt Arsenault	7d6b71db4f	AMDGPU: Formatting fixes llvm-svn: 295783	2017-02-21 22:50:41 +00:00
Matt Arsenault	c2a44e4c3c	AMDGPU: Remove llvm.AMDGPU.flbit intrinsic llvm-svn: 295754	2017-02-21 19:27:33 +00:00
Matt Arsenault	e0bf7d02f0	AMDGPU: Don't use stack space for SGPR->VGPR spills Before frame offsets are calculated, try to eliminate the frame indexes used by SGPR spills. Then we can delete them after. I think for now we can be sure that no other instruction will be re-using the same frame indexes. It should be easy to notice if this assumption ever breaks since everything asserts if it tries to use a dead frame index later. The unused emergency stack slot seems to still be left behind, so an additional 4 bytes is still wasted. llvm-svn: 295753	2017-02-21 19:12:08 +00:00
Matt Arsenault	2021f08080	AMDGPU: Fix assembler subtarget predicate for gfx9 This was accepting GFX9 instructions on VI. llvm-svn: 295557	2017-02-18 19:12:26 +00:00
Matt Arsenault	a3b3b489fb	AMDGPU: Fix disassembly of aperture registers llvm-svn: 295555	2017-02-18 18:41:41 +00:00
Matt Arsenault	e823d92f7f	AMDGPU: Merge initial gfx9 support llvm-svn: 295554	2017-02-18 18:29:53 +00:00
Jan Vesely	4b1243facb	AMDGPU/R600: Assert on infinite loop in EmitClauseMarkers Differential Revision: https://reviews.llvm.org/D29792 llvm-svn: 295539	2017-02-18 04:24:10 +00:00
Matt Arsenault	f6cf1032fd	AMDGPU: Fix crashes on invalid icmp/fcmp intrinsics llvm-svn: 295489	2017-02-17 19:49:10 +00:00

2284 Commits