llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	7bedceb5b2	GlobalISel: moreElementsVector for G_LOAD/G_STORE AMDGPU change and test is a placeholder until a future patch with complete handling. llvm-svn: 367503	2019-08-01 01:44:22 +00:00
Matt Arsenault	d48324ff6f	Reapply "AMDGPU: Split block for si_end_cf" This reverts commit r359363, reapplying r357634 llvm-svn: 367500	2019-08-01 01:25:27 +00:00
Matt Arsenault	3594011de0	AMDGPU/GlobalISel: Select local loads llvm-svn: 367498	2019-08-01 00:53:38 +00:00
Stanislav Mekhanoshin	2594fa8593	[AMDGPU] Fix high occupancy calculation and print it We had a couple of places which still returned 10 as the maximum occupancy. Fixed. Also print a comment about occupancy as the compiler sees it. Differential Revision: https://reviews.llvm.org/D65423 llvm-svn: 367381	2019-07-31 01:07:10 +00:00
Matt Arsenault	52c262484f	TableGen: Add MinAlignment predicate AMDGPU uses some custom code predicates for testing alignments. I'm still having trouble comprehending the behavior of predicate bits in the PatFrag hierarchy. Any attempt to abstract these properties unexpectedly fails to apply them. llvm-svn: 367373	2019-07-31 00:14:43 +00:00
Stanislav Mekhanoshin	9aff33bb95	[AMDGPU] Print register pressure for agpr and vgpr separately Differential Revision: https://reviews.llvm.org/D65476 llvm-svn: 367355	2019-07-30 20:45:15 +00:00
Stanislav Mekhanoshin	450afcea39	[AMDGPU] Reserve all AGPRs on targets which do not have them Differential Revision: https://reviews.llvm.org/D65471 llvm-svn: 367347	2019-07-30 19:29:33 +00:00
Austin Kerbow	c99f62e313	[AMDGPU/GlobalISel] Add llvm.amdgcn.fdiv.fast legalization. Reviewers: arsenm Reviewed By: arsenm Subscribers: volkan, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64966 llvm-svn: 367344	2019-07-30 18:49:16 +00:00
Matt Arsenault	57ef94fb06	AMDGPU: Avoid emitting "true" predicates Empty condition strings are considered always true. This removes a lot of clutter from the generated matcher tables. This shrinks the source size of AMDGPUGenDAGISel.inc from 7.3M to 6.1M. llvm-svn: 367326	2019-07-30 15:56:43 +00:00
Tom Stellard	cc0bc941d4	AMDGPU/LoadStoreOptimizer: combine MMOs when merging instructions Summary: The LoadStoreOptimizer was creating instructions with 2 MachineMemOperands, which meant they were assumed to alias with all other instructions, because MachineInstr::mayAlias() returns true when an instruction has multiple MachineMemOperands. This was preventing these instructions from being merged again, and was giving the scheduler less freedom to reorder them. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65036 llvm-svn: 367237	2019-07-29 16:40:58 +00:00
Jay Foad	3bdcedbf3d	[AMDGPU] Fix typo in error message llvm-svn: 367235	2019-07-29 16:17:13 +00:00
Jay Foad	dcb7532479	[DivergenceAnalysis] Add methods for querying divergence at use Summary: The existing isDivergent(Value) methods query whether a value is divergent at its definition. However even if a value is uniform at its definition, a use of it in another basic block can be divergent because of divergent control flow between the def and the use. This patch adds new isDivergent(Use) methods to DivergenceAnalysis, LegacyDivergenceAnalysis and GPUDivergenceAnalysis. This might allow D63953 or other similar workarounds to be removed. Reviewers: alex-t, nhaehnle, arsenm, rtaylor, rampitec, simoll, jingyue Reviewed By: nhaehnle Subscribers: jfb, jvesely, wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65141 llvm-svn: 367218	2019-07-29 10:22:09 +00:00
David Stuttard	20235ef3e7	[AMDGPU] Enable v4f16 and above for v_pk_fma instructions Summary: If isel is presented with <2 x half> vectors then it will correctly select v_pk_fma style instructions. If isel is presented with e.g. <4 x half> vectors it will scalarize, unlike for other instruction types (such as fadd, fmul etc.) Added extra support to enable this. Updated one of the tests to include a test for this (as well as extending the test to GFX9) Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65325 Change-Id: I50a4577a3f8223fb53992af3b7d26121f65b71ee llvm-svn: 367206	2019-07-29 08:15:10 +00:00
Michael Liao	711556e6a8	[AMDGPU] Fix typo. llvm-svn: 367131	2019-07-26 17:13:59 +00:00
Carl Ritson	0b28357053	[AMDGPU] Move WQM/WWM intrinsic instruction selection to AMDGPUISelDAGToDAG Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65328 llvm-svn: 367105	2019-07-26 13:11:44 +00:00
Carl Ritson	00e89b428b	[AMDGPU] Add llvm.amdgcn.softwqm intrinsic Add llvm.amdgcn.softwqm intrinsic which behaves like llvm.amdgcn.wqm only if there is other WQM computation in the shader. Reviewers: nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64935 llvm-svn: 367097	2019-07-26 09:54:12 +00:00
Matt Arsenault	a9ea8a9aae	AMDGPU/GlobalISel: Handle most function return types handleAssignments gives up pretty easily on structs, and i8 values for some reason. The other case that doesn't work is when an implicit sret needs to be inserted if the return size exceeds the number of return registers. llvm-svn: 367082	2019-07-26 02:36:05 +00:00
Michael Liao	53f967f2bd	[AMDGPU] Run `unreachable-mbb-elimination` after isel to clean up PHIs. Summary: - As LCSSA is turned on just before isel, it may create PHIs of the flow, which are consumed by pseudo structurized CFG instructions. When those PHIs are eliminated at O0, a COPY may be placed wrongly, as these pseudo structurized CFG instructions are considered part of the MBB prologue. - Run extra `unreachable-mbb-elimination` at the end of isel to clean up PHIs. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64353 llvm-svn: 367023	2019-07-25 14:50:18 +00:00
Matt Arsenault	a85af76c72	AMDGPU: Don't assert on v4f16 arguments to shader calling conventions llvm-svn: 367018	2019-07-25 13:55:07 +00:00
Stanislav Mekhanoshin	c43784ff26	[AMDGPU] Increase kernel padding To support prefetch mode 3 we need to pad current cacheline and fill 3 cachelines after. Current padding is only sufficient for mode 2. Differential Revision: https://reviews.llvm.org/D65236 llvm-svn: 366938	2019-07-24 19:40:13 +00:00
Dmitry Preobrazhensky	5e1dd02c90	[AMDGPU][MC][GFX10] Enabled GFX10 assembly with arbitrary wavesize assumed by the code Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D65216 llvm-svn: 366921	2019-07-24 16:50:17 +00:00
Stanislav Mekhanoshin	5cdacea297	[AMDGPU] Add all vgpr classes to asm parser Differential Revision: https://reviews.llvm.org/D65158 llvm-svn: 366917	2019-07-24 16:21:18 +00:00
Matt Arsenault	0e7d8698b5	AMDGPU/GlobalISel: Don't assume instruction can be erased when selecting exts The G_ANYEXT handling can end up reaching selectCOPY, which mutates the instruction in place. llvm-svn: 366915	2019-07-24 16:05:53 +00:00
Simon Pilgrim	c60c12fb10	Fix MSVC warning about extending a uint32_t shift result to uint64_t. NFCI. llvm-svn: 366808	2019-07-23 14:04:54 +00:00
Matt Arsenault	827427f65b	AMDGPU: Don't use SDNodeXForm for DS offset output The xform has no real value when it's using out of a complex pattern output. The complex pattern was already creating TargetConstants with i16, so this was just unnecessary machinery. This allows global isel to import the simple cases once the complex pattern is implemented. llvm-svn: 366743	2019-07-22 21:38:11 +00:00
Matt Arsenault	937d0ee5d8	AMDGPU/GlobalISel: Remove unnecessary code The minnum/maxnum cases are dead, and the cvt is handled by the default. llvm-svn: 366685	2019-07-22 13:05:25 +00:00
Jay Foad	298500ae33	[AMDGPU] Save some work when an atomic op has no uses Summary: In the atomic optimizer, save doing a bunch of work and generating a bunch of dead IR in the fairly common case where the result of an atomic op (i.e. the value that was in memory before the atomic op was performed) is not used. NFC. Reviewers: arsenm, dstuttard, tpr Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64981 llvm-svn: 366667	2019-07-22 07:19:44 +00:00
Matt Arsenault	f3bfb85bce	AMDGPU/GlobalISel: Legalize GEP for other 32-bit address spaces llvm-svn: 366621	2019-07-19 22:28:44 +00:00
Stanislav Mekhanoshin	05d9e6a2a3	[AMDGPU] Autogenerate register sequences in tuples Differential Revision: https://reviews.llvm.org/D65007 llvm-svn: 366619	2019-07-19 21:43:42 +00:00
Stanislav Mekhanoshin	7b5a54e369	[AMDGPU] Fixed occupancy calculation for gfx10 Differential Revision: https://reviews.llvm.org/D65010 llvm-svn: 366616	2019-07-19 21:29:51 +00:00
Matt Arsenault	5e23f42820	AMDGPU: Avoid custom predicates for stores with glue llvm-svn: 366613	2019-07-19 21:01:30 +00:00
Matt Arsenault	e3401a9b86	AMDGPU: Redefine setcc condition PatLeafs Avoid using custom code predicates. llvm-svn: 366609	2019-07-19 20:24:40 +00:00
Matt Arsenault	48c0df5d46	AMDGPU: Don't rely on m0 being -1 for GWS offsets This only works if the high bits of m0 are also 0, so m0 would have to be set to 0xffff. llvm-svn: 366608	2019-07-19 20:01:24 +00:00
Matt Arsenault	85f3890126	AMDGPU: Force s_waitcnt after GWS instructions This is apparently required to be the immediately following instruction, so force it into a bundle with a waitcnt. llvm-svn: 366607	2019-07-19 19:47:30 +00:00
Stanislav Mekhanoshin	01fcf9238f	[AMDGPU] Allow register tuples to set asm names This change reverts most of the previous register name generation. The real problem is that RegisterTuple does not generate asm names. Added optional operand to RegisterTuple. This way we can simplify register name access and dramatically reduce the size of static tables for the backend. Differential Revision: https://reviews.llvm.org/D64967 llvm-svn: 366598	2019-07-19 18:05:01 +00:00
Matt Arsenault	7df225dfc2	AMDGPU/GlobalISel: Fix MMO flags for kernel argument loads The DAG lowering sets dereferenceable and invariant, not nontemporal. llvm-svn: 366597	2019-07-19 17:52:56 +00:00
Matt Arsenault	08494f6231	AMDGPU/GlobalISel: Selection for fminnum/fmaxnum v2f16 case doesn't work yet because the VOP3P complex patterns haven't been ported yet. llvm-svn: 366585	2019-07-19 14:42:40 +00:00
Matt Arsenault	b60a2ae40e	AMDGPU/GlobalISel: Support arguments with multiple registers Handles structs used directly in argument lists. llvm-svn: 366584	2019-07-19 14:29:30 +00:00
Matt Arsenault	fecf43eba3	AMDGPU/GlobalISel: Rewrite lowerFormalArguments This should now handle everything except structs passed as multiple registers. I think most of the packing logic should be handled by handleAssignments, but I'm unclear on what the contract is for multiple registers. This is copying how x86 handles this. This does change the behavior of the test_sgpr_alignment0 amdgpu_vs test. I don't think shader arguments should try to follow the alignment, and registers need to be repacked. I also don't think it matters, since I think the pointers are packed to the beginning of the argument list anyway. llvm-svn: 366582	2019-07-19 14:15:18 +00:00
Matt Arsenault	1022c0dfde	AMDGPU: Decompose all values to 32-bit pieces for calling conventions This is the more natural lowering, and presents more opportunities to reduce 64-bit ops to 32-bit. This should also help avoid issues graphics shaders have had with 64-bit values, and simplify argument lowering in globalisel. llvm-svn: 366578	2019-07-19 13:57:44 +00:00
Dmitry Preobrazhensky	4ccb7f8c45	[AMDGPU][MC] Corrected parsing of branch offsets See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D64629 llvm-svn: 366571	2019-07-19 13:12:47 +00:00
Jay Foad	7d06ffff46	[AMDGPU] Simplify the exclusive scan used for optimized atomics Summary: Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8, 16, 32) instead of starting off shifting by 1, 2 and 3 and then doing a 3-way ADD, because: 1. It simplifies the compiler a little. 2. It minimizes vgpr pressure because each instruction is now of the form vn = vn + vn << c. 3. It is more friendly to the DPP combiner, which currently can't combine into an ADD3 instruction. Because of #2 and #3 the end result is improved from this: v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0 v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf v_mov_b32_dpp v1, v3 row_shr:3 row_mask:0xf bank_mask:0xf v_add3_u32 v1, v4, v5, v1 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf To this: v_add_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf I.e. two fewer computational instructions, one extra nop where we could schedule something else. Reviewers: arsenm, sheredom, critson, rampitec, vpykhtin Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64411 llvm-svn: 366543	2019-07-19 08:40:37 +00:00
Stanislav Mekhanoshin	a9c71e01e7	[AMDGPU] Drop Reg32 and use regular AsmName This reduces the generated AMDGPUGenAsmWriter.inc by ~100Kb. Differential Revision: https://reviews.llvm.org/D64952 llvm-svn: 366505	2019-07-18 22:18:33 +00:00
Stanislav Mekhanoshin	7872d76a16	[AMDGPU] Simplify AMDGPUInstPrinter::printRegOperand() Differential Revision: https://reviews.llvm.org/D64892 llvm-svn: 366385	2019-07-17 22:58:43 +00:00
Stanislav Mekhanoshin	9c7f4264d3	[AMDGPU] Stop special casing flat_scratch for register name Differential Revision: https://reviews.llvm.org/D64885 llvm-svn: 366376	2019-07-17 21:35:11 +00:00
Daniil Fukalov	d912a9ba9b	[AMDGPU] Tune inlining parameters for AMDGPU target Summary: Since the target gains no significant advantage from vectorization, the vector instructions threshold bonus should be optional. The amdgpu-inline-arg-alloca-cost parameter default value and the target InliningThresholdMultiplier value are tuned accordingly. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64642 llvm-svn: 366348	2019-07-17 16:51:29 +00:00
Matt Arsenault	06eed42213	AMDGPU: Use getTargetConstant Avoids creating an extra intermediate mov. llvm-svn: 366340	2019-07-17 15:35:36 +00:00
Jay Foad	70235c642e	[AMDGPU] Optimize atomic AND/OR/XOR Summary: Extend the atomic optimizer to handle AND, OR and XOR. Reviewers: arsenm, sheredom Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64809 llvm-svn: 366323	2019-07-17 13:40:03 +00:00
Nicolai Haehnle	8b7041a5c6	AMDGPU/GFX10: Apply the VMEM-to-scalar-write hazard also to writes to EXEC Summary: Change-Id: I854fbf7d48e937bef9f8f3f5d0c8aeb970652630 Reviewers: rampitec, mareko Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64807 Change-Id: I4405b3a7f84186acea5a78d291bff71056e745fc llvm-svn: 366314	2019-07-17 11:22:57 +00:00
Nicolai Haehnle	a256b8b7d7	AMDGPU: Improve alias analysis for GDS Summary: GDS cannot alias anything else. Original patch by: Marek Olšák Reviewers: arsenm, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64114 Change-Id: I07bfbd96f5d5c37a6dfba7997df12f291dd794b0 llvm-svn: 366313	2019-07-17 11:22:19 +00:00
Stanislav Mekhanoshin	e5012ab308	[AMDGPU] Autogenerate register asm names Differential Revision: https://reviews.llvm.org/D64839 llvm-svn: 366283	2019-07-16 23:44:21 +00:00
Matt Arsenault	f8c8284455	AMDGPU/GlobalISel: Select G_ASHR llvm-svn: 366257	2019-07-16 20:31:25 +00:00
Matt Arsenault	e5b28b98e9	AMDGPU/GlobalISel: Select G_LSHR llvm-svn: 366256	2019-07-16 20:25:43 +00:00
Matt Arsenault	1b69fd275d	AMDGPU/GlobalISel: Select G_SHL I think this manages to not break the DAG handling with the divergent predicates because the standalone divergent patterns end up with a higher priority than the pattern on the instruction definition. The 16-bit versions don't work yet. llvm-svn: 366254	2019-07-16 20:15:30 +00:00
Stanislav Mekhanoshin	6e0fa292c2	[AMDGPU] Change register type for v32 vectors When it is AReg_1024 this results in unnecessary copying of 32-element vectors into AGPRs even though they are not intended for an mfma instruction. Differential Revision: https://reviews.llvm.org/D64815 llvm-svn: 366252	2019-07-16 20:06:00 +00:00
Matt Arsenault	2d10407719	AMDGPU/GlobalISel: Fix selection of private stores llvm-svn: 366249	2019-07-16 19:27:44 +00:00
Matt Arsenault	7161fb0be5	AMDGPU/GlobalISel: Select private loads llvm-svn: 366248	2019-07-16 19:22:21 +00:00
Matt Arsenault	dad1f89210	AMDGPU/GlobalISel: Select flat stores llvm-svn: 366246	2019-07-16 18:42:53 +00:00
Matt Arsenault	7eb1902cd5	AMDGPU: Add register classes to flat store patterns For some reason GlobalISelEmitter needs register classes to import these, although it works for the load patterns. llvm-svn: 366242	2019-07-16 18:26:42 +00:00
Matt Arsenault	8f8d07e93b	AMDGPU: Replace store PatFrags Convert the easy cases to formats understood for GlobalISel. llvm-svn: 366240	2019-07-16 18:21:25 +00:00
Matt Arsenault	35c96598b1	AMDGPU/GlobalISel: Select flat loads Now that the patterns use the new PatFrag address space support, the only blocker to importing most load patterns is the addressing mode complex patterns. llvm-svn: 366237	2019-07-16 18:05:29 +00:00
Jay Foad	17060f0a54	[AMDGPU] Optimize atomic max/min Summary: Extend the atomic optimizer to handle signed and unsigned max and min operations, as well as add and subtract. Reviewers: arsenm, sheredom, critson, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64328 llvm-svn: 366235	2019-07-16 17:44:54 +00:00
Matt Arsenault	c6fd5abecc	AMDGPU: Redefine load PatFrags Rewrite PatFrags using the new PatFrag address space matching in tablegen. These will now work with both SelectionDAG and GlobalISel. llvm-svn: 366234	2019-07-16 17:38:50 +00:00
Michael Liao	b3f967d411	[AMDGPU] Add the adjusted FP as a livein register. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64145 llvm-svn: 366223	2019-07-16 15:57:12 +00:00
Matt Arsenault	22c4a147a9	AMDGPU/GlobalISel: Fix test failures in release build Apparently the check for legal instructions during instruction select does not happen without an asserts build, so these would successfully select in release, and fail in debug. Make s16 and/or/xor legal. These can just be selected directly to the 32-bit operation, as is already done in SelectionDAG, so just make them legal. llvm-svn: 366210	2019-07-16 14:28:30 +00:00
Rui Ueyama	49a3ad21d6	Fix parameter name comments using clang-tidy. NFC. This patch applies clang-tidy's bugprone-argument-comment tool to LLVM, clang and lld source trees. Here is how I created this patch: $ git clone https://github.com/llvm/llvm-project.git $ cd llvm-project $ mkdir build $ cd build $ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \ -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \ -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \ -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm $ ninja $ parallel clang-tidy -checks='-*,bugprone-argument-comment' \ -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \ ::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h} llvm-svn: 366177	2019-07-16 04:46:31 +00:00
Matt Arsenault	1739b700b1	AMDGPU: Avoid code predicates for extload PatFrags Use the MemoryVT field. This will be necessary for tablegen to automatically handle patterns for GlobalISel. Doesn't handle the d16 lo/hi patterns. Those are a special case since they involve the custom node type. llvm-svn: 366168	2019-07-16 02:46:05 +00:00
Austin Kerbow	423b4a18a4	[AMDGPU] Enable merging m0 initializations. Summary: Enable hoisting and merging m0 defs that are initialized with the same immediate value. Fixes bug where removed instructions are not considered to interfere with other inits, and make sure to not hoist inits before block prologues. Reviewers: rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64766 llvm-svn: 366135	2019-07-15 22:07:05 +00:00
Matt Arsenault	b082f1055b	AMDGPU: Use standalone MUBUF load patterns We already do this for the flat and DS instructions, although it is certainly uglier and more verbose. This will allow using separate pattern definitions for extload and zextload. Currently we get away with using a single PatFrag with custom predicate code to check if the extension type is a zextload or anyextload. The generic mechanism the global isel emitter understands treats these as mutually exclusive. I was considering making the pattern emitter accept zextload or sextload extensions for anyextload patterns, but in global isel, the different extending loads have distinct opcodes, and there is currently no mechanism for an opcode matcher to try multiple (and there probably is very little need for one beyond this case). llvm-svn: 366132	2019-07-15 21:41:44 +00:00
Matt Arsenault	66ee934440	AMDGPU/GlobalISel: Allow scalar s1 and/or/xor If a 1-bit value is in a 32-bit VGPR, the scalar opcodes set SCC to whether the result is 0. If the inputs are SCC, these can be copied to a 32-bit SGPR to produce an SCC result. llvm-svn: 366125	2019-07-15 20:20:18 +00:00
Matt Arsenault	c8291c94f8	AMDGPU/GlobalISel: Select G_AND/G_OR/G_XOR llvm-svn: 366121	2019-07-15 19:50:07 +00:00
Matt Arsenault	ad19b50c00	AMDGPU/GlobalISel: Don't constrain source register of VCC copies This is a hack until I come up with a better way of dealing with the pseudo-register banks used for boolean values. If the use instruction constrains the register, the selector for the def instruction won't see that the bank was VCC. A 1-bit SReg_32 could ambiguously have been SCCRegBank or VCCRegBank in wave32. This is necessary to successfully select branches with an and/or/xor condition. llvm-svn: 366120	2019-07-15 19:48:36 +00:00
Matt Arsenault	e1b52f4180	AMDGPU/GlobalISel: Fix selecting vcc->vcc bank copies The extra test change is correct, although how it arrives there is a bug that needs work. With wave32, the test for isVCC ambiguously reports true for an SCC or VCC source. A new allocatable pseudo register class for SCC may be necessary. llvm-svn: 366119	2019-07-15 19:46:48 +00:00
Matt Arsenault	3bfdb54d88	AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC llvm-svn: 366118	2019-07-15 19:45:49 +00:00
Matt Arsenault	18b7133843	AMDGPU/GlobalISel: Fix handling of sgpr (not scc bank) s1 to VCC This was emitting a copy from a 32-bit register to a 64-bit. llvm-svn: 366117	2019-07-15 19:44:07 +00:00
Matt Arsenault	6ed315f89b	AMDGPU/GlobalISel: Custom legalize G_INSERT_VECTOR_ELT llvm-svn: 366116	2019-07-15 19:43:04 +00:00
Matt Arsenault	b0e04c018c	AMDGPU/GlobalISel: Custom legalize G_EXTRACT_VECTOR_ELT Turn the constant cases into G_EXTRACTs. llvm-svn: 366115	2019-07-15 19:40:59 +00:00
Matt Arsenault	5dfd466032	AMDGPU/GlobalISel: Fix G_ICMP for wave32 llvm-svn: 366114	2019-07-15 19:39:31 +00:00
Matt Arsenault	90bdfb3daf	AMDGPU/GlobalISel: Widen vector extracts llvm-svn: 366103	2019-07-15 18:31:10 +00:00
Matt Arsenault	53fa759ff5	AMDGPU/GlobalISel: Handle llvm.amdgcn.if.break llvm-svn: 366102	2019-07-15 18:25:24 +00:00
Matt Arsenault	b390121efb	AMDGPU/GlobalISel: Select llvm.amdgcn.end.cf llvm-svn: 366099	2019-07-15 18:18:46 +00:00
Matt Arsenault	49169a963e	AMDGPU: Add 24-bit mul intrinsics Insert these during codegenprepare. This works around a DAG issue where generic combines eliminate the and asserting the high bits are zero, which then exposes an unknown read source to the mul combine. It isn't worth the hassle of trying to insert an AssertZext or something to try to deal with it. llvm-svn: 366094	2019-07-15 17:50:31 +00:00
Stanislav Mekhanoshin	7938424eb9	[AMDGPU] Copy missing predicate from pseudo to real NFC at the moment, needed for future commit. Differential Revision: https://reviews.llvm.org/D64761 llvm-svn: 366092	2019-07-15 17:49:25 +00:00
Matt Arsenault	a65913e752	AMDGPU/GlobalISel: Select easy cases for G_BUILD_VECTOR llvm-svn: 366087	2019-07-15 17:26:43 +00:00
Matt Arsenault	cc02b17082	AMDGPU/GlobalISel: RegBankSelect for G_CONCAT_VECTORS llvm-svn: 366086	2019-07-15 17:20:40 +00:00
Stanislav Mekhanoshin	fd08dcb9db	[AMDGPU] fixed scheduler crash in gfx908 For some reason the scheduler can send down an SUnit without an instruction. Differential Revision: https://reviews.llvm.org/D64709 llvm-svn: 366074	2019-07-15 15:34:05 +00:00
Dmitry Preobrazhensky	5153b1723a	[AMDGPU][MC][GFX9][GFX10] Added support of GET_DOORBELL message Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D64729 llvm-svn: 366071	2019-07-15 15:12:16 +00:00
Dmitry Preobrazhensky	8d879c8d95	[AMDGPU][MC] Corrected encoding of src0 for DS_GWS_* instructions See bug 42599: https://bugs.llvm.org/show_bug.cgi?id=42599 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D64716 llvm-svn: 366067	2019-07-15 14:37:57 +00:00
Bill Wendling	796ed134cc	Remove set but unused variable. llvm-svn: 366041	2019-07-15 06:35:28 +00:00
Stanislav Mekhanoshin	1dfae6fe50	[AMDGPU] use v32f32 for 3 mfma intrinsics These should really use v32f32, but were defined as v32i32 due to the lack of the v32f32 type. Differential Revision: https://reviews.llvm.org/D64667 llvm-svn: 365972	2019-07-12 22:42:01 +00:00
Matt Arsenault	51a05d72ae	AMDGPU: Drop remnants of byval support for shaders Before 2018, mesa used to use byval interchangeably with inreg, which didn't really make sense. Fix tests still using it to avoid breaking in a future commit. llvm-svn: 365953	2019-07-12 20:12:17 +00:00
David Tenty	ae79a2c390	Fix missing use of defined() in include guard Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64657 llvm-svn: 365952	2019-07-12 20:12:15 +00:00
Stanislav Mekhanoshin	495b0f5cc3	[AMDGPU] Extend MIMG opcode to 8 bits This is NFC, but required for future commit. Differential Revision: https://reviews.llvm.org/D64649 llvm-svn: 365940	2019-07-12 18:38:06 +00:00
Jay Foad	27ec195f39	[AMDGPU] Fix DPP combiner check for exec modification Summary: r363675 changed the exec modification helper function, now called execMayBeModifiedBeforeUse, so that if no UseMI is specified it checks all instructions in the basic block, even beyond the last use. That meant that the DPP combiner no longer worked in any basic block that ended with a control flow instruction, and in particular it didn't work on code sequences generated by the atomic optimizer. Fix it by reinstating the old behaviour but in a new helper function execMayBeModifiedBeforeAnyUse, and limiting the number of instructions scanned. Reviewers: arsenm, vpykhtin Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64393 llvm-svn: 365910	2019-07-12 15:59:40 +00:00
Jay Foad	7816ad918f	[AMDGPU] Restrict v_cndmask_b32 abs/neg modifiers to f32 Summary: D64497 allowed abs/neg source modifiers on v_cndmask_b32 but it doesn't make any sense to apply them to f16 operands; they would interpret the bits of the value as an f32, giving nonsensical results. This patch restricts them to f32 operands. Reviewers: arsenm, hakzsam Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64636 llvm-svn: 365904	2019-07-12 15:02:59 +00:00
Fangrui Song	b251cc0d91	Delete dead stores llvm-svn: 365903	2019-07-12 14:58:15 +00:00
Michael Liao	16d3c1ac03	[AMDGPU] Skip calculating callee saved registers for entry function. Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64596 llvm-svn: 365846	2019-07-11 23:53:30 +00:00
Matt Arsenault	e5fb434d92	AMDGPU: s_waitcnt field should be treated as unsigned Also make it an ImmLeaf, so it should work with global isel as well, which was part of the point of moving it in the first place. llvm-svn: 365842	2019-07-11 23:42:57 +00:00
Stanislav Mekhanoshin	28550c8680	[AMDGPU] Fixed asan error with agpr spilling Instruction was used after it was erased. llvm-svn: 365837	2019-07-11 22:30:11 +00:00
Stanislav Mekhanoshin	937ff6e701	[AMDGPU] gfx908 agpr spilling Differential Revision: https://reviews.llvm.org/D64594 llvm-svn: 365833	2019-07-11 21:54:13 +00:00
Stanislav Mekhanoshin	7d2019bb96	[AMDGPU] gfx908 hazard recognizer Differential Revision: https://reviews.llvm.org/D64593 llvm-svn: 365829	2019-07-11 21:30:34 +00:00
Stanislav Mekhanoshin	b83e283e65	[AMDGPU] gfx908 scheduling Differential Revision: https://reviews.llvm.org/D64590 llvm-svn: 365826	2019-07-11 21:25:00 +00:00
Stanislav Mekhanoshin	e67cc380a8	[AMDGPU] gfx908 mfma support Differential Revision: https://reviews.llvm.org/D64584 llvm-svn: 365824	2019-07-11 21:19:33 +00:00
Matt Arsenault	b725d27350	AMDGPU/GlobalISel: Move kernel argument handling to separate function llvm-svn: 365782	2019-07-11 14:18:25 +00:00
Jay Foad	c1b7db9eda	Remove some redundant code from r290372 and improve a comment. llvm-svn: 365741	2019-07-11 08:49:52 +00:00
Stanislav Mekhanoshin	e93279fd1b	[AMDGPU] gfx908 atomic fadd and atomic pk_fadd Differential Revision: https://reviews.llvm.org/D64435 llvm-svn: 365717	2019-07-11 00:10:17 +00:00
Stanislav Mekhanoshin	c0ae1be066	[AMDGPU] gfx908 dot instruction support Differential Revision: https://reviews.llvm.org/D64431 llvm-svn: 365715	2019-07-11 00:00:27 +00:00
Matt Arsenault	6ce1b4fec5	GlobalISel: Legalization for G_FMINNUM/G_FMAXNUM llvm-svn: 365658	2019-07-10 16:31:19 +00:00
Matt Arsenault	58426a3707	AMDGPU: Serialize mode from MachineFunctionInfo llvm-svn: 365653	2019-07-10 16:09:26 +00:00
Jay Foad	bba37e89a5	[AMDGPU] Allow abs/neg source modifiers on v_cndmask_b32 Summary: D59191 added support for these modifiers in the assembler and disassembler. This patch just teaches instruction selection that it can use them. Reviewers: arsenm, tstellar Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64497 llvm-svn: 365640	2019-07-10 14:53:47 +00:00
Tom Stellard	d0ba79fe7b	AMDGPU/GlobalISel: Add support for wide loads >= 256-bits Summary: This adds support for the most commonly used wide load types: <8xi32>, <16xi32>, <4xi64>, and <8xi64> Reviewers: arsenm Reviewed By: arsenm Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57399 llvm-svn: 365586	2019-07-10 00:22:41 +00:00
Matt Arsenault	b1843e130a	GlobalISel: Implement lower for G_FCOPYSIGN In SelectionDAG AMDGPU treated these as legal, but this was mostly because the bitcasts required for FP types were painful. Theoretically the bitpattern should eventually match to bfi, so don't bother trying to get the patterns to import. llvm-svn: 365583	2019-07-09 23:34:29 +00:00
Matt Arsenault	3f1a34546c	AMDGPU/GlobalISel: Fix legality for G_BUILD_VECTOR llvm-svn: 365575	2019-07-09 22:48:04 +00:00
Stanislav Mekhanoshin	1e9eae95af	[AMDGPU] gfx908 v_pk_fmac_f16 support Differential Revision: https://reviews.llvm.org/D64433 llvm-svn: 365573	2019-07-09 22:42:24 +00:00
Stanislav Mekhanoshin	50d7f46460	[AMDGPU] gfx908 mAI instructions, MC part Differential Revision: https://reviews.llvm.org/D64446 llvm-svn: 365563	2019-07-09 21:43:09 +00:00
Craig Topper	84a1f07363	[X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it Basically the problem is that X86 doesn't set the Fast flag from allowsMemoryAccess on certain CPUs due to slow unaligned memory subtarget features. This prevents bitcasts from being folded into loads and stores. But all vector loads and stores of the same width are the same cost on X86. This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it. Differential Revision: https://reviews.llvm.org/D64295 llvm-svn: 365549	2019-07-09 19:55:28 +00:00
Stanislav Mekhanoshin	9e77d0c6df	[AMDGPU] gfx908 register file changes Differential Revision: https://reviews.llvm.org/D64438 llvm-svn: 365546	2019-07-09 19:41:51 +00:00
Stanislav Mekhanoshin	22b2c3d651	[AMDGPU] gfx908 target Differential Revision: https://reviews.llvm.org/D64429 llvm-svn: 365525	2019-07-09 18:10:06 +00:00
Christudasan Devadasan	b2d24bd540	[AMDGPU] Created a sub-register class for the return address operand in the return instruction. Function return instruction lowering, currently uses the fixed register pair s[30:31] for holding the return address. It can be any SGPR pair other than the CSRs. Created an SGPR pair sub-register class exclusive of the CSRs, and used this regclass while lowering the return instruction. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D63924 llvm-svn: 365512	2019-07-09 16:48:42 +00:00
Matt Arsenault	4dd5755d01	AMDGPU/GlobalISel: Legalize more concat_vectors llvm-svn: 365488	2019-07-09 14:17:31 +00:00
Matt Arsenault	6bdb92d833	AMDGPU/GlobalISel: Improve regbankselect for icmp s16 Account for 64-bit scalar eq/ne when available. llvm-svn: 365487	2019-07-09 14:13:09 +00:00
Matt Arsenault	8b8eee5904	AMDGPU/GlobalISel: Make s16 G_ICMP legal llvm-svn: 365486	2019-07-09 14:10:43 +00:00
Matt Arsenault	e6d10f97dd	AMDGPU/GlobalISel: Select G_SUB llvm-svn: 365484	2019-07-09 14:05:11 +00:00
Matt Arsenault	872f38be7e	AMDGPU/GlobalISel: Select G_UNMERGE_VALUES llvm-svn: 365483	2019-07-09 14:02:26 +00:00
Matt Arsenault	9b7ffc4e55	AMDGPU/GlobalISel: Select G_MERGE_VALUES llvm-svn: 365482	2019-07-09 14:02:20 +00:00
Stanislav Mekhanoshin	c776dc0b60	[AMDGPU] Added td definitions for HW regs Infrastructure work for future commit. NFC. Differential Revision: https://reviews.llvm.org/D64370 llvm-svn: 365432	2019-07-09 03:20:33 +00:00
Stanislav Mekhanoshin	818d748a45	[AMDGPU] Always use s_memtime for readcyclecounter Differential Revision: https://reviews.llvm.org/D64369 llvm-svn: 365431	2019-07-09 03:10:18 +00:00
Matt Arsenault	9e7cbc0e7d	AMDGPU: Split extload/zextload local load patterns This will help removing the custom load predicates, allowing the global isel emitter to handle them. llvm-svn: 365398	2019-07-08 22:08:23 +00:00
Bill Wendling	c8933c4070	Add parentheses to silence warning. llvm-svn: 365394	2019-07-08 22:00:33 +00:00
Matt Arsenault	8561844321	AMDGPU: Fix unused variable in release build llvm-svn: 365378	2019-07-08 19:47:42 +00:00
Matt Arsenault	acc9e1e4c2	AMDGPU: Fix stray typing llvm-svn: 365373	2019-07-08 19:05:19 +00:00
Matt Arsenault	71dfb7ec5c	AMDGPU: Make s34 the FP register Make the FP register callee saved. This is tricky because now the FP needs to be spilled in the prolog relative to the incoming SP register, rather than the frame register used throughout the rest of the function. I don't like how this bypasses the standard mechanism for CSR spills just to get the correct insert point. I may look for a better solution, since all CSR VGPRs may also need to have all lanes activated. Another option might be to make getFrameIndexReference change the base register if the frame index is a CSR, and then try to figure out the right insertion point in emitProlog. If there is a free VGPR lane available for SGPR spilling, try to use it for the FP. If that would require introducing a new VGPR spill, try to use a free call clobbered SGPR. Only fall back to introducing a new VGPR spill as a last resort. This also doesn't attempt to handle SGPR spilling with scalar stores. llvm-svn: 365372	2019-07-08 19:03:38 +00:00
Matt Arsenault	5e643036cb	AMDGPU: Move DEBUG_TYPE definition below includes llvm-svn: 365369	2019-07-08 18:48:39 +00:00
Matt Arsenault	224d8cd987	AMDGPU: Remove mubuf specific PatFrags These are identical to the *_global PatFrag, and will only create more work to get the GlobalISel importer to handle them. llvm-svn: 365350	2019-07-08 16:53:53 +00:00
Matt Arsenault	430b0497e7	AMDGPU: Move waitcnt intrinsic to instruction definition pattern llvm-svn: 365349	2019-07-08 16:53:48 +00:00
Dmitry Preobrazhensky	2eff0318c6	[AMDGPU][MC] Corrected parsing of FLAT offset modifier Summary of changes: - simplified handling of FLAT offset: offset_s13 and offset_u12 have been replaced with flat_offset; - provided information about error position for pre-gfx9 targets; - improved error handling. Reviewers: artem.tamazov, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D64244 llvm-svn: 365321	2019-07-08 14:27:37 +00:00
Jay Foad	38902350ef	[AMDGPU] Use a named predicate instead of a magic number. Reviewers: arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64201 llvm-svn: 365294	2019-07-08 07:04:58 +00:00
Matt Arsenault	5e9610a3f5	AMDGPU: Fix assert in clang test llvm-svn: 365245	2019-07-05 21:09:53 +00:00
Matt Arsenault	e7e23e3e91	AMDGPU: Make AMDGPUPerfHintAnalysis an SCC pass Add a string attribute instead of directly setting MachineFunctionInfo. This avoids trying to get the analysis in the MachineFunctionInfo in a way that doesn't work with the new pass manager. This will also avoid re-visiting the call graph for every single function. llvm-svn: 365241	2019-07-05 20:26:13 +00:00
Michael Liao	8d6ea2d48c	[CodeGen] Enhance `MachineInstrSpan` to allow the end of MBB to be used. Summary: - Explicitly specify the parent MBB to allow the end iterator to be used. Reviewers: aprantl, MatzeB, craig.topper, qcolombet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64261 llvm-svn: 365240	2019-07-05 20:23:59 +00:00
Christudasan Devadasan	652ad423bb	[NFC] A test commit to check the access permission. Removed a blank line. llvm-svn: 365223	2019-07-05 17:07:42 +00:00
Yaxun Liu	a62413526d	[AMDGPU] Added a new metadata for multi grid sync implicit argument Patch by Christudasan Devadasan. Differential Revision: https://reviews.llvm.org/D63886 llvm-svn: 365217	2019-07-05 16:05:17 +00:00
Jay Foad	7e0c10b55f	[AMDGPU] DPP combiner: recognize identities for more opcodes Summary: This allows the DPP combiner to kick in more often. For example the exclusive scan generated by the atomic optimizer for a divergent atomic add used to look like this: v_mov_b32_e32 v3, v1 v_mov_b32_e32 v5, v1 v_mov_b32_e32 v6, v1 v_mov_b32_dpp v3, v2 wave_shr:1 row_mask:0xf bank_mask:0xf s_nop 1 v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0 v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf v_mov_b32_dpp v6, v3 row_shr:3 row_mask:0xf bank_mask:0xf v_add3_u32 v3, v4, v5, v6 v_mov_b32_e32 v4, v1 s_nop 1 v_mov_b32_dpp v4, v3 row_shr:4 row_mask:0xf bank_mask:0xe v_add_u32_e32 v3, v3, v4 v_mov_b32_e32 v4, v1 s_nop 1 v_mov_b32_dpp v4, v3 row_shr:8 row_mask:0xf bank_mask:0xc v_add_u32_e32 v3, v3, v4 v_mov_b32_e32 v4, v1 s_nop 1 v_mov_b32_dpp v4, v3 row_bcast:15 row_mask:0xa bank_mask:0xf v_add_u32_e32 v3, v3, v4 s_nop 1 v_mov_b32_dpp v1, v3 row_bcast:31 row_mask:0xc bank_mask:0xf v_add_u32_e32 v1, v3, v1 v_add_u32_e32 v1, v2, v1 v_readlane_b32 s0, v1, 63 But now most of the dpp movs are combined into adds: v_mov_b32_e32 v3, v1 v_mov_b32_e32 v5, v1 s_nop 0 v_mov_b32_dpp v3, v2 wave_shr:1 row_mask:0xf bank_mask:0xf s_nop 1 v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0 v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf v_mov_b32_dpp v1, v3 row_shr:3 row_mask:0xf bank_mask:0xf v_add3_u32 v1, v4, v5, v1 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf v_add_u32_e32 v1, v2, v1 v_readlane_b32 s0, v1, 63 Reviewers: arsenm, vpykhtin Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64207 llvm-svn: 365211	2019-07-05 14:52:48 +00:00
Tim Renouf	5816889c74	[AMDGPU] Custom lower INSERT_SUBVECTOR v3, v4, v5, v8 Summary: Since the changes to introduce vec3 and vec5, INSERT_VECTOR for these sizes has been marked "expand", which made LegalizeDAG lower it to loads and stores via a stack slot. The code got optimized a bit later, but the now-unused stack slot was never deleted. This commit avoids that problem by custom lowering INSERT_SUBVECTOR into an EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT for each element in the subvector to insert. V2: Addressed review comments re test. Differential Revision: https://reviews.llvm.org/D63160 Change-Id: I9e3c13e36f68cfa3431bb9814851cc1f673274e1 llvm-svn: 365148	2019-07-04 17:38:24 +00:00
Jay Foad	0cd50b2a95	Fix typos in comments and debug output. llvm-svn: 365146	2019-07-04 15:04:29 +00:00
Michael Liao	7a9ad430fe	[AMDGPU] Correct the setting of `FlatScratchInit`. Summary: - That flag setting should skip spilling stack slot. Reviewers: arsenm, rampitec Subscribers: qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64143 llvm-svn: 365137	2019-07-04 13:29:45 +00:00
Matt Arsenault	5b0922fe1f	AMDGPU: Add pass to lower SGPR spills This is split out from my patches to split register allocation into a separate SGPR and VGPR phase, and has some parts that aren't yet used (like maintaining LiveIntervals). This simplifies making the frame pointer register callee saved. As it is now, the code to determine callee saves needs to predict all the possible SGPR spills and how many callee saved VGPRs are needed. By handling this before PrologEpilogInserter, it's possible to just check the spill objects that already exist. Change-Id: I29e6df4034afcf949e06f8ef44206acb94696f04 llvm-svn: 365095	2019-07-03 23:32:29 +00:00
Matt Arsenault	c96c174557	Revert "[AMDGPU] Kernel arg metadata: added support for "__hip_texture" type." This reverts commit r365073. This is crashing, and is improperly relying on IR type names. llvm-svn: 365087	2019-07-03 21:34:34 +00:00
Konstantin Pyzhov	6f419a3370	[AMDGPU] Kernel arg metadata: added support for "__hip_texture" type. Summary: Hip texture type is equivalent to OpenCL image. So, we need to set the Image type for kernel arguments with __hip_texture type. Differential revision: https://reviews.llvm.org/D63850 llvm-svn: 365073	2019-07-03 19:11:35 +00:00
Michael Liao	80177ca5a9	[AMDGPU] Enable serializing of argument info. Summary: - Support serialization of all arguments in machine function info. This enables fabricating MIR tests depending on argument info. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64096 llvm-svn: 364995	2019-07-03 02:00:21 +00:00
Matt Arsenault	c04aab9c06	AMDGPU: Look through bundles for existing waitcnts These aren't produced now, but will be in a future patch. llvm-svn: 364983	2019-07-03 00:30:44 +00:00
Matt Arsenault	5fe851b6cd	AMDGPU: Custom lower vector_shuffle for v4i16/v4f16 Ordinarily it is lowered as a build_vector of each extract_vector_elt, which in turn get lowered to bitcasts and bit shifts. Very little understands the lowered extract pattern, resulting in much worse code. We treat concat_vectors of v2i16 as legal, so prefer that. llvm-svn: 364959	2019-07-02 19:15:45 +00:00
Alexander Timofeev	66ac6b409d	[AMDGPU] LCSSA pass added in preISel. Fixing typo in previous commit llvm-svn: 364952	2019-07-02 18:16:42 +00:00
Alexander Timofeev	2ce560f029	[AMDGPU] LCSSA pass added in preISel. Uniform values defined in the divergent loop and used outside Differential Revision: https://reviews.llvm.org/D63953 Reviewers: rampitec, nhaehnle, arsenm llvm-svn: 364950	2019-07-02 17:59:44 +00:00
Matt Arsenault	50be3481d4	AMDGPU/GlobalISel: Try generated matcher with intrinsics llvm-svn: 364933	2019-07-02 14:52:16 +00:00
Matt Arsenault	a8bff4b963	AMDGPU/GlobalISel: Select mul llvm-svn: 364932	2019-07-02 14:52:14 +00:00
Matt Arsenault	70a4d3f67c	AMDGPU/GlobalISel: Fix G_GEP with mixed SGPR/VGPR operands The register bank for the destination of the sample argument copy was wrong. We shouldn't be constraining each source to the result register bank. Allow constraining the original register to the right size. llvm-svn: 364928	2019-07-02 14:40:22 +00:00
Matt Arsenault	ed63399244	AMDGPU/GlobalISel: Select G_FENCE Manually select to workaround tablegen emitter emitting checks for G_CONSTANT. llvm-svn: 364927	2019-07-02 14:17:38 +00:00
Matt Arsenault	40c08052a5	AMDGPU: Correct properties for adjcallstack* pseudos These should be SALU writes, and these are lowered to instructions that def SCC. llvm-svn: 364859	2019-07-01 22:01:05 +00:00
Matt Arsenault	bae3636f96	AMDGPU/GlobalISel: Handle more input argument intrinsics llvm-svn: 364836	2019-07-01 18:50:50 +00:00
Matt Arsenault	9e8e8c60fa	AMDGPU/GlobalISel: Lower kernarg segment ptr intrinsics llvm-svn: 364835	2019-07-01 18:49:01 +00:00
Matt Arsenault	756d81905f	AMDGPU/GlobalISel: Legalize workgroup ID intrinsics llvm-svn: 364834	2019-07-01 18:47:22 +00:00
Matt Arsenault	e2c86cce3a	AMDGPU/GlobalISel: Legalize workitem ID intrinsics Tests don't cover the masked input path since non-kernel arguments aren't lowered yet. Test is copied directly from the existing test, with 2 additions. llvm-svn: 364833	2019-07-01 18:45:36 +00:00
Matt Arsenault	e15770aec4	AMDGPU/GlobalISel: Custom lower control flow intrinsics Replace the brcond for the 2 cases that act as branches. For now follow how the current system works, although I think we can eventually get rid of the pseudos. llvm-svn: 364832	2019-07-01 18:40:23 +00:00
Matt Arsenault	4073b33786	AMDGPU/GlobalISel: Handle 16-bit SALU min/max This needs to be extended to s32, and expanded into cmp+select. This is relying on the fact that widenScalar happens to leave the instruction in place, but this isn't a guaranteed property of LegalizerHelper. llvm-svn: 364831	2019-07-01 18:33:37 +00:00
Matt Arsenault	5a7d5111e5	AMDGPU/GlobalISel: Lower SALU min/max to cmp+select Use a change observer to apply a register bank to the newly created intermediate result register. llvm-svn: 364830	2019-07-01 18:30:45 +00:00
Matt Arsenault	ef59cb6982	AMDGPU/GlobalISel: Legalize s16 add/sub/mul If this is scalar, promote to s32. Use a new observer class to assign the register bank of newly created registers. llvm-svn: 364827	2019-07-01 18:18:55 +00:00
Matt Arsenault	9470bb262b	AMDGPU/GlobalISel: Fix allowing non-boolean conditions for G_SELECT The condition register bank must be scc or vcc so that a copy will be inserted, which will be lowered to a compare. Currently greedy unnecessarily forces using a VCC select. llvm-svn: 364825	2019-07-01 18:13:12 +00:00
Matt Arsenault	b2ea20eedd	AMDGPU/GlobalISel: RegBankSelect for sendmsg/sendmsghalt llvm-svn: 364819	2019-07-01 17:40:18 +00:00
Matt Arsenault	40d1faf38f	AMDGPU/GlobalISel: Legalize s16 fcmp llvm-svn: 364817	2019-07-01 17:35:53 +00:00
Nicolai Haehnle	10c911db63	AMDGPU/GFX10: implement ds_ordered_count changes Summary: ds_ordered_count can now simultaneously operate on up to 4 dwords in a single instruction, which are taken from (and returned to) lanes 0..3 of a single VGPR. Change-Id: I19b6e7b0732b617c10a779a7f9c0303eec7dd276 Reviewers: mareko, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63716 llvm-svn: 364815	2019-07-01 17:17:52 +00:00
Nicolai Haehnle	4dc3b2bf95	AMDGPU: Support GDS atomics Summary: Original patch by Marek Olšák Change-Id: Ia97d5d685a63a377d86e82942436d1fe6e429bab Reviewers: mareko, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63452 llvm-svn: 364814	2019-07-01 17:17:45 +00:00
Matt Arsenault	1094e6a814	AMDGPU/GlobalISel: RegBankSelect for DS ordered add/swap llvm-svn: 364811	2019-07-01 17:04:57 +00:00
Matt Arsenault	265059eaf6	AMDGPU/GlobalISel: RegBankSelect for amdgcn.writelane llvm-svn: 364808	2019-07-01 16:41:36 +00:00
Matt Arsenault	a310727830	AMDGPU/GlobalISel: Fail instead of assert when selecting loads llvm-svn: 364807	2019-07-01 16:36:39 +00:00
Matt Arsenault	0a52e9d026	AMDGPU/GlobalISel: Complete implementation of G_GEP Also works around tablegen defect in selecting add with unused carry, but if we have to manually select GEP, might as well handle add manually. llvm-svn: 364806	2019-07-01 16:34:48 +00:00
Matt Arsenault	e1006259d8	AMDGPU/GlobalISel: Select G_PHI llvm-svn: 364805	2019-07-01 16:32:47 +00:00
Matt Arsenault	d810ff2588	AMDGPU/GlobalISel: Try to select VOP3 form of add There are several things broken, but at least emit the right thing for gfx9. The import of the pattern with the unused carry out seems to not work. Needs a special class for clamp, because OperandWithDefaultOps doesn't really work. llvm-svn: 364804	2019-07-01 16:27:32 +00:00
Matt Arsenault	62d64b0c30	AMDGPU/GlobalISel: RegBankSelect for readlane/readfirstlane llvm-svn: 364801	2019-07-01 16:19:39 +00:00
Tom Stellard	9e9dd30de3	AMDGPU/GlobalISel: Implement select for 32-bit G_ADD Reviewers: arsenm Reviewed By: arsenm Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58804 llvm-svn: 364797	2019-07-01 16:09:33 +00:00
Matt Arsenault	2ab25f9ceb	AMDGPU/GlobalISel: Select G_BRCOND for vcc llvm-svn: 364795	2019-07-01 16:06:02 +00:00
Matt Arsenault	cda82f0bb6	AMDGPU/GlobalISel: Select G_FRAME_INDEX llvm-svn: 364789	2019-07-01 15:48:18 +00:00
Nicolai Haehnle	7cfd99ab15	AMDGPU/GFX10: fix scratch resource descriptor Summary: The stride should depend on the wave size, not the hardware generation. Also, the 32_FLOAT format is 0x16, not 16; though that shouldn't be relevant. Change-Id: I088f93bf6708974d085d1c50967f119061da6dc6 Reviewers: arsenm, rampitec, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63808 llvm-svn: 364788	2019-07-01 15:43:00 +00:00
Matt Arsenault	fdf36729c7	AMDGPU/GlobalISel: Make s16 select legal This is easy to handle and avoids legalization artifacts which are likely to obscure combines. llvm-svn: 364787	2019-07-01 15:42:47 +00:00
Matt Arsenault	6464280eb0	AMDGPU/GlobalISel: Select G_BRCOND for scc conditions llvm-svn: 364786	2019-07-01 15:39:27 +00:00
Matt Arsenault	1daad91af6	AMDGPU/GlobalISel: Tolerate copies with no type set isVCC has the same bug, but isn't used in a context where it can cause a problem. llvm-svn: 364784	2019-07-01 15:23:04 +00:00
Matt Arsenault	4f64ade04c	AMDGPU/GlobalISel: Select src modifiers llvm-svn: 364782	2019-07-01 15:18:56 +00:00
Matt Arsenault	1b317685e9	AMDGPU: Convert some places to Register llvm-svn: 364769	2019-07-01 13:44:46 +00:00
Matt Arsenault	5bf850d52e	AMDGPU/GlobalISel: Fix RegBankSelect for G_FCANONICALIZE llvm-svn: 364768	2019-07-01 13:40:18 +00:00
Matt Arsenault	b5fc94f3e7	AMDGPU/GlobalISel: Fix RegBankSelect for G_BUILD_VECTOR llvm-svn: 364767	2019-07-01 13:40:17 +00:00
Matt Arsenault	89fc8bcdd6	AMDGPU/GlobalISel: Fail on store to 32-bit address space llvm-svn: 364766	2019-07-01 13:37:39 +00:00
Matt Arsenault	3b7668ae4b	AMDGPU/GlobalISel: Improve icmp selection coverage. Select s64 eq/ne scalar icmp. llvm-svn: 364765	2019-07-01 13:34:26 +00:00
Matt Arsenault	c23149f612	AMDGPU/GlobalISel: RegBankSelect for WWM/WQM llvm-svn: 364763	2019-07-01 13:30:12 +00:00
Matt Arsenault	facf69e844	AMDGPU/GlobalISel: Use vcc reg bank for amdgcn.wqm.vote llvm-svn: 364762	2019-07-01 13:30:09 +00:00
Matt Arsenault	9f992c238a	AMDGPU/GlobalISel: Fix scc->vcc copy handling This was checking the size of the register with the value of the size, which happens to be exec. Also fix assuming VCC is 64-bit to fix wave32. Also remove some untested handling for physical registers which is skipped. This doesn't insert the V_CNDMASK_B32 if SCC is the physical copy source. I'm not sure if this should be trying to handle this special case instead of dealing with this in copyPhysReg. llvm-svn: 364761	2019-07-01 13:22:07 +00:00
Matt Arsenault	5dafcb9b11	AMDGPU/GlobalISel: Use and instead of BFE with inline immediate Zext from s1 is the only case where this should do anything with the current legal extensions. llvm-svn: 364760	2019-07-01 13:22:06 +00:00
Florian Hahn	33c8c0ea27	[AMDGPU] Call isLoopExiting for blocks in the loop. isLoopExiting should only be called for blocks in the loop. A follow up patch makes this requirement an assertion. I've updated the usage here, to only match for actual exit blocks. Previously, it would also match blocks not in the loop. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D63980 llvm-svn: 364750	2019-07-01 12:36:44 +00:00
Matt Arsenault	0d45209757	AMDGPU/GlobalISel: RegBankSelect for update.dpp llvm-svn: 364701	2019-06-29 00:44:36 +00:00
Matt Arsenault	fd82cf4f4d	AMDGPU/GlobalISel: RegBankSelect for atomic.inc/atomic.dec llvm-svn: 364699	2019-06-29 00:39:20 +00:00
Matt Arsenault	adb1f21e52	AMDGPU/GlobalISel: RegBankSelect for some DS intrinsics llvm-svn: 364698	2019-06-29 00:33:13 +00:00

3865 Commits