llvm-project

Commit Graph

Author	SHA1	Message	Date
Mark Searles	ed54ff1d51	[AMDGPU][Waitcnt] Fix build error: unused variable 'SWaitInst' https://reviews.llvm.org/rL333556 caused a buildbot failure. See http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/21876/steps/build_Lld/logs/stdio /Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:2007:10: error: unused variable 'SWaitInst' [-Werror,-Wunused-variable] auto SWaitInst = BuildMI(EntryBB, EntryBB.getFirstNonPHI(), The unused variable was for debugging purposes; removing that piece of code to fix the build. llvm-svn: 333559	2018-05-30 16:27:57 +00:00
Matt Arsenault	7b4826e6ce	AMDGPU: Use better alignment for kernarg lowering This was just emitting loads with the ABI alignment for the raw type. The true alignment is often better, especially when an illegal vector type was scalarized. The better alignment allows using a scalar load more often. llvm-svn: 333558	2018-05-30 16:17:51 +00:00
Mark Searles	1054541490	[AMDGPU][Waitcnt] Fix handling of loops with many bottom blocks In terms of waitcnt insertion, if necessary, the waitcnt pass forces convergence for a loop. Previously, that kicked in if there were more than 2 passes over a loop, which doesn't account for loops with many bottom blocks. So, increase the threshold to (n+1), where n is the number of bottom blocks. This gives the pass an opportunity to consider the contribution of each bottom block, to the overall loop, before the forced convergence potentially kicks in. Differential Revision: https://reviews.llvm.org/D47488 llvm-svn: 333556	2018-05-30 15:47:45 +00:00
Matt Arsenault	2e4d338d16	AMDGPU: Fix typo in option description llvm-svn: 333457	2018-05-29 19:35:46 +00:00
Matt Arsenault	1ea0402e82	AMDGPU: Round up kernel argument allocation size AFAIK the driver's allocation will actually have to round this up anyway. It is useful to track the rounded up size, so that the end of the kernel segment is known to be dereferenceable so a wider s_load_dword can be used for a short argument at the end of the segment. llvm-svn: 333456	2018-05-29 19:35:00 +00:00
Konstantin Zhuravlyov	2ca6b1f2ba	AMDGPU: Always set COMPUTE_PGM_RSRC2.ENABLE_TRAP_HANDLER to zero for AMDHSA as it is set by CP Differential Revision: https://reviews.llvm.org/D47392 llvm-svn: 333451	2018-05-29 19:09:13 +00:00
Matt Arsenault	ceafc55e5a	AMDGPU: Pass function directly instead of MachineFunction These functions just query the underlying IR function, so pass it directly. llvm-svn: 333442	2018-05-29 17:42:50 +00:00
Matt Arsenault	2fb9ccf770	AMDGPU: Add nuw to add off of kernarg ptr llvm-svn: 333441	2018-05-29 17:42:38 +00:00
Tom Stellard	57b9342c80	AMDGPU: Split R600 MCInst lowering into its own class Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47307 llvm-svn: 333439	2018-05-29 17:41:59 +00:00
Tim Renouf	fa213f797b	[AMDGPU] Fixed build warning Summary: V2: Use cast instead of extra if. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47426 Change-Id: I6ac31da0306f79706960284a7ebd7b9c6237a83a llvm-svn: 333397	2018-05-29 08:15:37 +00:00
Farhana Aleen	eacb1020aa	[AMDGPU] Re-enabled 128bit wide-vector generation for local addr space by default. Summary: Bug reported here https://bugs.freedesktop.org/show_bug.cgi?id=105464 found to be resolved by some other fixes. Author: FarhanaAleen llvm-svn: 333380	2018-05-28 18:15:11 +00:00
Tim Renouf	364edcd2e5	[AMDGPU] Fixed WWM bug in block otherwise entirely in WQM Summary: For a block with WQM on entry and exit and containing no exact mode code, but containing some WWM code, the WQM pass forgot to process the block at all and so did not insert code to enter and leave WWM. This commit fixes that. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47027 Change-Id: I044792eead1293bed4203fb26ce75f47878afeb6 llvm-svn: 333362	2018-05-27 17:26:11 +00:00
Mark Searles	32efedcff3	[AMDGPU][Waitcnt] Remove obsolete waitcnt option With the removal of the old waitcnt pass, the '-enable-si-insert-waitcnts' option is obsolete. Remove it. Differential Revision: https://reviews.llvm.org/D47378 llvm-svn: 333303	2018-05-25 20:24:08 +00:00
Stanislav Mekhanoshin	7fc1cee051	[AMDGPU] Fixed test failure with AMDGPUPerfHint We must not keep an iterator into a map while the map is modified; this leads to a broken map. llvm-svn: 333298	2018-05-25 18:46:58 +00:00
Reid Kleckner	cb48efd585	Fix -Winconsistent-missing-overrides in AMDGPU code llvm-svn: 333291	2018-05-25 17:46:24 +00:00
Stanislav Mekhanoshin	1c538423dc	[AMDGPU] Add perf hints to functions This is adoption of HSAIL perfhint pass. Two types of hints are produced: 1. Function is memory bound. 2. Kernel can use wave limiter. Currently these hints are used in the scheduler. If a function is suspected to be memory bound we allow occupancy to decrease to 4 waves in the course of scheduling. Differential Revision: https://reviews.llvm.org/D46992 llvm-svn: 333289	2018-05-25 17:25:12 +00:00
Tim Renouf	ad8b7c1190	[AMDGPU] Fixed incorrect break from loop Summary: Lower control flow did not correctly handle the case that a loop break in if/else was on a condition that was not guaranteed to be masked by exec. The first test kernel shows an example of this going wrong; after exiting the loop, exec is all ones, even if it was not before the loop. The fix is for lowering of if-break and else-break to insert an S_AND_B64 to mask the break condition with exec. This commit also includes the optimization of not inserting that S_AND_B64 if it is obviously not needed because the break condition is the result of a V_CMP in the same basic block. V2: Addressed some review comments. V3: Test fixes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D44046 Change-Id: I0fc56a01209a9e99d1d5c9b0ffd16f111caf200c llvm-svn: 333258	2018-05-25 07:55:04 +00:00
Tom Stellard	79fffe3515	AMDGPU: Remove AMDGPUMCInstLower.h Summary: The AMDGPUMCInstLower class is not used outside AMDGPUMCInstLower.cpp, so we don't need a header file. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47264 llvm-svn: 333254	2018-05-25 04:57:02 +00:00
Tom Stellard	c501501055	AMDGPU: Split R600 AsmPrinter code into its own class Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47245 llvm-svn: 333219	2018-05-24 20:02:01 +00:00
Tom Stellard	1b95fed6f7	AMDGPU/R600: Remove code for handling AMDGPUISD::CLAMP Summary: We don't generate AMDGPUISD::CLAMP for R600 now that llvm.AMDGPU.clamp is gone. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47181 llvm-svn: 333153	2018-05-24 05:28:34 +00:00
Matt Arsenault	606bc315d6	AMDGPU: Fix v2f16 fneg/fabs pattern The integer operation conversion for some reason only happens if the source is a bitcast from an integer, which happens to always be the situation when the result is loaded. Add an additional pattern for when the source operation is really an FP operation. llvm-svn: 333019	2018-05-22 20:13:34 +00:00
Tom Stellard	b12f4dec08	AMDGPU: Move AMDGPUTargetLowering::isFPExtFoldable() into SITargetLowering Summary: This is always false for R600. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47180 llvm-svn: 333016	2018-05-22 19:37:55 +00:00
Matt Arsenault	1349a04ef5	AMDGPU: Make v2i16/v2f16 legal on VI This usually results in better code. Fixes using inline asm with short2, and also fixes having a different ABI for function parameters between VI and gfx9. Partially cleans up the mess used for lowering of the d16 operations. Making v4f16 legal will help clean this up more, but this requires additional work. llvm-svn: 332953	2018-05-22 06:32:10 +00:00
Tom Stellard	44b30b4537	AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instruction and register definitions, which are huge so we only want to include them where needed. This will also make it easier if we want to split the R600 and GCN definitions into separate tablegenerated files. I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46272 llvm-svn: 332930	2018-05-22 02:03:23 +00:00
Peter Collingbourne	dcd7d6c331	MC: Separate creating a generic object writer from creating a target object writer. NFCI. With this we gain a little flexibility in how the generic object writer is created. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47045 llvm-svn: 332868	2018-05-21 19:20:29 +00:00
Stanislav Mekhanoshin	9badad2051	[AMDGPU] Add divergence analysis as a dependency for ISel AMDGPUDAGToDAGISel adds DivergenceAnalysis in getAnalysisUsage but does not list it in pass dependencies which may lead to crash. Differential Revision: https://reviews.llvm.org/D47151 llvm-svn: 332862	2018-05-21 18:18:52 +00:00
Peter Collingbourne	571a3301ae	MC: Change MCAsmBackend::writeNopData() to take a raw_ostream instead of an MCObjectWriter. NFCI. To make this work I needed to add an endianness field to MCAsmBackend so that writeNopData() implementations know which endianness to use. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47035 llvm-svn: 332857	2018-05-21 17:57:19 +00:00
Tom Stellard	a91ce17b5f	AMDGPU/GlobalISel: Address post-commit review comments for r332379 MCRegisterInfo::getPhysRegSize() will be deprecated. llvm-svn: 332856	2018-05-21 17:49:31 +00:00
Simon Pilgrim	ede0e4073e	Fix MSVC unused variable warning. NFCI. AMDGPURegisterInfo::getSubRegFromChannel is a static method - we don't need to get the AMDGPURegisterInfo instance. llvm-svn: 332807	2018-05-19 12:46:02 +00:00
Matt Arsenault	372d796ab1	AMDGPU: Add pass to optimize reqd_work_group_size Eliminate loads from the dispatch packet when they will have a known value. Also pattern match the code used by the library to handle partial workgroup dispatches, which isn't necessary if reqd_work_group_size is used. llvm-svn: 332771	2018-05-18 21:35:00 +00:00
Peter Collingbourne	e3f652973e	Support: Simplify endian stream interface. NFCI. Provide some free functions to reduce verbosity of endian-writing a single value, and replace the endianness template parameter with a field. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47032 llvm-svn: 332757	2018-05-18 19:46:24 +00:00
Konstantin Zhuravlyov	caa8251971	AMDGPU/NFC: Set symbol's type that is coming from an argument in EmitAMDGPUSymbolType, instead of hard-coding it to STT_AMDGPU_HSA_KERNEL. llvm-svn: 332753	2018-05-18 18:41:37 +00:00
Peter Collingbourne	f7b81db715	MC: Change the streamer ctors to take an object writer instead of a stream. NFCI. The idea is that a client that wants split dwarf would create a specific kind of object writer that creates two files, and use it to create the streamer. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47050 llvm-svn: 332749	2018-05-18 18:26:45 +00:00
Changpeng Fang	860d460063	AMDGPU/SI: Don't promote alloca to vector for atomic load/store Summary: Don't promote alloca to vector for atomic load/store Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D46085 llvm-svn: 332673	2018-05-17 21:49:44 +00:00
Changpeng Fang	391bcf8893	AMDGPU/SI: Handle infinite loop for the structurizer to work with CFG with infinite loops. Summary: The current StructurizeCFG pass only works for CFG with one exit. AMDGPUUnifyDivergentExitNodes combines multiple "return" blocks and/or "unreachable" blocks to one exit block for the Structurizer to work. However, infinite loop is another kind of special "exit", and if we don't handle it, the case of multiple exits will prevent the structurizer from working. In this work, for each infinite loop, we add a dummy edge to the "return" block, and thus the AMDGPUUnifyDivergentExitNodes pass will work with infinite loops. This will make CFG with infinite loops be structurized. Reviewer: nhaehnle Differential Revision: https://reviews.llvm.org/D46340 llvm-svn: 332625	2018-05-17 16:45:01 +00:00
Konstantin Zhuravlyov	c72ece6c2c	AMDGPU : Recalculate SGPRs when trap handler is supported Differential Revision: https://reviews.llvm.org/D29911 llvm-svn: 332523	2018-05-16 20:47:48 +00:00
Tony Tye	43259df44a	[AMDGPU] Change llvm.debugtrap to be a debug breakpoint that can resume execution. No longer require the queue pointer to be passed in in fixed SGPRs. Differential Revision: https://reviews.llvm.org/D46769 llvm-svn: 332485	2018-05-16 16:19:34 +00:00
Matt Arsenault	67a9815a5c	AMDGPU: Custom lower v4i16/v4f16 vector operations Avoids stack access. Also handle extract hi elt pattern from truncate + shift to avoid a couple test regressions. llvm-svn: 332453	2018-05-16 11:47:30 +00:00
Stanislav Mekhanoshin	57d341c27a	[AMDGPU] Fix handling of void types in isLegalAddressingMode It is legal for the type passed to isLegalAddressingMode to be unsized or, more specifically, VoidTy. In this case, we must check the legality of load / stores for all legal types. Directly trying to call getTypeStoreSize is incorrect, and leads to breakage in e.g. Loop Strength Reduction. This change guards against that behaviour. Differential Revision: https://reviews.llvm.org/D40405 llvm-svn: 332409	2018-05-15 22:07:51 +00:00
Konstantin Zhuravlyov	f13c9969fc	AMDGPU: Fix v_dot{4, 8}* instruction encoding Differential Revision: https://reviews.llvm.org/D46848 llvm-svn: 332387	2018-05-15 19:32:47 +00:00
Tom Stellard	e182b28ae4	AMDGPU/GlobalISel: Implement select() for G_FCONSTANT Summary: Also clean up G_CONSTANT selection. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46170 llvm-svn: 332379	2018-05-15 17:57:09 +00:00
Konstantin Zhuravlyov	603a43fcd5	AMDGPU: Add disasm tests for deep learning instructions + fix v_fmac_f32 disasm Differential Revision: https://reviews.llvm.org/D46853 llvm-svn: 332377	2018-05-15 17:39:13 +00:00
Nicola Zaghen	d34e60ca85	Rename DEBUG macro to LLVM_DEBUG. The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows: - git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g' - git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM - Manual change to APInt - Manually change DOCS as regex doesn't match it. In the transition period the DEBUG() macro is still present and aliased to the LLVM_DEBUG() one. Differential Revision: https://reviews.llvm.org/D43624 llvm-svn: 332240	2018-05-14 12:53:11 +00:00
Matt Arsenault	432aaea63f	AMDGPU: Rename OpenCL lowering pass to be R600 specific. This pass is a) broken. b) r600 specific. Fixing (a) is a bit more non-trivial, but fixing (b) is easy. Move this pass to being R600 only for now. This pass does pass all the unit tests, however clang no longer generates code that looks like the unit test input, so fixing the pass requires fixing the tests and the pass as one, and checking it works with clang still. Patch by Dave Airlie llvm-svn: 332196	2018-05-13 10:04:48 +00:00
Matt Arsenault	dfb88dfe30	AMDGPU: Make undef legal for v2i16/v2f16 This is apparently necessary to stop undef from being turned into a build_vector of 0s. llvm-svn: 332195	2018-05-13 10:04:38 +00:00
Stanislav Mekhanoshin	7012c246c1	[AMDGPU] Fix amdgpu-waves-per-eu accounting in scheduler We cannot query this attribute from a subtarget given a machine function. At this point attribute itself is already unavailable and can only be obtained through MFI. Differential Revision: https://reviews.llvm.org/D46781 llvm-svn: 332166	2018-05-12 01:41:56 +00:00
Tom Stellard	655fdd3f82	AMDGPU/GlobalISel: Implement select() for >32-bit G_STORE Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D46153 llvm-svn: 332154	2018-05-11 23:12:49 +00:00
Changpeng Fang	f094885a9e	AMDGPU/SI: Don't promote alloca to vector for AddrSpaceCast instruction. Summary: We have no logic to promote alloca to vector for an AddrSpaceCast instruction. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D45993 llvm-svn: 332147	2018-05-11 22:17:57 +00:00
Yaxun Liu	deba150c27	[AMDGPU] Fix compilation failure when IR contains comdat Remove a useless SwitchSection which also causes compilation failure when IR contains comdat. The SwitchSection is useless because the current section is already the correct text section for the function, so there is no need to switch. It causes compilation failure for comdat because functions with comdat have a specific text section, not the default .text section. Since HIP uses comdat, this bug caused failures for HIP. Differential Revision: https://reviews.llvm.org/D46770 llvm-svn: 332137	2018-05-11 20:40:14 +00:00
Tom Stellard	dcc95e9385	AMDGPU/GlobalISel: Implement select() for 32-bit G_FPTOUI Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45883 llvm-svn: 332082	2018-05-11 05:44:16 +00:00
Tom Stellard	1e0edad4bb	AMDGPU/GlobalISel: Implement select() for G_BITCAST s32 <--> <2 x s16> Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45881 llvm-svn: 332042	2018-05-10 21:20:10 +00:00
Tom Stellard	1dc90204bf	AMDGPU/GlobalISel: Enable TableGen'd instruction selector Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, mgorny, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45994 llvm-svn: 332039	2018-05-10 20:53:06 +00:00
Farhana Aleen	e24f3ff8de	[AMDGPU] Support horizontal vectorization of min/max. Author: FarhanaAleen Reviewed By: rampitec Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D46604 llvm-svn: 331920	2018-05-09 21:18:34 +00:00
Matt Arsenault	eac81b2448	AMDGPU: Ignore any_extend in mul24 combine If a multiply is truncated, SimplifyDemandedBits sometimes turns a zero_extend of the inputs into an any_extend, which makes the known bits computation unhelpful. Ignore these and compute known bits for the underlying value, since we insert the correct extend type after. llvm-svn: 331919	2018-05-09 21:11:35 +00:00
Matt Arsenault	74fd7600d2	AMDGPU: Handle partial shift reduction for variable shifts If the variable shift amount has known bits, we can still reduce the shift. llvm-svn: 331917	2018-05-09 20:52:54 +00:00
Matt Arsenault	b143d9a5ea	AMDGPU: Partially shrink 64-bit shifts if reduced to 16-bit This is an extension of an existing combine to reduce wider shls if the result fits in the final result type. This introduces the same combine, but reduces the shift to a middle sized type to avoid the slow 64-bit shift. llvm-svn: 331916	2018-05-09 20:52:43 +00:00
Matt Arsenault	762d498808	AMDGPU: Add combine for trunc of bitcast from build_vector If the truncate is only accessing the first element of the vector, we can use the original source value. This helps with some combine ordering issues after operations are lowered to integer operations between bitcasts of build_vector. In particular it stops unnecessarily materializing the unused top half of a vector in some cases. llvm-svn: 331909	2018-05-09 18:37:39 +00:00
Matt Arsenault	378f86998c	AMDGPU: Stop special casing constant indexes of extract_vector_elt The same result folds out of the dynamic expansion logic if the index is constant. llvm-svn: 331906	2018-05-09 18:29:26 +00:00
Shiva Chen	801bf7ebbe	[DebugInfo] Examine all uses of isDebugValue() for debug instructions. Because we create a new kind of debug instruction, DBG_LABEL, we need to check all passes which use isDebugValue() to check MachineInstr is debug instruction or not. When expelling debug instructions, we should expel both DBG_VALUE and DBG_LABEL. So, I create a new function, isDebugInstr(), in MachineInstr to check whether the MachineInstr is debug instruction or not. This patch has no new test case. I have run regression test and there is no difference in regression test. Differential Revision: https://reviews.llvm.org/D45342 Patch by Hsiangkai Wang. llvm-svn: 331844	2018-05-09 02:42:00 +00:00
Tim Renouf	64afc2d7f0	[AMDGPU] Provide machine -> name mapping Summary: AMDGPU stores a numerical code for the particular GPU variant in EFlags in the ELF file. This commit provides a mapping from that number into the machine name for use by objdump-type tools. Change-Id: Id37fc0bebad443bd89c0080985ce298c4e7e9319 Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46587 llvm-svn: 331798	2018-05-08 18:53:04 +00:00
Matt Arsenault	869cbedc81	AMDGPU: Fix broken dynamic vector indexing for packed types The intention of this was to multiply by 16, not shift by 16. llvm-svn: 331793	2018-05-08 18:43:25 +00:00
Changpeng Fang	d049da3740	AMDGPU: Use eraseFromParent to delete an instruction when it is no longer needed. Reviewer: Nicolai Differential Revision: https://reviews.llvm.org/D46438 llvm-svn: 331788	2018-05-08 18:32:35 +00:00
Stanislav Mekhanoshin	432936161e	[AMDGPU] Added checks for dpp_ctrl value - Report error for invalid dpp_ctrl values. - Changed the way it is reported, now the error will be emitted into asm and will work with release build as well. - Added dpp_ctrl value verifier for codegen. - Added symbolic constants for dpp_ctrl. Differential Revision: https://reviews.llvm.org/D46565 llvm-svn: 331775	2018-05-08 16:53:02 +00:00
Tom Stellard	37444285f1	AMDGPU/GlobalISel: Don't try to lower hull shaders Summary: The AMDGPU_HS calling convention is not supported yet. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46149 llvm-svn: 331691	2018-05-07 22:17:54 +00:00
Mark Searles	4a0f2c5047	[AMDGPU][Waitcnt] Remove the old waitcnt pass Remove the old waitcnt pass ( si-insert-waits ), which is no longer maintained and getting crufty Differential Revision: https://reviews.llvm.org/D46448 llvm-svn: 331641	2018-05-07 14:43:28 +00:00
Tim Renouf	18a1e9d03a	[AMDGPU] Don't force WQM for DS op Summary: Previously, all DS ops forced WQM in a pixel shader. That was a hack to allow for graphics frontends using ds_swizzle to implement explicit derivatives, on SI/CI at least where DPP is not available. But it forced WQM for _any_ DS op. With this commit, DS ops no longer force WQM. Both graphics frontends (Mesa and LLPC) need to change to issue an explicit llvm.amdgcn.wqm intrinsic call when calculating explicit derivatives. The required Mesa change is: "amd/common: use llvm.amdgcn.wqm for explicit derivatives". Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46051 Change-Id: I9b745b626fa91bbd66456e6cf41ee07eeea42f81 llvm-svn: 331633	2018-05-07 13:21:26 +00:00
Konstantin Zhuravlyov	91a74f53db	AMDGPU/NFC: Update D16PreservesUnusedBits description based on Tony Tye's comments llvm-svn: 331564	2018-05-04 22:53:55 +00:00
Konstantin Zhuravlyov	3fc4067ac4	AMDGPU/NFC: Fix formatting for 900, 902 ISA Version features llvm-svn: 331553	2018-05-04 20:21:31 +00:00
Konstantin Zhuravlyov	c2c2eb7d01	AMDGPU: Add D16 instructions preserve unused bits feature - Predicate D16 patterns on this new feature - Added this new feature to gfx900/2/4 Differential Revision: https://reviews.llvm.org/D46366 llvm-svn: 331551	2018-05-04 20:06:57 +00:00
Michael Berg	7acc81b744	Fast Math Flag mapping into SDNode Summary: Adding support for Fast flags in the SDNode to leverage fast math sub flag usage. Reviewers: spatel, arsenm, jbhateja, hfinkel, escha, qcolombet, echristo, wristow, javed.absar Reviewed By: spatel Subscribers: llvm-commits, rampitec, nhaehnle, tstellar, FarhanaAleen, nemanjai, javed.absar, jbhateja, hfinkel, wdng Differential Revision: https://reviews.llvm.org/D45710 llvm-svn: 331547	2018-05-04 18:48:20 +00:00
Tom Stellard	b03c98d1a3	AMDGPU: Make getSubRegFromChannel a static member of AMDGPURegisterInfo Summary: This makes it possible to have R600RegisterInfo and SIRegisterInfo not inherit from AMDGPURegisterInfo. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D46280 llvm-svn: 331490	2018-05-03 22:38:06 +00:00
Piotr Padlewski	5dde809404	Rename invariant.group.barrier to launder.invariant.group Summary: This is one of the initial commits of the "RFC: Devirtualization v2" proposal: https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8amICNBtvjWUod2SujZVEo/edit?usp=sharing Reviewers: rsmith, amharc, kuhar, sanjoy Subscribers: arsenm, nhaehnle, javed.absar, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D45111 llvm-svn: 331448	2018-05-03 11:03:01 +00:00
Farhana Aleen	07e612340f	[AMDGPU] A trivial fix for a buildbot failure caused by "commit 224a839fcbbead221f872cd32a1dd0c308d37299". Author: FarhanaAleen llvm-svn: 331383	2018-05-02 18:16:39 +00:00
Farhana Aleen	150cb6d91a	Revert "[AMDGPU] performAddCombine should run after DAG is legalized." This reverts commit 6b97d2995566b4dddd6bf0d75579ff44501d4494. llvm-svn: 331371	2018-05-02 16:48:52 +00:00
Farhana Aleen	2f4100f56e	[AMDGPU] performAddCombine should run after DAG is legalized. Summary: performAddCombine should run after DAG is legalized; Otherwise generic optimization in the DAGCombiner can optimize an addcarry+trunc into an addcarry instruction with illegal types. Author: FarhanaAleen Reviewed By: rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D46337 llvm-svn: 331368	2018-05-02 16:24:10 +00:00
Farhana Aleen	e2dfe8a853	[AMDGPU] Support horizontal vectorization. Author: FarhanaAleen Reviewed By: rampitec, arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D46213 llvm-svn: 331313	2018-05-01 21:41:12 +00:00
Konstantin Zhuravlyov	1501af4846	AMDGPU: Remove remnants of gfx901 (it was deprecated some time ago) llvm-svn: 331298	2018-05-01 18:47:48 +00:00
Adrian Prantl	4dfcc4a788	Remove @brief commands from doxygen comments, too. This is a follow-up to r331272. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done https://reviews.llvm.org/D46290 llvm-svn: 331275	2018-05-01 16:10:38 +00:00
Adrian Prantl	5f8f34e459	Remove \brief commands from doxygen comments. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272	2018-05-01 15:54:18 +00:00
Matt Arsenault	0084adc516	AMDGPU: Add Vega12 and Vega20 Changes by Matt Arsenault Konstantin Zhuravlyov llvm-svn: 331215	2018-04-30 19:08:16 +00:00
Tom Stellard	add59c052d	AMDGPU: Remove some dead code llvm-svn: 331196	2018-04-30 16:28:02 +00:00
Tom Stellard	6c81418a63	AMDGPU/GlobalISel: Don't try to lower geometry shaders Summary: The AMDGPU_GS calling convention is not supported yet. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46041 llvm-svn: 331186	2018-04-30 15:15:23 +00:00
Nico Weber	432a38838d	IWYU for llvm-config.h in llvm, additions. See r331124 for how I made a list of files missing the include. I then ran this Python script: for f in open('filelist.txt'): f = f.strip() fl = open(f).readlines() found = False for i in xrange(len(fl)): p = '#include "llvm/' if not fl[i].startswith(p): continue if fl[i][len(p):] > 'Config': fl.insert(i, '#include "llvm/Config/llvm-config.h"\n') found = True break if not found: print 'not found', f else: open(f, 'w').write(''.join(fl)) and then looked through everything with `svn diff | diffstat -l | xargs -n 1000 gvim -p` and tried to fix include ordering and whatnot. No intended behavior change. llvm-svn: 331184	2018-04-30 14:59:11 +00:00
Matt Arsenault	540512c297	DAG: Fix not legalizing vector fcanonicalizes If an fcanonicalize was done on a vector type that was legal, llvm-svn: 330981	2018-04-26 19:21:37 +00:00
Matt Arsenault	fcc5ba46b7	AMDGPU: Extend extract_vector_elt fneg combine to fabs Fixes a regression in a future commit. llvm-svn: 330980	2018-04-26 19:21:32 +00:00
Matt Arsenault	8474803c7c	AMDGPU: Consolidate SubtargetPredicate definitions llvm-svn: 330979	2018-04-26 19:21:26 +00:00
Mark Searles	2a19af6e17	[AMDGPU][Waitcnt] As of gfx7, VMEM operations do not increment the export counter and the input registers are available in the next instruction; update the waitcnt pass to take this into account. Differential Revision: https://reviews.llvm.org/D46067 llvm-svn: 330954	2018-04-26 16:11:19 +00:00
Tom Stellard	dce46fa1cf	AMDGPU/R600: Move int_r600_store_stream_output to the public intrinsic file Summary: The TableGen'd GlobalISel instruction selector assumes all intrinsics are in the public Intrinsic:: namespace. Reviewers: jvesely, nhaehnle Reviewed By: jvesely, nhaehnle Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45989 llvm-svn: 330866	2018-04-25 20:02:53 +00:00
Mark Searles	ec58183e1b	[AMDGPU] Waitcnt pass: add debug options - Add "amdgpu-waitcnt-forcezero" to force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) - Add debug counters to control force emit of s_waitcnt instrs; debug counters: si-insert-waitcnts-forceexp: force emit s_waitcnt expcnt(0) instrs si-insert-waitcnts-forcevm: force emit s_waitcnt vmcnt(0) instrs si-insert-waitcnts-forcelgkm: force emit s_waitcnt lgkmcnt(0) instrs - Add some debug statements Note that a variant of this patch was previously committed/reverted. Differential Revision: https://reviews.llvm.org/D45888 llvm-svn: 330862	2018-04-25 19:21:26 +00:00
Alexander Timofeev	b934728cd2	[AMDGPU] Revert b0efc4fd6 (https://reviews.llvm.org/D40556 ) llvm-svn: 330818	2018-04-25 12:32:46 +00:00
Tom Stellard	a2be8f4c35	AMDGPU: Remove deprecated llvm.AMDGPU.kilp intrinsic Summary: This is no longer used by mesa since its 18.0.0 release. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D45988 llvm-svn: 330775	2018-04-24 21:37:57 +00:00
Tom Stellard	257882ff72	AMDGPU/GlobalISel: Fall-back to SelectionDAG for non-void functions Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45843 llvm-svn: 330774	2018-04-24 21:29:36 +00:00
Tom Stellard	c7709e1c29	AMDGPU/GlobalISel: Add support for amdgpu_ps calling convention Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45837 llvm-svn: 330767	2018-04-24 20:51:28 +00:00
Stanislav Mekhanoshin	a4bfb3c446	[AMDGPU] Truncate packed inline constant If a packed inline constant is sign extended, it must be truncated after the shift. For example, the constant (0xH0000, 0xHBC00) is represented as 0xFFFFFFFFBC000000 in the IR because the immediate is sign extended to 64 bits. After the value is shifted right by 16 to use it in the low part with op_sel_hi, it becomes 0xFFFFFFFFBC00 and no longer qualifies as an inline constant. Fixed the error and added verification code. With the verification but without the fix, this bug causes pk_max_f16_literal.ll to fail. Differential Revision: https://reviews.llvm.org/D45987 llvm-svn: 330752	2018-04-24 18:17:55 +00:00
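You can check the arithmetic in the message above with a short Python sketch. It is an illustration only. `sext32_to_64` and `is_inline_f16` are hypothetical helpers. The inline-constant test is a simplified assumption and not LLVM's actual check: it accepts the f16 values 0.0, ±0.5, ±1.0, ±2.0, ±4.0 and 1/(2π), plus the integers -16..64.

```python
MASK64 = (1 << 64) - 1

# Assumed f16 inline-constant bit patterns: 0.0, +-0.5, +-1.0, +-2.0, +-4.0, 1/(2*pi).
FP16_INLINE_BITS = {0x0000, 0x3800, 0xB800, 0x3C00, 0xBC00,
                    0x4000, 0xC000, 0x4400, 0xC400, 0x3118}

def sext32_to_64(v32):
    """Sign-extend a 32-bit pattern to a 64-bit (unsigned) pattern."""
    v32 &= 0xFFFFFFFF
    if v32 & 0x80000000:
        return v32 | (MASK64 ^ 0xFFFFFFFF)  # fill the upper 32 bits with ones
    return v32

def is_inline_f16(imm):
    """Simplified inline-constant check for a 16-bit operand (hypothetical)."""
    if imm > 0xFFFF:  # value does not fit in 16 bits at all
        return False
    signed = imm - 0x10000 if imm & 0x8000 else imm
    return imm in FP16_INLINE_BITS or -16 <= signed <= 64

packed = (0xBC00 << 16) | 0x0000   # (lo = 0xH0000, hi = 0xHBC00)
imm64 = sext32_to_64(packed)       # sign extended to 64 bits
hi_shifted = imm64 >> 16           # high half moved down, sign bits still attached
hi_truncated = hi_shifted & 0xFFFF # truncated back to 16 bits (0xBC00 is -1.0 in f16)
```

In this sketch, the extra sign bits make `is_inline_f16(hi_shifted)` false. Masking to 16 bits after the shift makes `is_inline_f16(hi_truncated)` true.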
Mark Searles	70901b9047	[AMDGPU][Waitcnt] NFC. Cleanup some code/naming consistency: - s/SWaitcnt/Waitcnt s/WaitCnt/Waitcnt llvm-svn: 330730	2018-04-24 15:59:59 +00:00
Matt Arsenault	b21f9592be	AMDGPU: Move a flawed assert when spilling SGPRs It's possible to validly spill the frame offset register in a call sequence to a VGPR. There are definitely issues with SGPR spilling to memory, so move the assert later. llvm-svn: 330612	2018-04-23 16:13:30 +00:00
Matt Arsenault	adc59d7076	AMDGPU: Assign enum name to stack ID Also assert that it is correct for SGPRs. There is currently a bug where stack slot coloring replaces SGPR spill FIs with one with the default ID, which results in a more confusing assert later about a dead object. llvm-svn: 330607	2018-04-23 15:51:26 +00:00
Nicolai Haehnle	cbebba4917	AMDGPU: Fix SDWA peephole for V_AND_B32 Summary: Found by inspection. We care about the operand that doesn't contain the immediate. I believe this is currently not hit because we fold 0xff / 0xffff immediates only later. Change-Id: Ic3cf8538bc7da5eff3200d96eccf9d339e6345a7 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45886 llvm-svn: 330586	2018-04-23 13:06:03 +00:00
Nicolai Haehnle	5a995664f0	AMDGPU: Fix a corner case crash in SIOptimizeExecMasking Summary: See the new test case; this is really unlikely to happen with real code, but I ran into this while attempting to bugpoint-reduce a different issue. Change-Id: I9ade1dc1aa8fd9c4d9fc83661d7b80e310b5c4a6 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45885 llvm-svn: 330585	2018-04-23 13:05:50 +00:00
Nico Weber	5d53aed419	Consistently sort add_subdirectory calls in lib/Target/*/CMakeLists.txt llvm-svn: 330584	2018-04-23 12:49:34 +00:00
Nicolai Haehnle	7a87977fb2	AMDGPU: Legalize the operand of SI_INIT_M0 Summary: This fixes a case where the argument to a sendmsg intrinsic ends up in a VGPR, for whatever reason. The underlying performance issue is that a multiplication that can be an s_mul_i32 is instead needlessly generated as v_mul_u32_u24, but this is not addressed by this patch. Change-Id: I61fd4034314d5acdf6074632c30b65364dfa7328 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45826 llvm-svn: 330393	2018-04-20 07:14:25 +00:00
Stanislav Mekhanoshin	160f85794d	[AMDGPU] Use packed literals with zero in either the low or high part Differential Revision: https://reviews.llvm.org/D45790 llvm-svn: 330365	2018-04-19 21:16:50 +00:00
Mark Searles	1bc6e71f32	[AMDGPU] Do not rely only on BB number when finding the bottom of a loop We should also check that the "bottom" basic block of a loop is a successor of the "header" basic block; otherwise we don't propagate the information correctly when the CFG is complex. This fixes an important rendering problem in Wolfenstein 2, caused by one missing vector-memory wait. Differential Revision: https://reviews.llvm.org/D43831 llvm-svn: 330337	2018-04-19 15:42:30 +00:00
David Stuttard	31f482c26b	[AMDGPU] Fix issues for backend divergence tracking Summary: A change to use divergence analysis in the AMDGPU backend was handling formal arguments incorrectly (not tagging them as divergent) unless they were VGPR0, VGPR1 or VGPR2. For graphics shaders it is possible to have more than these passed in as VGPRs. Modified the checking code to check for any VGPR registers passed in as formal arguments. Also, some intrinsics that are sources of divergence may have been lowered during instruction selection and are missed on subsequent calls to isSDNodeSourceOfDivergence - added the relevant AMDGPUISD checks as well. Finally, the FunctionLoweringInfo tracks virtual registers that are live across basic block boundaries. This is used to check for divergence of CopyFromRegister registers using the DivergenceAnalysis analysis. For multiple blocks, the lazily evaluated inverted map VirtReg2Value was not cleared when the ValueMap map was. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45372 Change-Id: I112f3bd6dfe0f62e63ce9b43b893982778e4bee3 llvm-svn: 330257	2018-04-18 13:53:31 +00:00
Stanislav Mekhanoshin	8b20b7dc2b	[AMDGPU] Enabled v2.16 literals for VOP3P Literal encoding needs op_sel_hi to select the low 16 bits in this case. Differential Revision: https://reviews.llvm.org/D45745 llvm-svn: 330230	2018-04-17 23:09:05 +00:00
Dmitry Preobrazhensky	4c45e6ff0e	[AMDGPU][MC][VI][GFX9] Added support of SDWA/DPP for v_cndmask_b32 See bug 36356: https://bugs.llvm.org/show_bug.cgi?id=36356 Differential Revision: https://reviews.llvm.org/D45446 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 330123	2018-04-16 12:41:38 +00:00
Hiroshi Inoue	ae17900997	[NFC] fix trivial typos in document and comments "not not" -> "not" etc llvm-svn: 330083	2018-04-14 08:59:00 +00:00
Hiroshi Inoue	372ffa15cb	[NFC] fix trivial typos in comments "the the" -> "the", "we we" -> "we", etc llvm-svn: 330006	2018-04-13 11:37:06 +00:00
Jonas Paulsson	e8f1ac7063	[MachineScheduler] NFC refactoring This patch makes tryCandidate() virtual and some utility functions like tryLess(), tryGreater(), ... externally available (used to be static). This makes it possible for a target to derive a new MachineSchedStrategy from GenericScheduler and reuse most parts. It was necessary to wrap functions with the same names in AMDGPU/SIMachineScheduler in a local namespace. Review: Andy Trick, Florian Hahn https://reviews.llvm.org/D43329 llvm-svn: 329884	2018-04-12 07:21:39 +00:00
Tim Renouf	fd8d4af3bc	[AMDGPU] Ensure there are enough registers for wave dispatch Summary: This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to allow for registers set up in wave dispatch, even if those registers are not used in the shader. Re-landed after noticing that the buildbot failure from 329808 seemed to be unrelated. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45503 Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771 llvm-svn: 329826	2018-04-11 17:18:36 +00:00
Yaxun Liu	9381ae9791	[AMDGPU] Fix lowering enqueue_kernel Two issues were fixed: (1) The runtime has difficulty allocating memory for an external symbol of a kernel and setting the address of that symbol, so the runtime handle of an enqueued kernel is now an ordinary global variable. The runtime only needs to store the address of the loaded kernel to the handle, and the runtime side has verified that this approach works. (2) Handle the situation where __enqueue_kernel* gets inlined, so that the enqueued kernel may be used through a constant expr instead of an instruction. Differential Revision: https://reviews.llvm.org/D45187 llvm-svn: 329815	2018-04-11 14:46:15 +00:00
Tim Renouf	8ca33bfcf3	Revert "[AMDGPU] Ensure there are enough registers for wave dispatch" This reverts 329808. That change caused a report of a failure in test/CodeGen/MIR/AMDGPU/mir-canon-multi.mir that I didn't see. I suspect it is an expensive-check-only error. Change-Id: I8133f26f15e7d5ec2b09c687c12cd70e918461b0 llvm-svn: 329811	2018-04-11 14:27:41 +00:00
Tim Renouf	f26b723491	[AMDGPU] Ensure there are enough registers for wave dispatch Summary: This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to allow for registers set up in wave dispatch, even if those registers are not used in the shader. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45503 Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771 llvm-svn: 329808	2018-04-11 14:02:41 +00:00
Dmitry Preobrazhensky	fc715551a3	[AMDGPU][MC][GFX9] Added v_screen_partition_4se_b32 See bug 36845: https://bugs.llvm.org/show_bug.cgi?id=36845 Differential Revision: https://reviews.llvm.org/D45443 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329801	2018-04-11 13:13:30 +00:00
Marek Olsak	a9a58fa236	AMDGPU: enable 128-bit for local addr space under an option Author: Samuel Pitoiset ds_read_b128 and ds_write_b128 have been recently enabled under the amdgpu-ds128 option because the performance benefit is unclear. Though, using 128-bit loads/stores for the local address space appears to introduce regressions in tessellation shaders. Not sure what is broken, but as ds_read_b128/ds_write_b128 are not enabled by default, just introduce a global option and enable 128-bit only if requested (until it's fixed/used correctly). v2: - fix regressions in merge-stores.ll and multiple_tails.ll Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464 llvm-svn: 329764	2018-04-10 22:48:23 +00:00
Nicolai Haehnle	b1c3b22b4c	AMDGPU/MC: Allow disassembling without symbol info Summary: We would like the UMR debugging tool[0] to be able to provide disassembly for currently live waves based on plain memory dumps, and we want to leverage the LLVM disassembler for this. This mostly works, except that UMR clearly can't provide real symbol info, so it wants to set DisInfo == nullptr. [0] https://cgit.freedesktop.org/amd/umr/ Reviewers: arsenm, rampitec, artem.tamazov, dp Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45477 Change-Id: Ibb2c5af2e66f2e100b4702fd81308e1932bc4ee6 llvm-svn: 329715	2018-04-10 15:46:43 +00:00
Tim Renouf	7190a4692a	[AMDGPU] For OS type AMDPAL, fixed scratch on compute shader Summary: For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders). This commit fixes that to use offset 0x10 instead of offset 0 for a compute shader, per the PAL ABI spec. V2: Ensure s0 (s8 for gfx9 merged shader) is marked live-in when loading scratch descriptor from GIT. Reviewers: kzhuravl, nhaehnle, timcorringham Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm Differential Revision: https://reviews.llvm.org/D44468 Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f llvm-svn: 329690	2018-04-10 11:25:15 +00:00
Konstantin Zhuravlyov	6183065b97	AMDGPU: Remove max_scratch_backing_memory_byte_size from kernel header 1. Remove max_scratch_backing_memory_byte_size from kernel header 2. Make it a reserved field 3. Ignore it while parsing assembly for backwards compatibility 4. Bump up minor version of kernel header Differential Revision: https://reviews.llvm.org/D45452 llvm-svn: 329620	2018-04-09 20:47:22 +00:00
Alex Shlyapnikov	79f2c720b5	Revert "AMDGPU: enable 128-bit for local addr space under an option" This reverts commit r329591. It breaks various bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/16516 http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/17374 http://lab.llvm.org:8011/builders/clang-ppc64le-linux/builds/15992 http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/11251 ... llvm-svn: 329610	2018-04-09 19:47:38 +00:00
Marek Olsak	52b033b827	AMDGPU: enable 128-bit for local addr space under an option Author: Samuel Pitoiset ds_read_b128 and ds_write_b128 have been recently enabled under the amdgpu-ds128 option because the performance benefit is unclear. Though, using 128-bit loads/stores for the local address space appears to introduce regressions in tessellation shaders. Not sure what is broken, but as ds_read_b128/ds_write_b128 are not enabled by default, just introduce a global option and enable 128-bit only if requested (until it's fixed/used correctly). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464 llvm-svn: 329591	2018-04-09 16:56:32 +00:00
Tom Stellard	e753c52227	AMDGPU: Initialize GlobalISel passes Summary: This fixes AMDGPU GlobalISel test failures when enabling the AMDGPU target without any other targets that use GlobalISel. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D45353 llvm-svn: 329588	2018-04-09 16:09:13 +00:00
Dmitry Preobrazhensky	2f8e146ad3	[AMDGPU][MC][GFX9] Added instructions s_mul_hi_32, s_lshl_add_u32 See bugs 36841: https://bugs.llvm.org/show_bug.cgi?id=36841 36842: https://bugs.llvm.org/show_bug.cgi?id=36842 Differential Revision: https://reviews.llvm.org/D45251 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329562	2018-04-09 13:10:33 +00:00
Dmitry Preobrazhensky	ae31223ba7	[AMDGPU][MC][GFX9] Added s_call_b64 See bug 36843: https://bugs.llvm.org/show_bug.cgi?id=36843 Differential Revision: https://reviews.llvm.org/D45268 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329440	2018-04-06 18:24:49 +00:00
Dmitry Preobrazhensky	306b1a0119	[AMDGPU][MC][GFX9] Added instruction s_endpgm_ordered_ps_done See bug 36844: https://bugs.llvm.org/show_bug.cgi?id=36844 Differential Revision: https://reviews.llvm.org/D45313 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329430	2018-04-06 17:25:00 +00:00
Dmitry Preobrazhensky	f20aff565d	[AMDGPU][MC][GFX9] Added instructions saveexec, wrexec and bitreplicate See bug 36840: https://bugs.llvm.org/show_bug.cgi?id=36840 Differential Revision: https://reviews.llvm.org/D45250 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329419	2018-04-06 16:35:11 +00:00
Dmitry Preobrazhensky	59399ae4cc	[AMDGPU][MC][VI][GFX9] Added s_atc_probe* instructions See bug 36839: https://bugs.llvm.org/show_bug.cgi?id=36839 Differential Revision: https://reviews.llvm.org/D45249 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329408	2018-04-06 15:48:39 +00:00
Dmitry Preobrazhensky	4732d876ee	[AMDGPU][MC][GFX9] Added s_dcache_discard* instructions See bug 36838: https://bugs.llvm.org/show_bug.cgi?id=36838 Differential Revision: https://reviews.llvm.org/D45247 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329397	2018-04-06 15:08:42 +00:00
Konstantin Zhuravlyov	c233ae8004	AMDGPU/Metadata: Always report a fixed number of hidden arguments Currently it is 6. If the "feature" is not used, report a dummy hidden argument; otherwise the metadata does not match the kernarg size reported in the kernel header. Differential Revision: https://reviews.llvm.org/D45129 llvm-svn: 329341	2018-04-05 20:46:04 +00:00
Simon Pilgrim	1d793b8ac5	[SchedModel] Complete models shouldn't match against itineraries when they don't use them (PR35639) For schedule models that don't use itineraries, checkCompleteness still checks that an instruction has a matching itinerary instead of skipping and going straight to matching the InstRWs. That doesn't seem to match what happens in TargetSchedule.cpp This patch causes problems for a number of models that had been incorrectly flagged as complete. Differential Revision: https://reviews.llvm.org/D43235 llvm-svn: 329280	2018-04-05 13:11:36 +00:00
Dmitry Preobrazhensky	523872ea59	[AMDGPU][MC] Enabled instruction TBUFFER_LOAD_FORMAT_XYZ for SI/CI See bug 36958: https://bugs.llvm.org/show_bug.cgi?id=36958 Differential Revision: https://reviews.llvm.org/D45099 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329197	2018-04-04 13:54:55 +00:00
Dmitry Preobrazhensky	a0b8cd038c	[AMDGPU][MC] Added support of 3-element addresses for MIMG instructions See bug 35999: https://bugs.llvm.org/show_bug.cgi?id=35999 Differential Revision: https://reviews.llvm.org/D45084 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329187	2018-04-04 13:01:17 +00:00
Nico Weber	1cbd096914	Sort targetgen calls in lib/Target/*/CMakeLists. Makes it easier to see mistakes such as the one fixed in r329178 and makes the different target CMakeLists more consistent. Also remove some stale-looking comments from the Nios2 target cmakefile. No intended behavior change. llvm-svn: 329181	2018-04-04 12:37:44 +00:00
Nicolai Haehnle	2f5a73820c	AMDGPU: Dimension-aware image intrinsics Summary: These new image intrinsics contain the texture type as part of their name and have each component of the address/coordinate as individual parameters. This is a preparatory step for implementing the A16 feature, where coordinates are passed as half-floats or -ints, but the Z compare value and texel offsets are still full dwords, making it difficult or impossible to distinguish between A16 on or off in the old-style intrinsics. Additionally, these intrinsics pass the 'texfailpolicy' and 'cachectrl' as i32 bit fields to reduce operand clutter and allow for future extensibility. v2: - gather4 supports 2darray images - fix a bug with 1D images on SI Change-Id: I099f309e0a394082a5901ea196c3967afb867f04 Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44939 llvm-svn: 329166	2018-04-04 10:58:54 +00:00
Nicolai Haehnle	3ffd383a15	AMDGPU: Fix copying i1 value out of loop with non-uniform exit Summary: When an i1-value is defined inside of a loop and used outside of it, we cannot simply use the SGPR bitmask from the loop's last iteration. There are also useful and correct cases of an i1-value being copied between basic blocks, e.g. when a condition is computed outside of a loop and used inside it. The concept of dominators is not sufficient to capture what is going on, so I propose the notion of "lane-dominators". Fixes a bug encountered in Nier: Automata. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103743 Change-Id: If37b969ddc71d823ab3004aeafb9ea050e45bd9a Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D40547 llvm-svn: 329164	2018-04-04 10:57:58 +00:00
Farhana Aleen	e80aeac0f2	[AMDGPU] performMinMaxCombine should not optimize patterns of vectors to min3/max3. Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3. Author: FarhanaAleen Reviewed By: arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D45219 llvm-svn: 329131	2018-04-03 23:00:30 +00:00
Farhana Aleen	936947349a	Revert "MSG" This reverts commit 9a0ce889d1c39c74d69ecad5ce9c875155ae55de. This was committed by mistake. llvm-svn: 329119	2018-04-03 21:51:45 +00:00
Farhana Aleen	3ab409dc86	MSG llvm-svn: 329114	2018-04-03 21:20:39 +00:00
Dmitry Preobrazhensky	b181c7312e	[AMDGPU][MC][GFX9] Added instructions v_cvt_norm_*16_f16, v_sat_pk_u8_i16 See bug 36847: https://bugs.llvm.org/show_bug.cgi?id=36847 Differential Revision: https://reviews.llvm.org/D45097 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328988	2018-04-02 17:09:20 +00:00
Dmitry Preobrazhensky	6bad04ecf5	[AMDGPU][MC][GFX9] Added s_atomic_* and s_buffer_atomic_* instructions Fixed a bug which caused Tablegen crash. See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837 Differential Revision: https://reviews.llvm.org/D45085 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328983	2018-04-02 16:10:25 +00:00
Nico Weber	f492f58182	Revert r328975, it makes TableGen assert on the bots. llvm-svn: 328978	2018-04-02 14:20:23 +00:00
Dmitry Preobrazhensky	32c450ae6a	[AMDGPU][MC][GFX9] Added s_atomic_* and s_buffer_atomic_* instructions See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837 Differential Revision: https://reviews.llvm.org/D45085 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328975	2018-04-02 13:52:23 +00:00
Nicolai Haehnle	4254d45a79	AMDGPU: Make isIntrinsicSourceOfDivergence table-driven Summary: This is in preparation for the new dimension-aware image intrinsics, which I'd rather not have to list here by hand. Change-Id: Iaa16e3a635a11283918ce0d9e1e618591b0bf6fa Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44938 llvm-svn: 328939	2018-04-01 17:09:14 +00:00
Nicolai Haehnle	5d0d30304c	AMDGPU: Make getTgtMemIntrinsic table-driven for resource-based intrinsics Summary: Avoids having to list all intrinsics manually. This is in preparation for the new dimension-aware image intrinsics, which I'd rather not have to list here by hand. Change-Id: If7ced04998397ef68c4cb8f7de66b5050fb767e5 Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44937 llvm-svn: 328938	2018-04-01 17:09:07 +00:00
Stanislav Mekhanoshin	74e2974ac6	[AMDGPU] Fixed some instruction latencies Differential Revision: https://reviews.llvm.org/D45073 llvm-svn: 328874	2018-03-30 16:19:13 +00:00
Michael Bedy	59e5ef793c	[AMDGPU] Fix the SDWA Peephole phase to handle src for dst:UNUSED_PRESERVE. Summary: The phase attempts to transform operations that extract a portion of a value into an SDWA src operand in cases where that value is used only once. It was not prepared for this use to be the preserved portion of a value for dst:UNUSED_PRESERVE, resulting in a crash or assert. This change either rejects the illegal SDWA attempt, or in the case where dst:WORD_1 and the src_sel would be WORD_0, removes the unneeded extract instruction. Reviewers: arsenm, #amdgpu Reviewed By: arsenm, #amdgpu Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D44364 llvm-svn: 328856	2018-03-30 05:03:36 +00:00
Matt Arsenault	efd1b30436	AMDGPU: Fix build warning in release llvm-svn: 328832	2018-03-29 21:44:44 +00:00
Matt Arsenault	03ae399d50	AMDGPU: Support realigning stack While the stack access instructions don't care about alignment > 4, some transformations on the pointer calculation do make assumptions based on knowing the low bits of a pointer are 0. If a stack object ends up being accessed through its absolute address (relative to the kernel scratch wave offset), the addressing expression may depend on the stack frame being properly aligned. This was breaking in a testcase due to the add->or combine. I think some of the SP/FP handling logic is still backwards, and overly simplistic to support all of the stack features. Code which tries to modify the SP with inline asm for example or variable sized objects will probably require redoing this. llvm-svn: 328831	2018-03-29 21:30:06 +00:00
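The alignment hazard described above can be illustrated in Python. This is a simplified sketch, not the backend's code, and `align_up` and `add_is_or` are hypothetical helpers. An add can be rewritten as an or only when its operands share no set bits. A combine may assume this for offsets below the known alignment of the base. If the frame is not actually aligned, the rewritten expression computes a different address.

```python
def align_up(addr, align):
    """Round addr up to a multiple of align (align must be a power of two)."""
    assert align > 0 and align & (align - 1) == 0, "alignment must be a power of two"
    return (addr + align - 1) & ~(align - 1)

def add_is_or(base, offset):
    """True when base + offset yields the same value as base | offset."""
    return base + offset == (base | offset)

# A frame base that is only 4-byte aligned breaks the add->or rewrite for offset 12...
misaligned = 4
# ...while realigning the frame to 16 bytes makes the rewrite valid again.
realigned = align_up(misaligned, 16)
```

Here `4 + 12` is 16 but `4 | 12` is 12. After realignment, both `16 + 12` and `16 | 12` give 28.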
Matt Arsenault	ffb132e74b	AMDGPU: Increase default stack alignment 8 and 16-byte values are common, so increase the default alignment to avoid realigning the stack in most functions. llvm-svn: 328821	2018-03-29 20:22:04 +00:00
Matt Arsenault	6c041a3cab	AMDGPU: Fix selection error on constant loads with < 4 byte alignment llvm-svn: 328818	2018-03-29 19:59:28 +00:00
Craig Topper	2fa1436206	[IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer. Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it. The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there are only a few primitive types, we can just print them as strings directly. Differential Revision: https://reviews.llvm.org/D45017 llvm-svn: 328806	2018-03-29 17:21:10 +00:00
David Blaikie	a373d18eb7	Transforms: Introduce Transforms/Utils.h rather than spreading the declarations amongst Scalar.h and IPO.h Fixes layering - Transforms/Utils shouldn't depend on including a Scalar or IPO header, because Scalar and IPO depend on Utils. llvm-svn: 328717	2018-03-28 17:44:36 +00:00
Dmitry Preobrazhensky	622bde8bc7	[AMDGPU][MC] Added ds_add_src2_f32 See bug 36833: https://bugs.llvm.org/show_bug.cgi?id=36833 Differential Revision: https://reviews.llvm.org/D44779 Reviewers: arsenm, artem.tamazov, timcorringham llvm-svn: 328713	2018-03-28 16:21:56 +00:00
Dmitry Preobrazhensky	2456ac696a	[AMDGPU][MC] Added PCK variants of image load/store instructions See bug 36834: https://bugs.llvm.org/show_bug.cgi?id=36834 Differential Revision: https://reviews.llvm.org/D44795 Reviewers: artem.tamazov, arsenm, timcorringham, nhaehnle llvm-svn: 328710	2018-03-28 15:44:16 +00:00
Dmitry Preobrazhensky	a917e88585	[AMDGPU][MC][GFX9] Added buffer_*_format_d16_hi_x See bug 36835: https://bugs.llvm.org/show_bug.cgi?id=36835 Differential Revision: https://reviews.llvm.org/D44825 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328707	2018-03-28 14:53:13 +00:00
Dmitry Preobrazhensky	dd2b929ffb	[AMDGPU][MC][GFX9] Added s_scratch* instructions See bug 36836: https://bugs.llvm.org/show_bug.cgi?id=36836 Differential Revision: https://reviews.llvm.org/D44832 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328704	2018-03-28 14:08:03 +00:00
Tim Renouf	cdac172e2a	Revert "[AMDGPU] For OS type AMDPAL, fixed scratch on compute shader" This reverts commit 0daf86291d3aa04d3cc280cd0ef24abdb0174981. It was causing an assert in test/CodeGen/AMDGPU/amdpal.ll only on a release-with-asserts build. I will resubmit the change when I have fixed that. Change-Id: If270594eba27a7dc4076bdeab3fa8e6bfda3288a llvm-svn: 328695	2018-03-28 11:21:07 +00:00
Matt Arsenault	bd49eccca1	AMDGPU: Really implement getFrameRegister Currently this seems to only really be used for debug info. llvm-svn: 328677	2018-03-27 23:26:59 +00:00
Tim Renouf	e4208bfa5b	[AMDGPU] For OS type AMDPAL, fixed scratch on compute shader Summary: For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders). This commit fixes that to use offset 0x10 instead of offset 0 for a compute shader, per the PAL ABI spec. Reviewers: kzhuravl, nhaehnle, timcorringham Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm Differential Revision: https://reviews.llvm.org/D44468 Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f llvm-svn: 328673	2018-03-27 21:35:00 +00:00
Matt Arsenault	17f3338015	AMDGPU: Fix not preserving CSR VGPR if used for SGPR spills Previously, this was not done if the function had no calls in it. This is still a possible issue for any callable function, regardless of whether calls are present. llvm-svn: 328659	2018-03-27 19:42:55 +00:00
Matt Arsenault	95329f8c53	AMDGPU: Set natural stack alignment in DataLayout Only 4 byte alignment is ever useful, so increasing anything beyond this may require realigning the stack. llvm-svn: 328656	2018-03-27 19:26:40 +00:00
Matt Arsenault	0a0c871f60	AMDGPU: Fix crash when MachinePointerInfo invalid The combine on a select of a load only triggers for addrspace 0, and discards the MachinePointerInfo. The conservative default needs to be used for this. llvm-svn: 328652	2018-03-27 18:39:45 +00:00
Matt Arsenault	e9f3679031	AMDGPU: Fix FP restore from being reordered with stack ops In a function, s5 is used as the frame base SGPR. If a function is calling another function, during the call sequence it is copied to a preserved SGPR and restored. Before, it was possible for the scheduler to move stack operations before the restore of s5, since there's nothing to associate a frame index access with the restore. Add an implicit use of s5 to the adjcallstack pseudo which ends the call sequence to prevent this from happening. I'm not 100% satisfied with this solution, but I'm not sure what else would be better. llvm-svn: 328650	2018-03-27 18:38:51 +00:00
Tim Corringham	7116e8963d	[AMDGPU] Improve disassembler error handling Summary: llvm-objdump now disassembles unrecognised opcodes as data, using the .long directive. We treat unrecognised opcodes as being 32 bit values, so move along 4 bytes rather than the single byte which previously resulted in a cascade of bogus disassembly following an unrecognised opcode. While no solution can always correctly disassemble code that contains embedded data, this provides a significant improvement. The disassembler will now cope with an arbitrary length section as it no longer truncates it to a multiple of 4 bytes, and will use the .byte directive for trailing bytes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D44685 llvm-svn: 328553	2018-03-26 17:06:33 +00:00
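The fallback strategy described above can be sketched in Python. This is a simplified illustration, not the llvm-objdump code. `disassemble` and the `decode` callback are hypothetical. The sketch also assumes every recognised instruction is a single 32-bit little-endian word, whereas real AMDGPU encodings can be 64 bits and may carry trailing literals.

```python
import struct

def disassemble(data, decode):
    """Walk data in 4-byte steps; decode(word) returns text or None (hypothetical)."""
    out = []
    i = 0
    while i + 4 <= len(data):
        (word,) = struct.unpack_from("<I", data, i)
        text = decode(word)
        # Unrecognised opcodes are emitted as data and skipped as a whole dword,
        # instead of advancing by one byte and producing a cascade of bogus output.
        out.append(text if text is not None else f".long 0x{word:08x}")
        i += 4
    # The section is not truncated to a multiple of 4; leftover bytes become .byte.
    for b in data[i:]:
        out.append(f".byte 0x{b:02x}")
    return out
```

For example, consider a decoder that only recognises `0xBF800000` (`s_nop 0`) and is given that word, then `0xDEADBEEF`, then two stray bytes. It yields one instruction, one `.long`, and two `.byte` lines.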
Nicolai Haehnle	4f850eabb6	AMDGPU: Introduce common SOP_Pseudo and VOP_Pseudo TableGen base classes Differential revision: https://reviews.llvm.org/D44820 Change-Id: I732979e2964006aa15d78a333d8886e6855f319a llvm-svn: 328496	2018-03-26 13:56:53 +00:00
Mandeep Singh Grang	860adef9e6	[AMDGPU] Change std::sort to llvm::sort in response to r327219 Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused by undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort. Reviewers: tstellar, RKSimon, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44856 llvm-svn: 328429	2018-03-24 17:15:04 +00:00
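The idea behind the r327219 wrappers can be illustrated with a Python analogue. This is a sketch under stated assumptions, not LLVM's `llvm::sort`. `shuffled_sort` is a hypothetical helper that randomizes its input before sorting. Ties between equal keys are therefore broken differently from run to run, which exposes callers that rely on an unspecified order.

```python
import random

def shuffled_sort(items, key):
    """Shuffle a copy of items, then sort it by key (hypothetical helper).

    Python's sort is stable, so the preceding shuffle alone decides the
    relative order of elements with equal keys.
    """
    copy = list(items)
    random.shuffle(copy)
    copy.sort(key=key)
    return copy
```

With `[("a", 1), ("b", 1), ("c", 0)]` sorted by the second field, `("c", 0)` always comes first. The order of `("a", 1)` and `("b", 1)` may differ between calls, which is exactly the hidden dependence the wrapper is meant to surface.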
David Blaikie	36a0f226b1	Fix layering by moving ValueTypes.h from CodeGen to IR ValueTypes.h is implemented in IR already. llvm-svn: 328397	2018-03-23 23:58:31 +00:00
David Blaikie	13e77db2df	Fix layering of MachineValueType.h by moving it from CodeGen to Support This is used by llvm tblgen as well as by LLVM Targets, so the only common place is Support for now. (maybe we need another target for these sorts of things - but for now I'm at least making them correct & we can make them better if/when people have strong feelings) llvm-svn: 328395	2018-03-23 23:58:25 +00:00
David Blaikie	6054e650ff	Move TargetLoweringObjectFile from CodeGen to Target to fix layering It's implemented in Target and included from other Target headers, so the header should be in Target. llvm-svn: 328392	2018-03-23 23:58:19 +00:00
Tony Tye	7a893d4e34	[AMDGPU] Remove use of OpenCL triple environment and replace with function attribute for AMDGPU - Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target. - Use function attribute to communicate to the AMDGPU backend to add implicit arguments for OpenCL kernels for the AMDHSA OS. Differential Revision: https://reviews.llvm.org/D43736 llvm-svn: 328349	2018-03-23 18:45:18 +00:00
David Blaikie	2be3922807	Fix a couple of layering violations in Transforms Remove #include of Transforms/Scalar.h from Transforms/Utils to fix layering. Transforms depends on Transforms/Utils, not the other way around. So remove the header and the "createStripGCRelocatesPass" function declaration (& definition) that is unused and motivated this dependency. Move Transforms/Utils/Local.h into Analysis because it's used by Analysis/MemoryBuiltins.cpp. llvm-svn: 328165	2018-03-21 22:34:23 +00:00
Nirav Dave	3264c1bdf6	[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172" Reland ISel cycle checking improvements after simplifying node id invariant traversal and correcting typo. llvm-svn: 327898	2018-03-19 20:19:46 +00:00
Nicolai Haehnle	4186cc7c08	TableGen: Check the dynamic type of !cast<Rec>(string) Summary: The docs already claim that this happens, but so far it hasn't. As a consequence, existing TableGen files get this wrong a lot, but luckily the fixes are all reasonably straightforward. To make this work with all the existing forms of self-references (since the true type of a record is only built up over time), the lookup of self-references in !cast is delayed until the final resolving step. Change-Id: If5923a72a252ba2fbc81a889d59775df0ef31164 Reviewers: arsenm, craig.topper, tra, MartinO Subscribers: wdng, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D44475 llvm-svn: 327849	2018-03-19 14:14:20 +00:00
Matt Arsenault	fed0a45036	AMDGPU/GlobalISel: RegBankSelect for basic int ops llvm-svn: 327843	2018-03-19 14:07:23 +00:00
Matt Arsenault	69932e4d69	AMDGPU: Don't leave dead illegal VGPR->SGPR copies Normally DCE kills these, but at -O0 these get left behind leaving suspicious looking illegal copies. Replace with IMPLICIT_DEF to avoid iterator issues. llvm-svn: 327842	2018-03-19 14:07:15 +00:00
Nirav Dave	5f0ab71b62	Revert "[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"" as it times out building test-suite on PPC. llvm-svn: 327778	2018-03-17 19:24:54 +00:00
Nirav Dave	982d3a56ea	[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172" Reland ISel cycle checking improvements after simplifying and reducing node id invariant traversal. llvm-svn: 327777	2018-03-17 17:42:10 +00:00
Matt Arsenault	abdc4f2dc7	AMDGPU/GlobalISel: Cleanup constant legality llvm-svn: 327774	2018-03-17 15:17:48 +00:00
Matt Arsenault	685d1e8157	AMDGPU/GlobalISel: Basic G_GEP legality llvm-svn: 327773	2018-03-17 15:17:45 +00:00
Matt Arsenault	85803366d6	AMDGPU/GlobalISel: Basic legality for load/store llvm-svn: 327772	2018-03-17 15:17:41 +00:00
Farhana Aleen	c6c9dc8773	[AMDGPU] Supported ds_write_b128 generation. Summary: This is a follow-on patch of https://reviews.llvm.org/D44210 Author: FarhanaAleen Reviewed By: msearles Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44319 llvm-svn: 327726	2018-03-16 18:12:00 +00:00
Dmitry Preobrazhensky	4c8f4234b6	[AMDGPU][MC][GFX8][GFX9][DISASSEMBLER] Added "_e32" suffix to 32-bit VINTRP opcodes See bug 36751: https://bugs.llvm.org/show_bug.cgi?id=36751 Differential Revision: https://reviews.llvm.org/D44529 Reviewers: artem.tamazov, arsenm llvm-svn: 327723	2018-03-16 16:38:04 +00:00
Dmitry Preobrazhensky	9c1a6e7e24	[AMDGPU][MC] Corrected default values for unused SDWA operands See bug 36355: https://bugs.llvm.org/show_bug.cgi?id=36355 Differential Revision: https://reviews.llvm.org/D44481 Reviewers: artem.tamazov, arsenm llvm-svn: 327720	2018-03-16 15:40:27 +00:00
Mark Searles	c3c02bde73	[AMDGPU] Waitcnt pass: Modify the waitcnt pass to propagate info in the case of a single basic block loop. mergeInputScoreBrackets() does this for us; update it so that it processes the single bb's score bracket when processing the single bb's preds. It is, after all, a pred of itself, so its score bracket is needed. Differential Revision: https://reviews.llvm.org/D44434 llvm-svn: 327583	2018-03-14 22:04:32 +00:00
Dmitry Preobrazhensky	d98c97b4f9	[AMDGPU][MC][GFX8] Added BUFFER_STORE_LDS_DWORD Instruction See bug 36558: https://bugs.llvm.org/show_bug.cgi?id=36558 Differential Revision: https://reviews.llvm.org/D43950 Reviewers: artem.tamazov, arsenm llvm-svn: 327299	2018-03-12 17:29:24 +00:00
Yaxun Liu	a99e7d8e44	[AMDGPU] Fix lowering enqueue kernel when kernel has no name Since the enqueued kernels have internal linkage, their names may be dropped. In this case, give them unique names __amdgpu_enqueued_kernel or __amdgpu_enqueued_kernel.n where n is a sequential number starting from 1. Differential Revision: https://reviews.llvm.org/D44322 llvm-svn: 327291	2018-03-12 16:34:06 +00:00
Dmitry Preobrazhensky	da4a7c01bf	[AMDGPU][MC] Corrected GATHER4 opcodes See bug 36252: https://bugs.llvm.org/show_bug.cgi?id=36252 Differential Revision: https://reviews.llvm.org/D43874 Reviewers: artem.tamazov, arsenm llvm-svn: 327278	2018-03-12 15:03:34 +00:00
Matt Arsenault	7b9ed89dcf	AMDGPU/GlobalISel: Legality and RegBankInfo for G_{INSERT|EXTRACT}_VECTOR_ELT llvm-svn: 327269	2018-03-12 13:35:53 +00:00
Matt Arsenault	c0aefd561e	AMDGPU/GlobalISel: InstrMapping for G_MERGE_VALUES llvm-svn: 327268	2018-03-12 13:35:49 +00:00
Matt Arsenault	503afda95f	AMDGPU/GlobalISel: Make some G_MERGE_VALUEs legal llvm-svn: 327267	2018-03-12 13:35:43 +00:00
Michael Bedy	80cf9ff564	Test commit - change comment slightly. llvm-svn: 327234	2018-03-11 03:27:50 +00:00
Matt Arsenault	cbda7ff4ae	AMDGPU: Fix crash when constant folding with physreg operand llvm-svn: 327209	2018-03-10 16:05:35 +00:00
Nirav Dave	042678bd55	Revert: r327172 "Correct load-op-store cycle detection analysis" r327171 "Improve Dependency analysis when doing multi-node Instruction Selection" r327170 "[DAG] Enforce stricter NodeId invariant during Instruction selection" Reverting patch as NodeId invariant change is causing pathological increases in compile time on PPC llvm-svn: 327197	2018-03-10 02:16:15 +00:00
Nirav Dave	071699bf82	[DAG] Enforce stricter NodeId invariant during Instruction selection Instruction Selection makes use of the topological ordering of nodes by node id (a node's operands have smaller node ids than it) when doing cycle detection. During selection we may violate this property, as a selection of multiple nodes may induce a use dependence (and thus a node id restriction) between two unrelated nodes. If a selected node has an unselected successor, this may allow us to miss a cycle when detecting an invalid selection. This patch fixes this by marking all unselected successors of a selected node as having a negated node id. We avoid pruning on such negative ids but can still reconstruct the original id for pruning. In-tree targets have been updated to replace DAG-level replacements with ISel-level ones which enforce this property. This preemptively fixes PR36312 before the triggering commit r324359 relands. Reviewers: craig.topper, bogner, jyknight Subscribers: arsenm, nhaehnle, javed.absar, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D43198 llvm-svn: 327170	2018-03-09 20:57:15 +00:00
Farhana Aleen	a7cb31123c	[AMDGPU] Supported ds_read_b128 generation; Widened vector length for local address-space. Summary: Starting from the 2nd GCN generation, the ISA supports ds_read_b128 in addition to ds_read_b64. This patch adds the ds_read_b128 instruction pattern and generation of this instruction. In the vectorizer, this patch also widens the vector length so that the vectorizer generates 128-bit loads for the local address space, which get translated to ds_read_b128. Since the performance benefit is not clear, the compiler generates ds_read_b128 only under -amdgpu-ds128. Author: FarhanaAleen Reviewed By: rampitec, arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44210 llvm-svn: 327153	2018-03-09 17:41:39 +00:00
Stanislav Mekhanoshin	c8127fc674	[AMDGPU] Fixed V_DIV_FIXUP_F16 selection on GFX9 GFX9 should select opsel version. Differential Revision: https://reviews.llvm.org/D44279 llvm-svn: 327106	2018-03-09 07:21:43 +00:00
Matt Arsenault	c3fe46bbcf	AMDGPU/GlobalISel: Pass subtarget + TM to LegalizerInfo These are the parameters x86 already uses. llvm-svn: 327020	2018-03-08 16:24:16 +00:00
Farhana Aleen	89196642f7	[AMDGPU] Increased vector length for global/constant loads. Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords. Author: FarhanaAleen Reviewed By: rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44179 llvm-svn: 326910	2018-03-07 17:09:18 +00:00
Farhana Aleen	347d12b4ce	Revert "[AMDGPU] Widened vector length for global/constant address space." This reverts commit ce988cc100dc65e7c6c727aff31ceb99231cab03. llvm-svn: 326907	2018-03-07 16:55:27 +00:00
Farhana Aleen	0d03d0588d	[AMDGPU] Widened vector length for global/constant address space. llvm-svn: 326904	2018-03-07 16:29:05 +00:00
Craig Topper	80d3bb3b4b	[TargetLowering] Rename DAGCombinerInfo::isAfterLegalizeVectorOps to DAGCombiner::isAfterLegalizeDAG since that's what it checks. NFC The code checks Level == AfterLegalizeDAG which is the fourth and last of the possible DAG combine stages that we have. There is a Level called AfterLegalVectorOps, but that's the third DAG combine and it doesn't always run. A function called isAfterLegalVectorOps should imply it returns true in either of the DAG combines that runs after the legalize vector ops stage, but that's not what this function does. llvm-svn: 326832	2018-03-06 19:44:52 +00:00
Stanislav Mekhanoshin	0f72225433	[AMDGPU] Add default ISA version targets If -mattr is used to modify feature set bits in an llvm-mc call, getIsaVersion can fail to identify the specific ISA due to a test mismatch. Add default fallback tests which will always correctly report at least the major version. Differential Revision: https://reviews.llvm.org/D44163 llvm-svn: 326825	2018-03-06 18:33:55 +00:00
Yaxun Liu	46439e8d4a	[AMDGPU] Fix lowering OpenCL enqueue_kernel One addrspacecast disappeared from the clang-emitted IR for the block invoke function due to the adoption of the new address space mapping. Differential Revision: https://reviews.llvm.org/D43785 llvm-svn: 326806	2018-03-06 16:04:39 +00:00
Matt Arsenault	e31ab94e97	AMDGPU/GlobalISel: Add InstrMapping for G_EXTRACT llvm-svn: 326715	2018-03-05 16:25:18 +00:00
Matt Arsenault	71272e6d4e	AMDGPU/GlobalISel: Make some G_EXTRACTs legal As far as I can tell legalization of weird sizes for the output type isn't implemented. llvm-svn: 326714	2018-03-05 16:25:15 +00:00
Matt Arsenault	4cc0b85276	AMDGPU: Fix build warning about override llvm-svn: 326713	2018-03-05 16:25:10 +00:00
Alexander Timofeev	2e5eeceeb7	Pass Divergence Analysis data to Selection DAG to drive divergence dependent instruction selection. Differential revision: https://reviews.llvm.org/D35267 llvm-svn: 326703	2018-03-05 15:12:21 +00:00
Matt Arsenault	b9699c009d	AMDGPU/GlobalISel: InstrMapping for G_ZEXT llvm-svn: 326589	2018-03-02 16:55:37 +00:00
Matt Arsenault	1c1aab99ae	AMDGPU/GlobalISel: InstrMapping for G_TRUNC llvm-svn: 326588	2018-03-02 16:55:33 +00:00
Matt Arsenault	ef8db767d7	AMDGPU/GlobalISel: Define InstrMappings for G_FCMP Patch by Tom Stellard llvm-svn: 326587	2018-03-02 16:53:15 +00:00
Matt Arsenault	2607dc60de	AMDGPU/GlobalISel: Define instruction mapping for @llvm.minnum Patch by Tom Stellard llvm-svn: 326586	2018-03-02 16:40:17 +00:00
Matt Arsenault	b46c191c49	AMDGPU/GlobalISel: Define instruction mapping for @llvm.maxnum Patch by Tom Stellard llvm-svn: 326567	2018-03-02 12:23:00 +00:00
Jan Vesely	b283ea0f0f	AMDGPU/GCN: Promote i16 ctpop i16 capable ASICs do not support i16 operands for this instruction. Add tablegen pattern to merge chained i16 additions. Differential Revision: https://reviews.llvm.org/D43985 llvm-svn: 326535	2018-03-02 02:50:22 +00:00
Matt Arsenault	41d2e3d98e	AMDGPU/GlobalISel: Define instruction mapping for G_FPTOSI Patch by Tom Stellard llvm-svn: 326534	2018-03-02 02:19:16 +00:00
Matt Arsenault	b23041ad4d	AMDGPU/GlobalISel: Define instruction mapping for G_FPTOUI Patch by Tom Stellard llvm-svn: 326533	2018-03-02 02:19:11 +00:00
Matt Arsenault	327d5fb2e5	AMDGPU/GlobalISel: Define instruction mapping for G_FMUL llvm-svn: 326532	2018-03-02 02:17:01 +00:00
Matt Arsenault	5a9e834eac	AMDGPU/GlobalISel: Define instruction mapping for G_FADD Patch by Tom Stellard llvm-svn: 326526	2018-03-02 01:22:13 +00:00
Matt Arsenault	d99317f1b3	AMDGPU/GlobalISel: Define instruction mapping for G_SHL Patch by Tom Stellard llvm-svn: 326525	2018-03-02 01:22:10 +00:00
Matt Arsenault	3c7a123ccc	AMDGPU/GlobalISel: Define instruction mapping for G_XOR llvm-svn: 326524	2018-03-02 01:22:06 +00:00
Matt Arsenault	c0f34c9e36	AMDGPU/GlobalISel: Define instruction mapping for G_AND Patch by Tom Stellard llvm-svn: 326523	2018-03-02 01:22:01 +00:00
Matt Arsenault	364f12e8f9	AMDGPU/GlobalISel: Define instruction mapping for @llvm.amdgcn.cvt.pkrtz Patch by Tom Stellard llvm-svn: 326490	2018-03-01 21:25:30 +00:00
Matt Arsenault	5320ee4a05	AMDGPU/GlobalISel: Define instruction mapping for G_OR Patch by Tom Stellard llvm-svn: 326489	2018-03-01 21:25:25 +00:00
Matt Arsenault	e65404f5c5	AMDGPU/GlobalISel: Remove default register mapping This crashes for some opcodes, which prevents the SelectionDAG fallback from working. Patch by Tom Stellard llvm-svn: 326487	2018-03-01 21:20:44 +00:00
Matt Arsenault	1422a19a88	AMDGPU/GlobalISel: Use a more correct getValueMapping This was finding the wrong size registers for anything with more than 2 components. Patch by Tom Stellard llvm-svn: 326483	2018-03-01 21:08:51 +00:00
Matt Arsenault	62669ede94	AMDGPU/GlobalISel: Define instruction mapping for G_BITCAST Patch by Tom Stellard llvm-svn: 326482	2018-03-01 20:59:44 +00:00
Matt Arsenault	0529a8e2de	AMDGPU/GlobalISel: Mark i32->i64 zext as legal llvm-svn: 326481	2018-03-01 20:56:21 +00:00
Matt Arsenault	36b99e1937	AMDGPU/GlobalISel: InstrMapping for llvm.amdgcn.exp.compr Patch by Tom Stellard llvm-svn: 326479	2018-03-01 20:40:55 +00:00
Matt Arsenault	8931bbf8df	AMDGPU/GlobalISel: Define instruction mapping for @llvm.amdgcn.exp Patch by Tom Stellard llvm-svn: 326477	2018-03-01 20:24:37 +00:00
Matt Arsenault	50721ab325	AMDGPU/GlobalISel: Define InstrMappings for G_ICMP Patch by Tom Stellard llvm-svn: 326472	2018-03-01 19:27:10 +00:00
Matt Arsenault	dc14ec05d4	AMDGPU/GlobalISel: Make i32 mul legal llvm-svn: 326471	2018-03-01 19:22:05 +00:00
Matt Arsenault	06cbb27a79	AMDGPU/GlobalISel: Define instruction mapping for G_IMPLICIT_DEF Patch by Tom Stellard llvm-svn: 326470	2018-03-01 19:16:52 +00:00
Matt Arsenault	e3d9ecf2b9	AMDGPU/GlobalISel: Define instruction mapping for G_FCONSTANT Patch by Tom Stellard llvm-svn: 326468	2018-03-01 19:13:30 +00:00
Matt Arsenault	51b0b20023	AMDGPU/GlobalISel: Add copyCost for VGPR->SGPR copies Patch by Tom Stellard llvm-svn: 326467	2018-03-01 19:09:25 +00:00
Matt Arsenault	3f6a204eaa	AMDGPU/GlobalISel: Make i32 xor legal llvm-svn: 326466	2018-03-01 19:09:21 +00:00
Matt Arsenault	8e80a5fbca	AMDGPU/GlobalISel: Mark 32/64-bit G_FCMP as legal Patch by Tom Stellard llvm-svn: 326465	2018-03-01 19:09:16 +00:00
Matt Arsenault	dd022ce064	AMDGPU/GlobalISel: Mark 32-bit G_FPTOSI as legal Patch by Tom Stellard llvm-svn: 326464	2018-03-01 19:04:25 +00:00
Alexander Timofeev	0081d23fd8	[AMDGPU]: fix for the crash in SIRegisterInfo when the register class is not found Differential revision: https://reviews.llvm.org./D43334 llvm-svn: 326451	2018-03-01 17:36:43 +00:00
Tim Renouf	2a99fa2c08	[AMDGPU] added writelane intrinsic Summary: For use by LLPC SPV_AMD_shader_ballot extension. The v_writelane instruction was already implemented for use by SGPR spilling, but I had to add an extra dummy operand tied to the destination, to represent that all lanes except the selected one keep the old value of the destination register. .ll test changes were due to schedule changes caused by that new operand. Differential Revision: https://reviews.llvm.org/D42838 llvm-svn: 326353	2018-02-28 19:10:32 +00:00
Konstantin Zhuravlyov	40b09e86b9	AMDGPU: Add fast fmaf feature to gfx702 Differential Revision: https://reviews.llvm.org/D43790 llvm-svn: 326252	2018-02-27 21:46:15 +00:00
Matt Arsenault	2a26a286db	AMDGPU/GlobalISel: Make f64 constants legal llvm-svn: 326101	2018-02-26 17:20:43 +00:00
Tim Renouf	832f90fa0c	[AMDGPU] Scratch setup fix on AMDPAL gfx9+ merge shader Summary: With OS type AMDPAL, the scratch descriptor is hardwired to be loaded from offset 0 of the global information table, whose low pointer is passed in s0. For a merge shader on gfx9+, it needs to be s8 instead, as the hardware reserves s0-s7. Reviewers: kzhuravl Subscribers: arsenm, nhaehnle, dstuttard, llvm-commits, t-tye, yaxunl, wdng, kzhuravl Differential Revision: https://reviews.llvm.org/D42203 llvm-svn: 326088	2018-02-26 14:46:43 +00:00
Stanislav Mekhanoshin	fa48c496e2	[AMDGPU] Shrinking V_SUBBREV_U32 V_SUBBREV_U32 is a commute opcode for V_SUBB_U32. However, when we try to commute V_SUBB_U32 in order to shrink it, we do not then process V_SUBBREV_U32 and it stays VOP3. This is fixed. Differential Revision: https://reviews.llvm.org/D43699 llvm-svn: 326011	2018-02-24 01:32:32 +00:00
Geoff Berry	d6ba3dbbbd	Fix compiler warning introduced in r325931. NFC. llvm-svn: 325938	2018-02-23 19:11:33 +00:00
Geoff Berry	f8bf2ec0a8	[MachineOperand][Target] MachineOperand::isRenamable semantics changes Summary: Add a target option AllowRegisterRenaming that is used to opt in to post-register-allocation renaming of registers. This is set to 0 by default, which causes the hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq fields of all opcodes to be set to 1, causing MachineOperand::isRenamable to always return false. Set the AllowRegisterRenaming flag to 1 for all in-tree targets that have lit tests that were affected by enabling COPY forwarding in MachineCopyPropagation (AArch64, AMDGPU, ARM, Hexagon, Mips, PowerPC, RISCV, Sparc, SystemZ and X86). Add some more comments describing the semantics of the MachineOperand::isRenamable function and how it is set and maintained. Change isRenamable to check the operand's opcode hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq bit directly instead of relying on it being consistently reflected in the IsRenamable bit setting. Clear the IsRenamable bit when changing an operand's register value. Remove target code that was clearing the IsRenamable bit when changing registers/opcodes now that this is done conservatively by default. Change setting of hasExtraSrcRegAllocReq in AMDGPU target to be done in one place covering all opcodes that have constant pipe read limit restrictions. Reviewers: qcolombet, MatzeB Subscribers: aemerson, arsenm, jyknight, mcrosier, sdardis, nhaehnle, javed.absar, tpr, arichardson, kristof.beyls, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, escha, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D43042 llvm-svn: 325931	2018-02-23 18:25:08 +00:00
Nicolai Haehnle	6cf306deca	AMDGPU: Track physreg uses in SILoadStoreOptimizer Summary: This handles def-after-use of physregs, and allows us to merge loads and stores even across some physreg defs (typically M0 defs). Change-Id: I076484b2bda27c2cf46013c845a0380c5b89b67b Reviewers: arsenm, mareko, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D42647 llvm-svn: 325882	2018-02-23 10:45:56 +00:00
Nicolai Haehnle	40b140fef1	AMDGPU: Stop using .NAME in .td files Summary: .NAME is a bit of an odd duck, in that we should really treat it like a template argument, but we currently don't, and so when and where NAME is initialized and how is pretty inconsistent. Best to just avoid using it as a field of already instantiated records, and use cast to string instead. Change-Id: I5a0c202401cede3d5c3827ab9c7858ea48b29108 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D43551 llvm-svn: 325794	2018-02-22 15:25:11 +00:00
Hiroshi Inoue	7f9f92f8b6	[NFC] fix trivial typos in comments "a a" -> "a" llvm-svn: 325752	2018-02-22 07:48:29 +00:00
Nicolai Haehnle	770397f4cd	AMDGPU: Do not combine loads/store across physreg defs Summary: Since this pass operates on machine SSA form, this should only really affect M0 in practice. Fixes various piglit variable-indexing/vs-varying-array-mat4-index-* Change-Id: Ib2a1dc3a8d7b08225a8da49a86f533faa0986aa8 Fixes: r317751 ("AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4") Reviewers: arsenm, mareko, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D40343 llvm-svn: 325677	2018-02-21 13:31:35 +00:00
Dmitry Preobrazhensky	d6e1a9404d	[AMDGPU][MC] Added lds support for MUBUF instructions See bug 28234: https://bugs.llvm.org/show_bug.cgi?id=28234 Differential Revision: https://reviews.llvm.org/D43472 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 325676	2018-02-21 13:13:48 +00:00
Konstantin Zhuravlyov	5c1237a1fd	Revert "[AMDGPU] Increased vector length for global/constant loads." https://reviews.llvm.org/rL325518 It breaks following OpenCL conformance tests: - Basic - parameter_types - Basic - vload_private llvm-svn: 325643	2018-02-20 23:30:21 +00:00
Tim Renouf	8234b4893a	[AMDGPU] stop buffer_store being moved illegally Summary: The machine instruction scheduler was illegally moving a buffer store past a buffer load with the same descriptor and offset. Fixed by marking buffer ops as mayAlias and isAliased. This may be overly conservative, and we may need to revisit. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D43332 Change-Id: Iff3173d9e0653e830474546276ab9d30318b8ef7 llvm-svn: 325567	2018-02-20 10:03:38 +00:00

2814 Commits