llvm-project

Commit Graph

Author	SHA1	Message	Date
Jay Foad	7e43483dd1	[AMDGPU] Remove set_gpr_idx instructions in conditional blocks SIPreEmitPeephole did not try to remove redundant s_set_gpr_idx_* instructions in blocks that end with a conditional branch instruction. This seems like a simple oversight. Differential Revision: https://reviews.llvm.org/D101629	2021-04-30 22:15:45 +01:00
Daniil Fukalov	3489c2d7b1	[TTI] NFC: Change getTypeLegalizationCost to return InstructionCost. This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: sdesmalen, kparzysz Differential Revision: https://reviews.llvm.org/D101533	2021-04-30 22:51:51 +03:00
David Stuttard	a67a377014	[AMDGPU] Tidy up some simple expressions for clarity NFC Slight refactor for clarity. Change-Id: Ib25e7f4582c67a7c57f066cfd5382c1405d7d4c5 Differential Revision: https://reviews.llvm.org/D101610	2021-04-30 11:13:54 +01:00
Jay Foad	f251379a91	[AMDGPU] Simplify getWaitStatesSince. NFC.	2021-04-30 08:58:24 +01:00
Christudasan Devadasan	544be70864	[AMDGPU] Skip promote-alloca for insertelement/insertvalue users It is difficult to track the users of vector and aggregate types. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D101562	2021-04-30 08:37:26 +05:30
Carl Ritson	424f1f6f96	[AMDGPU][NFC] Refactor hazard recognition IsHazardFn and IsExpiredFn Refactor IsHazardFn and IsExpiredFn to use constant references as these should not be mutating the instructions visited and the instruction can never be null. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D101430	2021-04-30 09:18:56 +09:00
Carl Ritson	749702fc6b	[AMDGPU] Remove dead early-out in GCNHazardRecognizer Remove an early-out in wait state counting which can never be taken. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D101520	2021-04-30 08:55:49 +09:00
Jay Foad	16d707e656	[AMDGPU] Fix v_swap_b32 formation on physical registers As explained in the comments, matchSwap matches: // mov t, x // mov x, y // mov y, t and turns it into: // mov t, x (t is potentially dead and move eliminated) // v_swap_b32 x, y On physical registers we don't have full use-def chains so the check for T being live-out was not working properly with subregs/superregs. Differential Revision: https://reviews.llvm.org/D101546	2021-04-29 20:53:40 +01:00
Alexey Bataev	12c51f2358	[COST] Improve shuffle kind detection if shuffle mask is provided. Added an extra analysis for better choosing of shuffle kind in getShuffleCost functions for better cost estimation if mask was provided. Differential Revision: https://reviews.llvm.org/D100865	2021-04-29 12:48:00 -07:00
Alexey Bataev	6e859f3cd4	Revert "[COST] Improve shuffle kind detection if shuffle mask is provided." This reverts commit `9239932221` to fix a compiler crash on mask checks.	2021-04-29 12:40:33 -07:00
Petar Avramovic	c34900e133	AMDGPU/GlobalISel: Fix selection of image intrinsics with unused return When atomic image intrinsic return value is unused, register class for destination of a sub-register copy of return value ends up not being set. This copy then hits 'Register class not set' assert later. If return value has uses, register class is determined by use instruction. Fix is to not create sub-register copy when image intrinsic destination has no uses because it would be deleted by dead-mi-elimination later anyway. Differential Revision: https://reviews.llvm.org/D101448	2021-04-29 20:56:03 +02:00
Alexey Bataev	9239932221	[COST] Improve shuffle kind detection if shuffle mask is provided. Added an extra analysis for better choosing of shuffle kind in getShuffleCost functions for better cost estimation if mask was provided. Differential Revision: https://reviews.llvm.org/D100865	2021-04-29 09:42:56 -07:00
Sebastian Neubauer	9569d5ba02	[AMDGPU] Allow buildSpillLoadStore in empty bb This allows calling buildSpillLoadStore for an empty basic block, where MI points at the end of the block instead of to an instruction. This only happens with downstream CFI changes, so I was not able to create a testcase that works with upstream LLVM. Differential Revision: https://reviews.llvm.org/D101356	2021-04-29 12:53:20 +02:00
Joe Nash	168228d76a	[AMDGPU] Make some VOP3 insts commutable Note, only src0 and src1 will be commuted if the isCommutable flag is set. This patch does not change that, it just makes it possible to commute src0 and src1 of some U/I/B vop3 instructions. This patch revises `d35d8da7d6`. It contains the commute opportunities excluding float insts Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D101474 Change-Id: I62938173d750453839f2457a3851661a29135faf	2021-04-28 13:59:08 -04:00
Jay Foad	12011b5217	[AMDGPU] GCNHazardRecognizer: ignore all meta instructions This is hopefully NFC, but should be more robust in ignoring all instructions that should be ignored, instead of just some of them. Differential Revision: https://reviews.llvm.org/D101372	2021-04-27 20:17:15 +01:00
Jay Foad	dc2f6bf566	[AMDGPU] Minor refactoring in AMDGPUUnifyDivergentExitNodes. NFC. Make unifyReturnBlockSet a member function so we don't have to pass TTI around as an argument.	2021-04-27 14:21:51 +01:00
Petar Avramovic	8110fcc8fc	AMDGPU/GlobalISel: Fix negative offset folding for buffer_load Buffer_load does unsigned offset calculations. Don't fold operands of 32-bit add that are likely to cause unsigned add overflow (common case is when one of the operands is negative). Differential Revision: https://reviews.llvm.org/D91336	2021-04-27 14:45:22 +02:00
Petar Avramovic	fb7be0d912	AMDGPU/GlobalISel: Remove redundant G_FCANONICALIZE Add basic version of isCanonicalized for global-isel. Copied from sdag. Add post legalizer combine that deletes G_FCANONICALIZE when its input is already Canonicalized. Differential Revision: https://reviews.llvm.org/D96605	2021-04-27 12:26:37 +02:00
Petar Avramovic	4a9bc59867	AMDGPU/GlobalISel: Add integer med3 combines Add signed and unsigned integer version of med3 combine. Source pattern is min(max(Val, K0), K1) or max(min(Val, K1), K0) where K0 and K1 are constants and K0 <= K1. Destination is med3 that corresponds to signedness of min/max in source. Differential Revision: https://reviews.llvm.org/D90050	2021-04-27 11:52:23 +02:00
Baptiste Saleil	caf1294d95	[AMDGPU] Experiments show that the GCNRegBankReassign pass significantly impacts the compilation time and there is no case for which we see any improvement in performance. This patch removes this pass and its associated test cases from the tree. Differential Revision: https://reviews.llvm.org/D101313 Change-Id: I0599169a7609c19a887f8d847a71e664030cc141	2021-04-26 17:21:49 -04:00
Sebastian Neubauer	fcc40d9c17	[AMDGPU] Use MapVector for WWMReservedRegs Use MapVector instead of SmallDenseMap because it has a deterministic iteration order. Differential Revision: https://reviews.llvm.org/D101299	2021-04-26 17:43:00 +02:00
Tim Renouf	8710eff6c3	[MC][AMDGPU][llvm-objdump] Synthesized local labels in disassembly 1. Add an accessor function to MCSymbolizer to retrieve addresses referenced by a symbolizable operand, but not resolved to a symbol. That way, the caller can synthesize labels at those addresses and then retry disassembling the section. 2. Implement that in AMDGPU -- a failed symbol lookup results in the address being added to a vector returned by the new function. 3. Use that in llvm-objdump when using MCSymbolizer (which only happens on AMDGPU) and SymbolizeOperands is on. Differential Revision: https://reviews.llvm.org/D101145 Change-Id: I19087c3bbfece64bad5a56ee88bcc9110d83989e	2021-04-26 13:56:36 +01:00
Sebastian Neubauer	3366d81153	[AMDGPU] Save WWM registers in functions The values of registers in inactive lanes need to be saved during function calls. Save all registers used for whole wave mode, similar to how it is done for VGPRs that are used for SGPR spilling. Differential Revision: https://reviews.llvm.org/D99429 Reapply with fixed tests on Windows.	2021-04-23 18:09:24 +02:00
dfukalov	9ab17a60eb	[TTI] NFC: Use InstructionCost to store ScalarizationCost in IntrinsicCostAttributes. This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D101151	2021-04-23 18:02:00 +03:00
Sebastian Neubauer	22d99cb63f	Revert "[AMDGPU] Save WWM registers in functions" This reverts commit `91464c30bf`. Seems to break tests on windows.	2021-04-23 16:38:50 +02:00
Sebastian Neubauer	91464c30bf	[AMDGPU] Save WWM registers in functions The values of registers in inactive lanes need to be saved during function calls. Save all registers used for whole wave mode, similar to how it is done for VGPRs that are used for SGPR spilling. Differential Revision: https://reviews.llvm.org/D99429	2021-04-23 16:09:31 +02:00
Matt Arsenault	b58332774f	AMDGPU: Fix assert on inline asm on gfx90a This was assuming all mayLoad instructions have one def.	2021-04-23 09:00:25 -04:00
Matt Arsenault	ed633a1daa	AMDGPU: Restore atomic fp feature on FP atomic instruction definitions `9931b1f7a4` switched this to checking for the two specific subtargets, instead of the dedicated feature. This broke supporting functions which force added the feature when emitting targets that do not actually support them. This still does not work for the targets that use the gfx6/7 or gfx10 encodings.	2021-04-22 21:32:01 -04:00
Jay Foad	79cb3ba08f	[AMDGPU] SIWholeQuadMode: don't add duplicate implicit $exec operands STRICT_WWM and STRICT_WQM are already defined with Uses = [EXEC], so there is no need to add another implicit use of $exec when lowering them to V_MOV_B32 instructions. Differential Revision: https://reviews.llvm.org/D100969	2021-04-22 09:19:47 +01:00
Matt Arsenault	987e52851e	AMDGPU: Fix assert when trying to fold reg_sequence of physreg copies	2021-04-21 21:58:18 -04:00
Stanislav Mekhanoshin	f9d0d0d7e0	[AMDGPU] Lower regbanks reassign threshold to 15000 Let it work on very small kernels only. Measurements showed the performance benefit is not worth the compile time. Differential Revision: https://reviews.llvm.org/D100904	2021-04-21 08:34:11 -07:00
dfukalov	a8b35e0f52	[TTI] NFC: Change getVectorSplitCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D100952	2021-04-21 17:32:02 +03:00
Matt Arsenault	70ab76a81b	AMDGPU: Fix indirect tail calls Fix a selection error on uniform callees, and use a regular call if divergent.	2021-04-21 09:15:24 -04:00
Jay Foad	ec8c61efdf	[AMDGPU] Allow multiple uses of the same literal In GFX10 VOP3 can have a literal, which opens up the possibility of two operands using the same literal value, which is allowed and only counts as one use of the constant bus. AMDGPUAsmParser::validateConstantBusLimitations already knew about this but SIInstrInfo::verifyInstruction did not. Differential Revision: https://reviews.llvm.org/D100770	2021-04-20 16:44:01 +01:00
Matt Arsenault	1cb8a9d595	AMDGPU/GlobalISel: Fix uitofp/sitofp with non-power-of-2 integers	2021-04-20 11:13:29 -04:00
Sebastian Neubauer	4897effb14	[AMDGPU] Add TransVALU to gfx10 Instructions on the transcendental unit are executed in parallel to the normal VALU, so add this as an extra resource. This doesn't seem to have any effect, but it should be more correct. Differential Revision: https://reviews.llvm.org/D100123	2021-04-20 15:34:43 +02:00
Jay Foad	2aea830ec4	[AMDGPU] Use if instead of foreach in a few places. NFC.	2021-04-20 14:20:30 +01:00
Jay Foad	edea476142	[AMDGPU] Use simpler alternatives to !foldl. NFC.	2021-04-20 12:59:04 +01:00
hsmahesha	840c4e4e90	[AMDGPU] Re-arrange ds_read/ds_write ISel pattern for better readability. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D100773	2021-04-20 16:17:15 +05:30
Jay Foad	b22721f01a	[AMDGPU] GCNDPPCombine: don't shrink V_ADD_CO_U32 if carry out is used Don't shrink VOP3 instructions if there are any uses of a carry-out operand, because the shrunken form of the instruction would write the carry-out to vcc instead of to a virtual register. Differential Revision: https://reviews.llvm.org/D100760	2021-04-20 09:17:52 +01:00
madhur13490	6a4d9cb7e0	[AMDGPU] Remove error check for indirect calls and add missing queue-ptr This patch removes -fixed-abi check for indirect calls and also adds queue-ptr which is required for indirect calls to work. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D100633	2021-04-20 00:35:17 +05:30
Jay Foad	a02aa91313	[AMDGPU] GCNDPPCombine: simplify API of isShrinkable. NFC.	2021-04-19 14:20:46 +01:00
Jay Foad	ef443390a9	[AMDGPU] Remove MachineDCE after SIFoldOperands Remove the MachineDCE pass after the first SIFoldOperands pass now that SIFoldOperands deletes its own dead instructions. Reapply after fixing dependent change D100188. Differential Revision: https://reviews.llvm.org/D100189	2021-04-19 12:08:02 +01:00
Jay Foad	323ef0eb45	[AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs This is fairly cheap to implement and means less work for future passes like MachineDCE. Reapply with a fix for using InstToErase after it had been erased. Differential Revision: https://reviews.llvm.org/D100188	2021-04-19 12:05:41 +01:00
Dmitry Preobrazhensky	bcc29e0fcf	[AMDGPU][MC] Corrected parsing of carry in/out operands in VOP3 Disabled constants as carry in/out operands. See bug 48711. Differential Revision: https://reviews.llvm.org/D100642	2021-04-19 13:42:31 +03:00
Yaxun (Sam) Liu	3597f02fd5	[AMDGPU] Add GlobalDCE before internalization pass The internalization pass only internalizes global variables with no users. If the global variable has some dead user, the internalization pass will not internalize it. To be able to internalize global variables with dead users, a global dce pass is needed before the internalization pass. This patch adds that. Reviewed by: Artem Belevich, Matt Arsenault Differential Revision: https://reviews.llvm.org/D98783	2021-04-17 11:25:25 -04:00
Serge Guelton	d6de1e1a71	Normalize interaction with boolean attributes Such attributes can either be unset, or set to "true" or "false" (as string). Throughout the codebase, this led to inelegant checks ranging from if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true") to if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true") Introduce a getValueAsBool that normalizes the check, with the following behavior: no attributes or attribute set to "false" => return false attribute set to "true" => return true Differential Revision: https://reviews.llvm.org/D99299	2021-04-17 08:17:33 +02:00
Joe Nash	a0ed70abde	[AMDGPU] Remove redundant field from DPP8 def These lines set the value to what it already was, so they are redundant. NFC Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100664 Change-Id: Ibf6f27d50a7fa1f76c127f01b799821378bfd3b3	2021-04-16 16:23:52 -04:00
Joe Nash	919236e608	[AMDGPU] NFC, Comment in disassembler for dpp8 Gives reasoning for convertDPP8. Also corrects typo in Operand type comment. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100665 Change-Id: I33ff269db8072d83e5e0ecdbfb731d6000fc26c4	2021-04-16 16:21:47 -04:00
Christudasan Devadasan	97618522dc	[AMDGPU] Remove dead code (NFC).	2021-04-16 23:03:31 +05:30
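
Note on commit 4a9bc59867 (integer med3 combines): the message states that min(max(Val, K0), K1) and max(min(Val, K1), K0) can be replaced by med3 when K0 and K1 are constants with K0 <= K1. The sketch below checks only that arithmetic identity by brute force over a small range. It is not the GlobalISel combine itself: the function names (`med3`, `clamp_min_max`, `clamp_max_min`, `identity_holds`) are hypothetical, and Python's unbounded integers stand in for both the signed and unsigned cases.

```python
def med3(a: int, b: int, c: int) -> int:
    """Return the median of three integers."""
    return sorted((a, b, c))[1]


def clamp_min_max(val: int, k0: int, k1: int) -> int:
    """First source pattern from the commit message: min(max(Val, K0), K1)."""
    return min(max(val, k0), k1)


def clamp_max_min(val: int, k0: int, k1: int) -> int:
    """Second source pattern from the commit message: max(min(Val, K1), K0)."""
    return max(min(val, k1), k0)


def identity_holds(lo: int = -8, hi: int = 8) -> bool:
    """Check both patterns against med3 for every K0 <= K1 in [lo, hi]."""
    for k0 in range(lo, hi + 1):
        for k1 in range(k0, hi + 1):  # enforces the K0 <= K1 precondition
            for val in range(lo - 2, hi + 3):  # include values outside [K0, K1]
                expected = med3(val, k0, k1)
                if clamp_min_max(val, k0, k1) != expected:
                    return False
                if clamp_max_min(val, k0, k1) != expected:
                    return False
    return True


if __name__ == "__main__":
    print("identity holds for K0 <= K1:", identity_holds())
    # Without the precondition the patterns no longer agree with med3:
    print(clamp_min_max(5, 10, 0), clamp_max_min(5, 10, 0), med3(5, 10, 0))
```

The final line illustrates why the K0 <= K1 condition matters: with K0 = 10 and K1 = 0 the two source patterns disagree with each other and with the median.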


5948 Commits