llvm-project

Commit Graph

Author	SHA1	Message	Date
Christudasan Devadasan	d6aa4aa29a	[AMDGPU] Some refactoring after D90404. NFC.	2020-11-01 13:18:53 +05:30
Christudasan Devadasan	9bb2b4f0aa	[AMDGPU] Add alignment check for v3 to v4 load type promotion It should be enabled only when the load alignment is at least 8-byte. Fixes: SWDEV-256824 Reviewed By: foad Differential Revision: https://reviews.llvm.org/D90404	2020-11-01 12:05:34 +05:30
Jay Foad	5b91a6a88b	[AMDGPU] Allow some modifiers on VOP3B instructions V_DIV_SCALE_F32/F64 are VOP3B encoded so they can't use the ABS src modifier, but they can still use NEG and the usual output modifiers. This partially reverts `3b99f12a4e` "AMDGPU: Remove modifiers from v_div_scale_*". Differential Revision: https://reviews.llvm.org/D90296	2020-10-28 21:54:14 +00:00
Stanislav Mekhanoshin	038d884a50	[AMDGPU] Use flat scratch instructions where available The support is disabled by default. So far there is instruction selection, spilling, and frame elimination. It also changes SP from unswizzled to swizzled as used by flat scratch instructions, so it cannot be mixed with MUBUF stack access. At the very least missing: - GlobalISel; - Some optimizations in frame elimination in between vector and scalar ALU; - It shall finally allow to always materialize frame index as an SGPR, but that is not implemented and frame elimination cannot handle it yet; - Unaligned and/or multidword flat scratch shall work, but it is legalized now for MUBUF; - Operand folding cannot optimize FI like with MUBUF yet; - It will need scaling the value of the SP/FP in the DWARF expression to recover the unswizzled scratch address; Differential Revision: https://reviews.llvm.org/D89170	2020-10-26 14:40:42 -07:00
Piotr Sobczak	c872faf6e0	[AMDGPU] Do not generate S_CMP_LG_U64 on gfx7 S_CMP_LG_U64 was added in gfx8 and is guarded by hasScalarCompareEq64(). Rewrite S_CMP_LG_U64 to S_OR_B32 + S_CMP_LG_U32 for targets that do not support 64-bit scalar compare. Differential Revision: https://reviews.llvm.org/D89536	2020-10-19 14:44:31 +02:00
Jay Foad	b59d8d7c72	[AMDGPU][GlobalISel] Compute known bits for zero-extending loads Implement computeKnownBitsForTargetInstr for G_AMDGPU_BUFFER_LOAD_UBYTE and G_AMDGPU_BUFFER_LOAD_USHORT. This allows generic combines to remove some unnecessary G_ANDs. Differential Revision: https://reviews.llvm.org/D89316	2020-10-13 16:22:00 +01:00
Sebastian Neubauer	f53b43c00a	[AMDGPU] Use isLegalMUBUFImmOffset more Instead of hardcoding isUInt<12>. Differential Revision: https://reviews.llvm.org/D88961	2020-10-08 14:31:44 +02:00
Mirko Brkusanin	7c88d13fd1	[AMDGPU] Prefer SplitVectorLoad/Store over expandUnalignedLoad/Store ExpandUnalignedLoad/Store can sometimes produce unnecessary copies to temporary stack slot. We should prefer splitting vectors if possible. Differential Revision: https://reviews.llvm.org/D88882	2020-10-08 10:17:15 +02:00
Rodrigo Dominguez	f71f5f39f6	[AMDGPU] Implement hardware bug workaround for image instructions Summary: This implements a workaround for a hardware bug in gfx8 and gfx9, where register usage is not estimated correctly for image_store and image_gather4 instructions when D16 is used. Change-Id: I4e30744da6796acac53a9b5ad37ac1c2035c8899 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81172	2020-10-07 07:39:52 -04:00
Sebastian Neubauer	6a089ce0e4	[AMDGPU] Use tablegen for argument indices Use tablegen generic tables to get the index of image intrinsic arguments. Before, the computation of which image intrinsic argument is at which index was scattered in a few places, tablegen, the SDag instruction selection and GlobalISel. This patch changes that, so only tablegen contains code to compute indices and the ImageDimIntrinsicInfo table provides these information. Differential Revision: https://reviews.llvm.org/D86270	2020-10-05 11:50:52 +02:00
Mirko Brkusanin	8b08fa0103	Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access" This reverts commit `f5cd7ec9f3`. Certain rocPRIM/rocThrust/hipCUB tests were failing because of this change.	2020-09-29 15:33:34 +02:00
Jay Foad	d3a8e333ec	[AMDGPU] Reformat SITargetLowering::isSDNodeSourceOfDivergence. NFC.	2020-09-28 14:42:05 +01:00
Sebastian Neubauer	6f7cd16d29	[AMDGPU] Fix v3f16 handling for getresinfo v3f32 should not be expanded to v4f32. getresinfo with a dmask of 7 created an image sample with a v3f32 return value, which was bitcasted to a v4f32 in constructRetValue. Differential Revision: https://reviews.llvm.org/D88206	2020-09-24 16:03:02 +02:00
Matt Arsenault	af0207f2ba	AMDGPU: Check global FP atomics match default FP mode We would always select global FP atomics from atomicrmw fadd, although they have a hardcoded FP mode.	2020-09-23 09:07:50 -04:00
Matt Arsenault	6daddc213f	AMDGPU: Don't add frame register to frame pseudos We no longer treat the frame register like a function argument, so the problem this avoided is no longer relevant.	2020-09-21 16:18:47 -04:00
Matt Arsenault	3105d0f84b	CodeGen: Move split block utility to MachineBasicBlock AMDGPU needs this in several places, so consolidate them here.	2020-09-18 14:05:18 -04:00
Matt Arsenault	27df165270	Revert "[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel." This reverts commit `c3492a1aa1`. I think this is the wrong strategy and wrong place to do this transform anyway. Also reverts follow up commit `7d593d0d69`.	2020-09-18 09:48:33 -04:00
Mirko Brkusanin	ae36c02ad0	[AMDGPU] Set DS alignment requirements to be more strict Alignment requirements for ds_read/write_b96/b128 for gfx9 and onward are now the same as for other GCN subtargets. This way we can avoid any unintentional use of these instructions on systems that do not support dword alignment and instead require natural alignment. This also makes 'SH_MEM_CONFIG.alignment_mode == STRICT' the default. Differential Revision: https://reviews.llvm.org/D87821	2020-09-18 15:26:24 +02:00
Bogdan Graur	7d593d0d69	[amdgpu] Compilation fix for Release Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D87838	2020-09-17 18:04:53 +02:00
Michael Liao	c3492a1aa1	[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel. - Need to lower COPY from SGPR to VGPR to a real instruction as the standard COPY is used where the source and destination are from the same register bank so that we potentially coalesc them together and save one COPY. Considering that, backend optimizations, such as CSE, won't handle them. However, the copy from SGPR to VGPR always needs materializing to a native instruction, it should be lowered into a real one before other backend optimizations. Differential Revision: https://reviews.llvm.org/D87556	2020-09-17 11:04:17 -04:00
alex-t	0efbb70b71	[AMDGPU] should expand ROTL i16 to shifts. Instruction combining pass turns library rotl implementation to llvm.fshl.i16. In the selection dag the intrinsic is turned to ISD::ROTL node that cannot be selected. Need to expand it to shifts again. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D87618	2020-09-17 17:34:33 +03:00
Stanislav Mekhanoshin	91f503c3af	[AMDGPU] gfx1030 RT support Differential Revision: https://reviews.llvm.org/D87782	2020-09-16 11:40:58 -07:00
Matt Arsenault	71131db689	AMDGPU: Improve <2 x i24> arguments and return value handling This was asserting for GlobalISel. For SelectionDAG, this was passing this on the stack. Instead, scalarize this as if it were a 32-bit vector.	2020-09-16 11:21:56 -04:00
Sebastian Neubauer	833b3b0d3a	[AMDGPU] Add v3f16/v3i16 support to SDag Fix lowering and instruction selection for v3x16 types and enable InstCombine to emit them. This patch only implements it for the selection dag. GlobalISel tests in GlobalISel/llvm.amdgcn.image.load.1d.d16.ll and GlobalISel/llvm.amdgcn.image.store.2d.d16.ll still don't work. Differential Revision: https://reviews.llvm.org/D84420	2020-09-16 17:20:27 +02:00
Jay Foad	90777e2924	[AMDGPU] Enable scheduling around FP MODE-setting instructions Pre-gfx10 all MODE-setting instructions were S_SETREG_B32 which is marked as having unmodeled side effects, which makes the machine scheduler treat it as a barrier. Now that we have proper implicit $mode operands we can use a no-side-effects S_SETREG_B32_mode pseudo instead for setregs that only touch the FP MODE bits, to give the scheduler more freedom. Differential Revision: https://reviews.llvm.org/D87446	2020-09-16 16:10:47 +01:00
Stanislav Mekhanoshin	277de43d88	[AMDGPU] Unify intrinsic ret/nortn interface We have a single noret intrinsic an a lot of special handling around it. Declare it just as any other but do not define rtn instructions itself instead. Differential Revision: https://reviews.llvm.org/D87719	2020-09-15 15:26:42 -07:00
Craig Topper	c193a689b4	[SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore. The versions that take 'unsigned' will be removed in the future. I tried to use getOriginalAlign instead of getAlign in some places. getAlign factors in the minimum alignment implied by the offset in the pointer info. Since we're also passing the pointer info we can use the original alignment. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D87592	2020-09-14 13:54:50 -07:00
Jay Foad	649bde488c	[AMDGPU] Simplify S_SETREG_B32 case in EmitInstrWithCustomInserter NFC.	2020-09-09 15:18:31 +01:00
Mirko Brkusanin	43af2a6faa	[AMDGPU] Workaround for LDS Misalignment bug on GFX10 Add subtarget feature check to avoid using ds_read/write_b96/128 with too low alignment if a bug is present on that specific hardware. Add this "feature" to GFX 10.1.1 as it is also affected. Add global-isel test.	2020-09-09 11:46:09 +02:00
Simon Pilgrim	1673a08044	SelectionDAG.h - remove unnecessary FunctionLoweringInfo.h include. NFCI. Use forward declarations and move the include down to dependent files that actually use it. This also exposes a number of implicit dependencies on KnownBits.h	2020-09-03 18:33:25 +01:00
Jay Foad	4bdab2e86a	[AMDGPU] Fix offset for REL32_HI relocs The addend in a REL32 reloc needs to be adjusted to account for the offset from the PC value returned by the s_getpc instruction to the point where the reloc is applied. This was being done correctly for (GOTPC)REL32_LO but not for (GOTPC)REL32_HI. This will only make a difference if the target symbol happens to get loaded almost exactly a multiple of 4G away from the relocated instructions. Differential Revision: https://reviews.llvm.org/D86938	2020-09-02 10:55:55 +01:00
Matt Arsenault	af1c1e20f4	AMDGPU/GlobalISel: Implement computeKnownBits for groupstaticsize	2020-08-27 19:39:44 -04:00
Matt Arsenault	9d3dc276a6	AMDGPU: Fix broken switch braces	2020-08-27 19:39:39 -04:00
Matt Arsenault	70cd9f5b77	AMDGPU/GlobalISel: Start implementing computeKnownBitsForTargetInstr Handle workitem intrinsics. There isn't really away to adequately test this right now, since none of the known bits users are fine grained enough to test the edge conditions. This triggers a number of instances of the new 64-bit to 32-bit shift combine in the existing tests.	2020-08-24 09:53:27 -04:00
Matt Arsenault	e1644a3779	GlobalISel: Reduce G_SHL width if source is extension shl ([sza]ext x, y) => zext (shl x, y). Turns expensive 64 bit shifts into 32 bit if it does not overflow the source type: This is a port of an AMDGPU DAG combine added in `5fa289f0d8`. InstCombine does this already, but we need to do it again here to apply it to shifts introduced for lowered getelementptrs. This will help matching addressing modes that use 32-bit offsets in a future patch. TableGen annoyingly assumes only a single match data operand, so introduce a reusable struct. However, this still requires defining a separate GIMatchData for every combine which is still annoying. Adds a morally equivalent function to the existing getShiftAmountTy. Without this, we would have to do try to repeatedly query the legalizer info and guess at what type to use for the shift.	2020-08-24 09:42:40 -04:00
Mirko Brkusanin	0654ff703d	[AMDGPU] Use ds_read/write_b96/b128 when possible for SDag Do not break down local loads and stores so ds_read/write_b96/b128 in ISelLowering can be selected on subtargets that support them and if align requirements allow them. Differential Revision: https://reviews.llvm.org/D84403	2020-08-21 12:26:31 +02:00
Mirko Brkusanin	f5cd7ec9f3	[AMDGPU] Reorganize GCN subtarget features for unaligned access Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine whether hardware supports such access. UnalignedAccessMode should be used to enable them. hasUnalignedBufferAccessEnabled() and hasUnalignedDSAccessEnabled() can be now used to quickly check both. Differential Revision: https://reviews.llvm.org/D84522	2020-08-21 12:26:31 +02:00
Mirko Brkusanin	5bd1febe21	[AMDGPU] Fix alignment requirements for 96bit and 128bit local loads and stores Adjust alignment requirements for ds_read/write_b96/b128. GFX9 and onwards allow misaligned access for reads and writes but only if SH_MEM_CONFIG.alignment_mode allows it. UnalignedDSAccess is set on GCN subtargets from GFX9 onward to let us know if we can relax alignment requirements. UnalignedAccessMode acts similary to UnalignedBufferAccess for DS instructions but only from GFX9 onward and is supposed to match alignment_mode. By default alignment of 4 is required. Differential Revision: https://reviews.llvm.org/D82788	2020-08-21 12:26:31 +02:00
Jay Foad	98de0d22f5	[AMDGPU] Apply llvm-prefer-register-over-unsigned from clang-tidy	2020-08-21 10:14:35 +01:00
Michael Liao	5257a60ee0	[amdgpu] Add codegen support for HIP dynamic shared memory. Summary: - HIP uses an unsized extern array `extern __shared__ T s[]` to declare the dynamic shared memory, which size is not known at the compile time. Reviewers: arsenm, yaxunl, kpyzhov, b-sumner Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82496	2020-08-20 21:29:18 -04:00
Jay Foad	3497860203	[AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister ... in favour of the isPhysical/isVirtual methods.	2020-08-20 17:59:11 +01:00
Matt Arsenault	e14474a39a	AMDGPU/GlobalISel: Select llvm.amdgcn.global.atomic.fadd Remove the intermediate transform in the DAG path. I believe this is the last non-deprecated intrinsic that needs handling.	2020-08-12 10:04:53 -04:00
Matt Arsenault	701228c411	AMDGPU: Handle intrinsics in performMemSDNodeCombine This avoids a possible regression in a future patch	2020-08-12 10:04:53 -04:00
Kerry McLaughlin	85c7e89f3b	[CodeGen] Refactor getMemBasePlusOffset & getObjectPtrOffset to accept a TypeSize Changes the Offset arguments to both functions from int64_t to TypeSize & updates all uses of the functions to create the offset using TypeSize::Fixed() Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85220	2020-08-11 12:17:10 +01:00
Matt Arsenault	6fe6b29c29	AMDGPU: Fix assertion in performSHLPtrCombine for 64-bit pointers	2020-08-10 13:46:52 -04:00
Matt Arsenault	3c0597a9e4	AMDGPU: Avoid explicitly listing all the memory nodes	2020-08-07 19:22:46 -04:00
Matt Arsenault	1a0c0944c6	AMDGPU: Define raw/struct variants of buffer atomic fadd Somehow the new FP atomic buffer intrinsics ended up using the legacy style for buffer intrinsics.	2020-08-06 13:36:19 -04:00
Matt Arsenault	d188a608bd	AMDGPU: Fix code duplication between the selectors Not sure this is the right place for this helper.	2020-08-06 10:42:15 -04:00
Matt Arsenault	6c7f640bf7	AMDGPU/GlobalISel: Implement LLT version of allowsMisalignedMemoryAccesses	2020-08-06 09:50:36 -04:00
Matt Arsenault	0ee1eba581	AMDGPU: Remove ATOMIC_PK_FADD The f32 and v2f16 cases should be handled the same way.	2020-08-05 22:00:52 -04:00
Matt Arsenault	83eaf5d55d	AMDGPU: Eliminate BUFFER_ATOMIC_PK_ADD_F16 node This is redundant with the other no return buffer atomic node, and we don't really need a separate type profile for it.	2020-08-05 15:16:51 -04:00
Matt Arsenault	43c0c9252a	AMDGPU: Refactor buffer atomic intrinsic lowering Move raw/struct buffer atomic lowering to separate functions. This avoids a long nested switch, and simplifies a future patch.	2020-08-05 14:44:55 -04:00
Matt Arsenault	57bd64ff84	Support addrspacecast initializers with isNoopAddrSpaceCast Moves isNoopAddrSpaceCast to the TargetMachine. It logically belongs with the DataLayout.	2020-07-31 10:42:43 -04:00
Stanislav Mekhanoshin	5b32518f96	[AMDGPU] Do not use undef on indirect source We are using undef on the indirect move source subreg and then using implicit super-reg. This creates a problem in RA when Greedy decides to split the register. It reassigns the implicit super-reg but does not bother to change undef source because it is really does not matter. The fix is to stop lying to RA and drop undef flag. This has also hit a problem in SIFoldOperands as it can fold immediate into an indirect move since there is no undef flag anymore. That results in multiple test failures, so added the check for this case. Differential Revision: https://reviews.llvm.org/D84899	2020-07-30 10:41:59 -07:00
Matt Arsenault	c230965ccf	AMDGPU: Make saturating add/sub legal for DAG path	2020-07-29 08:27:31 -04:00
Matt Arsenault	cdd45d5f9c	AMDGPU/GlobalISel: Select llvm.amdgcn.global.atomic.csub Remove the custom node boilerplate. Not sure why this tried to handle the LDS atomic stuff.	2020-07-29 08:27:31 -04:00
Matt Arsenault	8860daf0ed	AMDGPU: Handle a few missing cases in getAddrModeArguments	2020-07-28 20:22:38 -04:00
Matt Arsenault	8b81d0633f	AMDGPU: global_atomic_csub is not always dereferenceable	2020-07-27 18:47:39 -04:00
Sebastian Neubauer	896679733d	[AMDGPU] Fix typo. NFC	2020-07-23 17:01:12 +02:00
Matt Arsenault	1168119c2f	AMDGPU: Start interpreting byref on kernel arguments These are treated identically to value aggregates placed in the kernel argument list. A %struct.foo or %struct.foo addrspace(4)* byref(sizeof(%struct.foo)) align(alignof(%struct.foo)) argument should produce the same offsets and argument metadata. This handles all 3 kernel ABI implementations, and the two HSA metadata emission paths.	2020-07-21 18:11:22 -04:00
Matt Arsenault	994fb86bc2	AMDGPU: Fix promoting f16 fpowi with legal f16	2020-07-17 11:29:05 -04:00
Matt Arsenault	79f67cae91	AMDGPU: Rename add/sub with carry out instructions The hardware has created a real mess in the naming for add/sub, which have been renamed basically every generation. Switch the carry out pseudos to have the gfx9/gfx10 names. We were using the original SI/CI v_add_i32/v_sub_i32 names. Later targets reintroduced these names as carryless instructions with a saturating clamp bit, which we do not define. Do this rename so we can unambiguously add these missing instructions. The carry-in versions should also be renamed, but at least those had a consistent _u32 name to begin with. The 16-bit instructions were also renamed, but aren't ambiguous. This does regress assembler error message quality in some cases. In mismatched wave32/wave64 situations, this will switch from "unsupported instruction" to "invalid operand", with the error pointing at the wrong position. I couldn't quite follow how the assembler selects these, but the previous behavior seemed accidental to me. It looked like there was a partial attempt to handle this which was never completed (i.e. there is an AMDGPUOperand::isBoolReg but it isn't used for anything).	2020-07-16 13:16:30 -04:00
dstuttar	69a89b54c6	[NFC] Change isFPPredicate comparison to ignore lower bound Summary: Since changing the Predicate to be an unsigned enum, the lower bound check for isFPPredicate no longer needs to check the lower bound, since it will always evaluate to true. Also fixed a similar issue in SIISelLowering.cpp by removing the need for comparing to FIRST and LAST predicates Added an assert to the isFPPredicate comparison to flag if the FIRST_FCMP_PREDICATE is ever changed to anything other than 0, in which case the logic will break. Without this change warnings are generated in VS. Change-Id: I358f0daf28c0628c7bda8ad4cab4e1757b761bab Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83540	2020-07-10 11:57:20 +01:00
Carl Ritson	560292fa99	[AMDGPU] Update isFMAFasterThanFMulAndFAdd assumptions MAD/MAC is no longer always available. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D83207	2020-07-07 15:40:44 +09:00
Stanislav Mekhanoshin	f7a7efbf88	[AMDGPU] Tweak getTypeLegalizationCost() Even though wide vectors are legal they still cost more as we will have to eventually split them. Not all operations can be uniformly done on vector types. Conservatively add the cost of splitting at least to 8 dwords, which is our widest possible load. We are more or less lying to cost mode with this change but this can prevent vectorizer from creation of wide vectors which results in RA problems for us. Differential Revision: https://reviews.llvm.org/D83078	2020-07-06 14:07:48 -07:00
Matt Arsenault	f25d020c2e	AMDGPU/GlobalISel: Add types to special inputs When passing special ABI inputs, we have no existing context for the type to use.	2020-07-06 17:00:55 -04:00
Matt Arsenault	c19c153e74	AMDGPU: Don't ignore carry out user when expanding add_co_pseudo This was resulting in a missing vreg def in the use select instruction. The output of the pseudo doesn't make sense, since it really shouldn't have the vreg output in the first place, and instead an implicit scc def to match the real scalar behavior. We could have easier to understand tests if we selected scalar versions of the [us]{add\|sub}.with.overflow intrinsics. This does still end up producing vector code in the end, since it gets moved later.	2020-07-06 14:28:01 -04:00
Guillaume Chatelet	87e2751cf0	[Alignment][NFC] Use proper getter to retrieve alignment from ConstantInt and ConstantSDNode This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D83082	2020-07-03 08:06:43 +00:00
Guillaume Chatelet	3587c9c427	[NFC] Use ADT/Bitfields in Instructions This is an example patch for D81580. Differential Revision: https://reviews.llvm.org/D81662	2020-07-03 07:20:22 +00:00
Dmitry Preobrazhensky	1c9d681092	[AMDGPU][CODEGEN] Added support of new inline assembler constraints Added support for constraints 'I', 'J', 'B', 'C', 'DA', 'DB'. See https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D81651	2020-07-02 17:20:15 +03:00
Guillaume Chatelet	52911428ef	[Alignment][NFC] Migrate AMDGPU backend to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82743	2020-06-29 11:56:06 +00:00
Michael Liao	20a1700293	[amdgpu] Fix REL32 relocations with negative offsets. Summary: - The offset should be treated as a signed one. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82234	2020-06-21 23:09:03 -04:00
Matt Arsenault	95605b784b	AMDGPU/GlobalISel: Implement computeKnownAlignForTargetInstr We probably need to move where intrinsics are lowered to copies to make this useful.	2020-06-18 17:28:00 -04:00
Matt Arsenault	c5c58fd6b5	AMDGPU: Remove intermediate DAG node for trig_preop intrinsic We weren't doing anything with this, and keeping it would just add more boilerplate for GlobalISel.	2020-06-16 21:06:25 -04:00
Stanislav Mekhanoshin	9ee272f13d	[AMDGPU] Add gfx1030 target Differential Revision: https://reviews.llvm.org/D81886	2020-06-15 16:18:05 -07:00
Michael Liao	ec02635d10	[amdgpu] Skip OR combining on 64-bit integer before legalizing ops. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81710	2020-06-12 15:22:38 -04:00
Sebastian Neubauer	29a6ad94fd	[AMDGPU] Add G16 support to image instructions Add G16 feature for GFX10 and support A16 and G16 in GlobalISel. Differential Revision: https://reviews.llvm.org/D76836	2020-06-12 11:26:31 +02:00
Stanislav Mekhanoshin	295d1fe733	[AMDGPU] Custom lowering of i64 umulo/smulo Differential Revision: https://reviews.llvm.org/D81430	2020-06-08 23:14:19 -07:00
Guillaume Chatelet	1778564f91	[Alignment][NFC] Migrate the rest of backends Summary: This is a followup on D81196 Reviewers: courbet Subscribers: arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81278	2020-06-08 07:17:20 +00:00
Stanislav Mekhanoshin	5d62606f90	AMDGPU/GlobalISel: cmp/select method for extract element Differential Revision: https://reviews.llvm.org/D80749	2020-06-05 12:57:40 -07:00
Matt Arsenault	af867b7850	DAG: Change computeKnownBitsForFrameIndex to be usable by GISel This wasn't getting much value from the DAG or depth arguments, since it's only called on the frame index root nodes. FrameIndexes can also only return a scalar value, so it also didn't need DemandedElts.	2020-06-04 10:50:26 -04:00
Matt Arsenault	89d48ccabe	AMDGPU: Fix not emitting nofpexcept on fdiv expansion In this awkward case, we have to emit custom pseudo-constrained FP wrappers. InstrEmitter concludes that since a mayRaiseFPException instruction had a chain, it can't add nofpexcept. Test deferred until mayRaiseFPException is really set on everything.	2020-06-01 14:10:26 -04:00
Matt Arsenault	7ad36491ca	AMDGPU: Fix alignment for dynamic allocas The alignment value also needs to be scaled by the wave size.	2020-06-01 13:06:37 -04:00
Jay Foad	2768edfff1	[AMDGPU] Propagate fast-math flags when lowering FSIN and FCOS Differential Revision: https://reviews.llvm.org/D80813	2020-05-31 05:21:55 +01:00
Changpeng Fang	234eba90f4	AMDGPU: Add setTruncStoreAction for vector i64 types made legal recently Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D80853	2020-05-30 20:45:27 -07:00
Matt Arsenault	0892a96a05	AMDGPU: Optimize s_setreg_b32 to s_denorm_mode/s_round_mode This is a custom inserter because it was less work than teaching tablegen a way to indicate that it is sometimes OK to have a no side effect instruction in the output of a side effecting pattern. The asm is needed to look like a read of the mode register to prevent it from being deleted. However, there seems to be a bug where the mode register def instructions are moved across the asm sideeffect by the post-RA scheduler. Another oddity is the immediate is formatted differently between s_denorm_mode and s_round_mode.	2020-05-29 21:11:36 -04:00
Matt Arsenault	f012c58abd	AMDGPU: Move MIMG MMO check to verifier	2020-05-29 20:58:23 -04:00
Jay Foad	b28d038ff3	[AMDGPU] Better use of llvm::numbers Tweak a few constant expressions involving numbers::pi etc to avoid rounding errors. NFCI though it's possible some of these will now be more accurate in the last bit.	2020-05-29 09:55:36 +01:00
Jay Foad	036d4b0dbf	[AMDGPU] Use numbers::pi instead of M_PI. NFC.	2020-05-29 09:55:36 +01:00
Matt Arsenault	97f3f0bab0	AMDGPU: Add intrinsic for s_setreg This will be more useful with fenv access implemented.	2020-05-28 14:26:38 -04:00
Matt Arsenault	5e007fe998	AMDGPU: Support non-entry block static sized allocas OpenMP emits these for some reason, so handle them. Assume these use 4096 bytes by default, with a flag to override this. Also change the related stack assumption for calls to have a flag.	2020-05-27 18:46:10 -04:00
Matt Arsenault	d37ce53ad3	AMDGPU: Set StackPointerRegisterToSaveRestore This will enable selecting non-entry block allocas. Skip the SP write check in the base isSchedulingBoundary implementation to preserve the previous scheduling behavior and avoid test churn. It's apparently for compile time reasons, but if we were to use this more work would be needed since in some of the failing tests, we seem to incorrectly get hazard nops inserted.	2020-05-27 13:44:05 -04:00
Matt Arsenault	e09064e97f	AMDGPU: Update store node checks for atomics Prepare to switch to using StoreSDNode for atomic stores.	2020-05-26 15:20:03 -04:00
Matt Arsenault	9786e7552d	Revert "[AMDGPU] NFC target dependent requiresUniformRegister refactored out" This reverts commit `fb38b98338`. This will regress compile time.	2020-05-26 12:58:18 -04:00
alex-t	fb38b98338	[AMDGPU] NFC target dependent requiresUniformRegister refactored out Summary: Target specific method encapsulated into the Target Lowering Info. Reviewers: rampitec, vpykhtin Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70085	2020-05-26 19:49:20 +03:00
Dmitry Preobrazhensky	b087b91c91	[AMDGPU][CODEGEN] Added 'A' constraint for inline assembler Summary: 'A' constraint requires an immediate int or fp constant that can be inlined in an instruction encoding. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D78494	2020-05-25 14:23:34 +03:00
Matt Arsenault	66fe60220c	AMDGPU/GlobalISel: Fix masked control flow with fallthrough blocks Unlike SelectionDAGBuilder, IRTranslator omits the unconditional branch in fallthrough cases. Confusingly, the control flow pseudos function in the opposite way the intrinsics are used, and the branch targets always need to be swapped. We're inverting the target blocks, so we need to figure out the old fallthrough block and insert a branch to the original unconditional branch target.	2020-05-22 10:31:44 -04:00
Stanislav Mekhanoshin	1dfd1b3e4b	[AMDGPU] Tune threshold for cmp/select vector lowering It was set in total vector size while the idea was to limit a number of instructions. Now it started to work with doubles and thresholds needs to be updated. Differential Revision: https://reviews.llvm.org/D80322	2020-05-21 08:59:35 -07:00
Stanislav Mekhanoshin	4eecf17164	[AMDGPU] Always expand ext/insertelement with divergent idx Even though series of cmd/cndmask can produce quite a lot of code that is still better than a loop. In case of doubles we would even produce two loops. Differential Revision: https://reviews.llvm.org/D80032	2020-05-20 15:51:29 -07:00
Matt Arsenault	074b802654	AMDGPU: Fix DAG divergence for implicit function arguments This should be directly implied from the register class, and there's no need to special case live ins here. This was getting the wrong answer for the queue ptr argument in callable functions, since it's not an explicit IR argument and is always uniform. Fixes not using scalar loads for the aperture in addrspacecast lowering, and any other places that use implicit SGPR arguments.	2020-05-19 18:11:34 -04:00

925 Commits