llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	a162048a47	AMDGPU/GlobalISel: Fix fixed ABI special VGPR function arguments I forgot to copy the new fixed function ABI into GlobalISel, so this was mismatched with the DAG compiled calling function. This was allocating part of the argument list to v31, which was supposed to be reserved for the workitem IDs.	2020-06-23 21:21:35 -04:00
Eli Friedman	e9d4e34ab8	[AArch64][SVE] Add legalization support for i32/i64 vector srem/urem Implement them on top of sdiv/udiv, similar to what we do for integer types. Potential future work: implementing i8/i16 srem/urem, optimizations for constant divisors, optimizing the mul+sub to mls. Differential Revision: https://reviews.llvm.org/D81511	2020-06-23 16:27:52 -07:00
Your Name	cc9d693856	[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size Summary: Make use of both the - (1) clustered bytes and (2) cluster length, to decide on the max number of mem ops that can be clustered. On an average, when loads are dword or smaller, consider `5` as max threshold, otherwise `4`. This heuristic is purely based on different experimentation conducted, and there is no analytical logic here. Reviewers: foad, rampitec, arsenm, vpykhtin Reviewed By: rampitec Subscribers: llvm-commits, kerbowa, hiraditya, t-tye, Anastasia, tpr, dstuttard, yaxunl, nhaehnle, wdng, jvesely, kzhuravl, thakis Tags: #llvm Differential Revision: https://reviews.llvm.org/D82393	2020-06-24 00:39:41 +05:30
Matt Arsenault	db777eaea3	AMDGPU/GlobalISel: Fix asserts on non-s32 sitofp/uitofp sources The combine to form cvt_f32_ubyte0 was assuming the source type was always 32-bit, but this needs to tolerate any legal source type.	2020-06-23 10:00:35 -04:00
hsmahesha	5832950adb	[AMDGPU/MemOpsCluster] Compute `width` for `MIMG` instruction class. Summary: `width` computation is missing for newly added `MIMG` instruction class. Add it. Reviewers: foad, rampitec, arsenm Reviewed By: foad Subscribers: MatzeB, javed.absar, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81649	2020-06-23 17:32:17 +05:30
Michael Liao	b1360caa82	[SDAG] Add new AssertAlign ISD node. Summary: - AssertAlign node records the guaranteed alignment on its source node, where these alignments are retrieved from alignment attributes in LLVM IR. These tracked alignments could help DAG combining and lowering generating efficient code. - In this patch, the basic support of AssertAlign node is added. So far, we only generate AssertAlign nodes on return values from intrinsic calls. - Addressing selection in AMDGPU is revised accordingly to capture the new (base + offset) patterns. Reviewers: arsenm, bogner Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81711	2020-06-23 00:51:11 -04:00
Jay Foad	9761d3cf9c	[AMDGPU] Update more live intervals in SIWholeQuadMode This fixes various assertion failures that would otherwise be triggered by a later patch to move SIWholeQuadMode later in the pass pipeline. Differential Revision: https://reviews.llvm.org/D82190	2020-06-22 13:50:15 +01:00
Tim Corringham	96ecead5a2	[AMDGPU] clang-format of SIModeRegister.cpp Ran clang-format just to ease future reviews. No functional changes.	2020-06-22 13:31:52 +01:00
Michael Liao	20a1700293	[amdgpu] Fix REL32 relocations with negative offsets. Summary: - The offset should be treated as a signed one. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82234	2020-06-21 23:09:03 -04:00
Eric Christopher	cf23852587	[Target] As part of using inclusive language within the llvm project, migrate away from the use of blacklist and whitelist. This change affects an internal llvm command line option.	2020-06-20 00:06:39 -07:00
Carl Ritson	4a7de36afc	[AMDGPU] Avoid use of V_READLANE into EXEC in SGPR spills Always prefer to clobber input SGPRs and restore them after the spill. This applies to both spills to VGPRs and scratch. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D81914	2020-06-20 12:10:47 +09:00
Stanislav Mekhanoshin	2b87a44c49	[AMDGPU] Some formatting fixes. NFC.	2020-06-19 09:02:59 -07:00
Piotr Sobczak	6d9565d6d5	Revert "[AMDGPU] Select s_cselect" This caused some failures detected by the buildbot with expensive checks enabled. This reverts commit `4067de569f`.	2020-06-19 16:41:04 +02:00
dfukalov	129388ddc4	[AMDGPU][CostModel] Add fneg cost estimation Summary: The estimation uses AMDGPUTargetLowering::isFNegFree() Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82065	2020-06-19 17:31:35 +03:00
Piotr Sobczak	4067de569f	[AMDGPU] Select s_cselect Summary: Add patterns to select s_cselect in the isel. Handle more cases of implicit SCC accesses in si-fix-sgpr-copies to allow new patterns to work. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81925	2020-06-19 16:17:46 +02:00
Carl Ritson	8f3b2c8aa3	AMDGPU/GlobalISel: Remove selection of MAD/MAC when not available Add code to respect mad-mac-f32-insts target feature. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D81990	2020-06-19 10:30:19 +09:00
Matt Arsenault	95605b784b	AMDGPU/GlobalISel: Implement computeKnownAlignForTargetInstr We probably need to move where intrinsics are lowered to copies to make this useful.	2020-06-18 17:28:00 -04:00
Matt Arsenault	7f8b2e1b91	GlobalISel: Pass LegalizerHelper to custom legalize callbacks This was passing in all the parameters needed to construct a LegalizerHelper in the custom legalization, when it's simpler to just pass in the existing helper. This is slightly more annoying to use in the common case where you don't need the legalizer helper, but we could add the common parameters back in addition to the helper. I didn't propagate this to all the internal target changes that this logically implies, but did update a sample one for legalizeMinNumMaxNum. This is in preparation for moving AMDGPU load/store legalization entirely into custom lowering. The current set of legalization actions is really constraining and not really capable of expressing all the actions needed to legalize loads/stores. In particular there's no way to express when the memory access itself needs to change size vs. the result type. There's also a lot of redundancy since the same split/widen actions need to be applied in both vector and scalar cases. All of the sub-cases logically belong as steps in the legalizer helper, but it will be easier to consider everything at once in custom lowering.	2020-06-18 17:17:38 -04:00
Matt Arsenault	779cba79ec	AMDGPU: Remove mayLoad/mayStore from some side effecting intrinsics These don't really modify any memory, and should not expect memory operands.	2020-06-18 14:12:19 -04:00
Stanislav Mekhanoshin	6c7e1b16fa	[AMDGPU] Added new encoding to getMCOpcodeGen Nothing breaks yet, but all encodings shall be in the map. Differential Revision: https://reviews.llvm.org/D81974	2020-06-18 10:11:33 -07:00
Matt Arsenault	6f09bb7da2	AMDGPU: Don't pass MachineFunction if only the IR Function is used	2020-06-18 11:06:46 -04:00
Matt Arsenault	5f5f566b26	AMDGPU: Don't use 16-bit FP inline constants in integer operands It seems to be a hardware defect that the half inline constants do not work as expected for the 16-bit integer operations (the inverse does work correctly). Experimentation seems to show these are really reading the 32-bit inline constants, which can be observed by writing inline asm using op_sel to see what's in the high half of the constant. Theoretically we could fold the high halves of the 32-bit constants using op_sel. The *_asm_all.s MC tests are broken, and I don't know where the script to autogenerate these are. I started manually fixing it, but there's just too many cases to fix. This also does break the assembler/disassembler support for these values, and I'm not sure what to do about it. These are still valid encodings, so it seems like you should be able to use them in some way. If you wrote assembly using them, you could have really meant it (perhaps to read the high bits with op_sel?). The disassembler will print the invalid literal constant which will fail to re-assemble. The behavior is also different depending on the use context. Consider this example, which was previously accepted and encoded using the inline constant: v_mad_i16 v5, v1, -4.0, v3 ; encoding: [0x05,0x00,0xec,0xd1,0x01,0xef,0x0d,0x04] In contexts where an inline immediate is required (such as on gfx8/9), this will now be rejected. For gfx10, this will produce the literal encoding and change the printed format: v_mad_i16 v5, v1, 0xc400, v3 ; encoding: [0x05,0x00,0x5e,0xd7,0x01,0xff,0x0d,0x04,0x00,0xc4,0x00,0x00] This is just another variation of the issue that we don't perfectly handle round trip assembly/disassembly due to not tracking how immediates were encoded. This doesn't matter much in practice, since compilers don't emit the suboptimal encoding. I doubt any users are relying on this behavior (although I did make use of the old behavior to figure out what was wrong). Fixes bug 46302.	2020-06-17 19:14:10 -04:00
Scott Linder	691ff4682f	[AMDGPU] Skip CFIInstructions in SIInsertWaitcnts Summary: CFI emitted during PEI at the beginning of the prologue needs to apply to any inserted waitcnts on function entry. Reviewers: arsenm, t-tye, RamNalamothu Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D76881	2020-06-17 12:41:03 -04:00
vnalamot	2e28009981	[NFC] Move getAll{S,V}GPR{32,128} methods to SIFrameLowering Summary: Future patch needs some of these in multiple places. The definitions of these can't be in the header and be eligible for inlining without making the full declaration of GCNSubtarget visible. I'm not sure what the right trade-off is, but I opted to not bloat SIRegisterInfo.h Reviewers: arsenm, cdevadas Reviewed By: arsenm Subscribers: RamNalamothu, qcolombet, jvesely, wdng, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79878	2020-06-17 12:08:09 -04:00
Jay Foad	def2e4c47f	[AMDGPU] Simplify GCNPassConfig::addOptimizedRegAlloc. NFC.	2020-06-17 15:56:15 +01:00
Carl Ritson	ac8a2f132b	[AMDGPU] Fix failure in VCC spilling Spills of VCC (SGPR64) will fail with new SGPR spill code, because super register is not correctly resolved. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D81224	2020-06-17 20:11:15 +09:00
Matt Arsenault	3b34f3fcca	AMDGPU/GlobalISel: Fix obvious bug in ported 32-bit udiv/urem This was hidden by the IR expansion in AMDGPUCodeGenPrepare, which I forgot to turn off.	2020-06-16 22:46:35 -04:00
Matt Arsenault	c5c58fd6b5	AMDGPU: Remove intermediate DAG node for trig_preop intrinsic We weren't doing anything with this, and keeping it would just add more boilerplate for GlobalISel.	2020-06-16 21:06:25 -04:00
Daniel Sanders	e35ba09961	[gicombiner] Allow generated combiners to store additional members Summary: Adds the ability to add members to a generated combiner via a State base class. In the current AArch64PreLegalizerCombiner this is used to make Helper available without having to provide it to every call. As part of this, split the command line processing into a separate object so that it still only runs once even though the generated combiner is constructed more frequently. Depends on D81862 Reviewers: aditya_nandakumar, bogner, volkan, aemerson, paquette, arsenm Reviewed By: arsenm Subscribers: jvesely, wdng, nhaehnle, kristof.beyls, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81863	2020-06-16 14:47:04 -07:00
Stanislav Mekhanoshin	3f0c9c1634	Fix ubsan error in tblgen with signed left shift UBSAN complains when tblgen performs SHL of a negative value. Differential Revision: https://reviews.llvm.org/D81952	2020-06-16 11:15:09 -07:00
Stanislav Mekhanoshin	576fa5a50c	[AMDGPU] make ubsan happy with unsigned left shift Fixes UBSAN error after rG9ee272f13d88f090817235ef4f91e56bb2a153d6 A trivial signed/unsigned shift.	2020-06-15 17:21:10 -07:00
Stanislav Mekhanoshin	9ee272f13d	[AMDGPU] Add gfx1030 target Differential Revision: https://reviews.llvm.org/D81886	2020-06-15 16:18:05 -07:00
Matt Arsenault	e07cf92377	AMDGPU/GlobalISel: Don't hardcode maximum register size This is a somewhat artificial limit, so avoid repeating it in many places in case it changes.	2020-06-15 15:01:19 -04:00
Matt Arsenault	1a7f115dce	AMDGPU/GlobalISel: Extend load/store workaround to i128 vectors	2020-06-15 14:55:11 -04:00
Matt Arsenault	2ca552322c	AMDGPU/GlobalISel: Fix 8-byte aligned, 96-bit scalar loads These are legal since we can do a 96-bit load on some subtargets, but this is only for vector loads. If we can't widen the load, it needs to be broken down once known scalar. For 16-byte alignment, widen to a 128-bit load.	2020-06-15 11:33:16 -04:00
Matt Arsenault	dae9554b2b	AMDGPU/GlobalISel: Workaround some load/store type selection patterns The logic is written for what loads/stores should be selectable. There are a set of cases that should be selectable, but due to missing MVTs and/or selection patterns, will fail to select. I think eventually load/store select patterns should ignore the type and only look at the value size, but until that happens, bitcast these to equivalent i32 vectors.	2020-06-15 07:42:20 -04:00
Sam Parker	2596da3174	[CostModel] getCFInstrCost in getUserCost. Have BasicTTI call the base implementation so that both agree on the default behaviour, with the default being a cost of '1'. This has required an X86 specific implementation as it seems to be very reliant on those instructions being free. Changes are also made to AMDGPU so that their implementations distinguish between cost kinds, so that the unrolling isn't affected. PowerPC also has its own implementation to prevent changes to the reg-usage vectorizer test. The cost model test changes now reflect that ret instructions are not generally free. Differential Revision: https://reviews.llvm.org/D79164	2020-06-15 09:28:46 +01:00
Sam Parker	321ebfd175	[NFCI][CostModel] Unify FNeg cost Enable TTIImpl::getUserCost to handle FNeg so that getInstructionThroughput can call that instead. This means we can remove the code in the AMDGPU backend too. Differential Revision: https://reviews.llvm.org/D81635	2020-06-15 08:33:04 +01:00
Sam Parker	51541c068a	[CostModel] Unify ExtractElement cost. Move the cost modelling, with the reduction pattern matching, from getInstructionThroughput into generic TTIImpl::getUserCost. The modelling in the AMDGPU backend can now be removed. Differential Revision: https://reviews.llvm.org/D81643	2020-06-15 08:27:14 +01:00
Matt Arsenault	804397dde6	AMDGPU: Do not bundle inline asm Fixes bug 46285	2020-06-14 13:24:50 -04:00
Matt Arsenault	fb51d508ee	AMDGPU/GlobalISel: Select general case for G_PTRMASK	2020-06-14 13:12:29 -04:00
Matt Arsenault	46579471fd	AMDGPU: Fix spill/restore of 192-bit registers I tried to use an IR inline asm test, but that doesn't work since the inline asm handling asserts without an MVT to use.	2020-06-14 13:12:01 -04:00
Michael Liao	ec02635d10	[amdgpu] Skip OR combining on 64-bit integer before legalizing ops. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81710	2020-06-12 15:22:38 -04:00
Sebastian Neubauer	29a6ad94fd	[AMDGPU] Add G16 support to image instructions Add G16 feature for GFX10 and support A16 and G16 in GlobalISel. Differential Revision: https://reviews.llvm.org/D76836	2020-06-12 11:26:31 +02:00
Matt Arsenault	7d913becfc	AMDGPU/GlobalISel: Fix select of private <2 x s16> load	2020-06-11 19:25:25 -04:00
Matt Arsenault	27f8bd94cb	AMDGPU/GlobalISel: Fix select of <8 x s64> scalar load	2020-06-11 19:09:43 -04:00
Matt Arsenault	2247072b65	AMDGPU/GlobalISel: Set insert point when emitting control flow pseudos This was implicitly assuming the branch instruction was the next after the pseudo. It's possible for another non-terminator instruction to be inserted between the intrinsic and the branch, so adjust the insertion point. Fixes a non-terminator after terminator verifier error (which without the verifier, manifested itself as an infinite loop in analyzeBranch much later on).	2020-06-11 18:53:26 -04:00
Matt Arsenault	19b3b886b7	AMDGPU/GlobalISel: Fix porting error in 32-bit division The baffling thing is this passed the OpenCL conformance test for 32-bit integer divisions, but only failed in the 32-bit path of BypassSlowDivisions for the 64-bit tests.	2020-06-10 21:48:58 -04:00
Stanislav Mekhanoshin	09d325b20c	AMDGPU/GlobalISel: cmp/select method for insert element Differential Revision: https://reviews.llvm.org/D80754	2020-06-10 13:12:54 -07:00
Stanislav Mekhanoshin	6e1eee6034	[AMDGPU] Fixed promote alloca with ptr/int casts There is an invalid cast produced when a pointee is a pointer and the alloca type is cast to a pointer to int. Differential Revision: https://reviews.llvm.org/D81606	2020-06-10 11:46:57 -07:00
Matt Arsenault	721f8f7530	AMDGPU: Stop using getSelectCC in division lowering This was promoting booleans to i32 to perform a comparison against them to feed to a select condition. Just use the booleans directly. This produces the same final code, since the combiner is unable to undo the mess this creates. I untangled this logic when I ported this code to GlobalISel, so port the cleanups back.	2020-06-10 13:56:53 -04:00
Matt Arsenault	ea1bd95411	AMDGPU/GlobalISel: Make G_IMPLICIT_DEF legality more consistent Makes <6 x s16> legal, <4 x s8> illegal, and clamps the maximum size to 1024.	2020-06-10 11:05:59 -04:00
Sam Parker	09d30cb977	[CostModel] Unify Shuffle and InsertElement Costs Extract the existing code from getInstructionThroughput into TTImpl::getUserCost. The duplicated code in the AMDGPU backend has also been removed. Differential Revision: https://reviews.llvm.org/D81448	2020-06-10 09:13:34 +01:00
Sam Parker	fa8bff0cd1	[CostModel] Unify getArithmeticInstrCost Add the remaining arithmetic opcodes into the generic implementation of getUserCost and then call this from getInstructionThroughput. Most of the backends have been modified to return the base implementation for cost kinds other than RecipThroughput. The outlier here is AMDGPU which already uses getArithmeticInstrCost for all the cost kinds. This change means that most of the opcodes can be removed from that backend's implementation of getUserCost. Differential Revision: https://reviews.llvm.org/D80992	2020-06-10 09:08:45 +01:00
Matt Arsenault	32823091c3	GlobalISel: Set instr/debugloc before any legalizer action It was annoying enough that every custom lowering needed to set the insert point, but this was made worse since now these all needed to be updated to setInstrAndDebugLoc. Consolidate these so every legalization action has the right insert position by default. This should fix dropping debug info in every custom AMDGPU legalization.	2020-06-09 15:37:02 -04:00
hsmahesha	7410571ce9	Revert "[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size" This reverts commit `40a632a335`.	2020-06-09 19:27:17 +05:30
hsmahesha	40a632a335	[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size Summary: Make use of both the - (1) clustered bytes and (2) cluster length, to decide on the max number of mem ops that can be clustered. On an average, when loads are dword or smaller, consider `5` as max threshold, otherwise `4`. This heuristic is purely based on different experimentation conducted, and there is no analytical logic here. Reviewers: foad, rampitec, arsenm, vpykhtin Reviewed By: foad, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, Anastasia, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81085	2020-06-09 14:09:14 +05:30
Sameer Sahasrabuddhe	d8f651d3e8	[AMDGPU] Enable structurizer workarounds by default Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D81211	2020-06-09 13:14:15 +05:30
Stanislav Mekhanoshin	295d1fe733	[AMDGPU] Custom lowering of i64 umulo/smulo Differential Revision: https://reviews.llvm.org/D81430	2020-06-08 23:14:19 -07:00
dfukalov	6a31a9a543	[AMDGPU][NFC] Skip processing intrinsics that do not become real instructions Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81260	2020-06-09 03:45:33 +03:00
Jay Foad	275ecaae16	[AMDGPU] Cluster MIMG instructions Differential Revision: https://reviews.llvm.org/D74035	2020-06-08 14:01:53 +01:00
Guillaume Chatelet	1778564f91	[Alignment][NFC] Migrate the rest of backends Summary: This is a followup on D81196 Reviewers: courbet Subscribers: arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81278	2020-06-08 07:17:20 +00:00
Benjamin Kramer	3badd17b69	SmallPtrSet::find -> SmallPtrSet::count The latter is more readable and more efficient. While there clean up some double lookups. NFCI.	2020-06-07 22:38:08 +02:00
Matt Arsenault	38fb446fc7	AMDGPU/GlobalISel: Fix test failure in release build The annoying behavior where the output is different due to the legality check struck again, plus the subtarget predicate wasn't really correctly set for DS FP atomics. Some of the FP min/max instructions seem to be in the gfx6/gfx7 manuals, but IIRC this might have been one of the cases where the manual got ahead of the actual hardware support, but I've left these as-is for now since the assembler tests seem to expect them.	2020-06-06 11:01:18 -04:00
Matt Arsenault	bc20bdb9f9	AMDGPU/GlobalISel: Start rewriting load/store legality rules The current set is an incomprehensible mess riddled with ordering hacks for various limitations in the legalizer at the time of writing, many of which have been fixed. This takes a very small step in correcting this. The core first change is to start checking for fully legal cases first, rather than trying to figure out all of the actions that could need to be performed. It's recommended to check the legal cases first for faster legality checks in the common case. This still has a table listing some common cases, but it needs measuring whether this really helps or not. More significantly, stop trying to allow any arbitrary type with a legal bitwidth as a legal memory type, and start using the bitcast legalize action for them. Allowing loads of these weird vector types produced new burdens we don't need for handling all of the legalization artifacts. Unlike the SelectionDAG handling, this is still not casting 64 or 16-bit element vectors to 32-bit vectors. These cases should still be handled by increasing/decreasing the number of 16-bit elements. This is primarily to fix 8-bit element vectors. Another change is to stop trying to handle the load-widening based on a higher alignment. We should still do this, but the way it was handled wasn't really correct. We really need to modify the MMO's size at the same time, and not just increase the result type. The LegalizerHelper does not do this, and I think this would really require a separate WidenMemory action (or to add a memory action payload to the LegalizeMutation). These will now fail to legalize. The structure of the legalizer rules makes writing concise rules here difficult. It would be easier if the same function could answer the query, and report the action to perform at the same time. Instead these two are split into distinct predicate and action functions. This is mostly tolerable for other cases, but the load/store rules get pretty complicated so it's difficult to keep two versions of these functions in sync.	2020-06-06 09:59:46 -04:00
dfukalov	c94d32a6b3	[AMDGPU] Increase max iterations count to analyze complete unroll Summary: In some cases inner loops may not get boosts so try to analyze them deeper. Reviewers: rampitec, mzolotukhin Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81204	2020-06-06 16:32:45 +03:00
Simon Pilgrim	eda13c2420	LegacyDivergenceAnalysis.h - reduce DivergenceAnalysis.h include to forward declaration. NFC. Move implicit include dependencies down to source file.	2020-06-06 13:30:00 +01:00
Stanislav Mekhanoshin	5d62606f90	AMDGPU/GlobalISel: cmp/select method for extract element Differential Revision: https://reviews.llvm.org/D80749	2020-06-05 12:57:40 -07:00
Jay Foad	72e4da45bd	Correctly report modified status for AMDGPUUnifyDivergentExitNodes Related to https://reviews.llvm.org/D80916 Differential Revision: https://reviews.llvm.org/D81271	2020-06-05 19:49:37 +01:00
Matt Arsenault	43bb1c239c	AMDGPU: Fix incorrect selection of buffer atomic fadd There were additional standalone patterns for these nodes which were missing the subtarget predicate.	2020-06-05 14:34:15 -04:00
Matt Arsenault	1657f0ebc2	AMDGPU: Fix overriding global FP atomic feature predicates Global TableGen let override blocks are pretty dangerous and override any local special cases. In this case, the broader HasFlatGlobalInsts was overriding the more specific predicate for FeatureAtomicFaddInsts. Make sure HasFlatGlobalInsts is implied by FeatureAtomicFaddInsts, and make sure the right predicate is used. One issue with independently setting the subtarget features on incompatible targets is all of the encoding families do not define all opcodes. This will hit an assert on gfx10 for example, since we set the encoding independently based on the generation and not based on a feature.	2020-06-04 17:50:38 -04:00
Matt Arsenault	651c36b508	AMDGPU: Select strict_fmul	2020-06-04 17:49:00 -04:00
Matt Arsenault	483d4daa5e	AMDGPU: Select strict_fma Like with strict_fadd, the legalization is scalarizing the v4f16 when it should split.	2020-06-04 17:49:00 -04:00
Matt Arsenault	ae26c064ce	AMDGPU: Select strict_fadd	2020-06-04 17:49:00 -04:00
Matt Arsenault	d259668731	AMDGPU: Set mayRaiseFPException This may be missing a few overrides to set it off still in some special cases. Since the flags set during selection should now be reliably preserved, this should not change codegen for non-strictfp functions.	2020-06-04 17:35:27 -04:00
Matt Arsenault	fe0d5121fa	AMDGPU/GlobalISel: Fix making LDS FP atomics legal on SI/CI	2020-06-04 16:50:19 -04:00
Matt Arsenault	af867b7850	DAG: Change computeKnownBitsForFrameIndex to be usable by GISel This wasn't getting much value from the DAG or depth arguments, since it's only called on the frame index root nodes. FrameIndexes can also only return a scalar value, so it also didn't need DemandedElts.	2020-06-04 10:50:26 -04:00
Jay Foad	590964c835	[AMDGPU] More accurate gfx10 latencies Differential Revision: https://reviews.llvm.org/D81012	2020-06-04 10:29:32 +01:00
Jay Foad	9ce0f7eed6	[AMDGPU] Introduce new sched classes for transcendental instructions This is in preparation for scheduling them slightly differently on gfx10. NFC. Differential Revision: https://reviews.llvm.org/D81011	2020-06-04 10:29:32 +01:00
Sam Parker	6f24ebc4ba	[NFCI][CostModel][AMDGPU] Simplify getUserCost Casts and intrinsics are now handled by the default implementation of getUserCost, so remove them from the backends switch statement. https://reviews.llvm.org/D80994	2020-06-04 08:51:28 +01:00
Matt Arsenault	a1a93ca48a	AMDGPU/GlobalISel: Handle uniform G_DYN_STACKALLOC	2020-06-03 19:56:07 -04:00
hsmahesha	29c17ed96e	[AMDGPU/MemOpsCluster] Code clean-up around accessing of memory operand width Summary: Clean-up the width computing logic given a memory operand, and re-arrange code to avoid code duplication. Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar Reviewed By: foad Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80946	2020-06-03 14:03:52 +05:30
Carl Ritson	da33c96d47	[AMDGPU] Make SGPR spills exec mask agnostic Explicitly set the exec mask for SGPR spills and reloads. This fixes a bug where SGPR spills to memory could be incorrect if the exec mask was 0 (or differed between spill and reload). Additionally pack scalar subregisters (up to 16/32 per VGPR), so that the majority of scalar types can be spilt or reloaded with a simple memory access. This should amortize some of the additional overhead of manipulating the exec mask. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D80282	2020-06-03 12:34:26 +09:00
Matt Arsenault	4b1f6cdbf9	AMDGPU: Don't run indexing mode switches with exec = 0 Add mode defs rather than special casing this like some of the other instructions.	2020-06-02 13:47:48 -04:00
Matt Arsenault	452e0d9023	AMDGPU: Don't run mode switches with exec 0 These are scalar instructions that change vector instructions, so they should not be executed without any active lanes. The implementation of -amdgpu-skip-threshold also seems to be backwards from expected, since decreasing it prevents removal.	2020-06-02 13:47:48 -04:00
Matt Arsenault	85117e286d	AMDGPU: Fix not using scalar loads for global reads in shaders The pass which infers when it's legal to load a global address space as SMRD was only considering amdgpu_kernel, and ignoring the shader entry type calling conventions.	2020-06-02 09:49:23 -04:00
Matt Arsenault	a8f7209255	AMDGPU: Change internal tracking of wave size Store the log2 wave size instead of forcing division and log2 operations when querying either.	2020-06-01 17:55:08 -04:00
Matt Arsenault	89d48ccabe	AMDGPU: Fix not emitting nofpexcept on fdiv expansion In this awkward case, we have to emit custom pseudo-constrained FP wrappers. InstrEmitter concludes that since a mayRaiseFPException instruction had a chain, it can't add nofpexcept. Test deferred until mayRaiseFPException is really set on everything.	2020-06-01 14:10:26 -04:00
Matt Arsenault	20793b2aef	AMDGPU: Fix test in code directory	2020-06-01 13:26:51 -04:00
Matt Arsenault	ed08c4fb2e	AMDGPU: Remove dead file	2020-06-01 13:26:51 -04:00
hsmahesha	0ed2c04636	[AMDGPU/MemOpsCluster] Let mem ops clustering logic also consider number of clustered bytes Summary: While clustering mem ops, AMDGPU target needs to consider number of clustered bytes to decide on max number of mem ops that can be clustered. This patch adds support to pass number of clustered bytes to target mem ops clustering logic. Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar Reviewed By: foad Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80545	2020-06-01 22:52:34 +05:30
Matt Arsenault	7ad36491ca	AMDGPU: Fix alignment for dynamic allocas The alignment value also needs to be scaled by the wave size.	2020-06-01 13:06:37 -04:00
Matt Arsenault	a8ca0ec267	AMDGPU/GlobalISel: Add stub reg-bank aware combiner pass	2020-05-31 20:40:14 -04:00
Jay Foad	2768edfff1	[AMDGPU] Propagate fast-math flags when lowering FSIN and FCOS Differential Revision: https://reviews.llvm.org/D80813	2020-05-31 05:21:55 +01:00
Changpeng Fang	234eba90f4	AMDGPU: Add setTruncStoreAction for vector i64 types made legal recently Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D80853	2020-05-30 20:45:27 -07:00
Carl Ritson	d04147789f	[AMDGPU] Remove assertion on S1024 SGPR to VGPR spill Summary: Replace an assertion that blocks S1024 SGPR to VGPR spill. The assertion pre-dates S1024 and is not wave size dependent. Reviewers: arsenm, sameerds, rampitec Reviewed By: arsenm Subscribers: qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80783	2020-05-30 11:16:19 +09:00
Matt Arsenault	0892a96a05	AMDGPU: Optimize s_setreg_b32 to s_denorm_mode/s_round_mode This is a custom inserter because it was less work than teaching tablegen a way to indicate that it is sometimes OK to have a no side effect instruction in the output of a side effecting pattern. The asm is needed to look like a read of the mode register to prevent it from being deleted. However, there seems to be a bug where the mode register def instructions are moved across the asm sideeffect by the post-RA scheduler. Another oddity is the immediate is formatted differently between s_denorm_mode and s_round_mode.	2020-05-29 21:11:36 -04:00
Matt Arsenault	f012c58abd	AMDGPU: Move MIMG MMO check to verifier	2020-05-29 20:58:23 -04:00
Christopher Tetreault	aad9365482	[SVE] Eliminate calls to default-false VectorType::get() from AMDGPU Reviewers: efriedma, david-arm, fpetrogalli, arsenm Reviewed By: david-arm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, tschuett, hiraditya, rkruppe, psnobl, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80328	2020-05-29 17:54:17 -07:00
Matt Arsenault	2484109378	AMDGPU/GlobalISel: Add boilerplate for inline asm lowering Test mostly from minor adjustments to the AArch64 one.	2020-05-29 16:49:23 -04:00
Matt Arsenault	2d2627d47a	AMDGPU: Remove fp-exceptions feature This was never used, and the only thing it changed was removed in `284472be6d`. The floating point mode is also not a property of the subtarget.	2020-05-29 15:19:59 -04:00
Jay Foad	b28d038ff3	[AMDGPU] Better use of llvm::numbers Tweak a few constant expressions involving numbers::pi etc to avoid rounding errors. NFCI though it's possible some of these will now be more accurate in the last bit.	2020-05-29 09:55:36 +01:00
Jay Foad	036d4b0dbf	[AMDGPU] Use numbers::pi instead of M_PI. NFC.	2020-05-29 09:55:36 +01:00
Matt Arsenault	e13c84c3be	GlobalISel: Work on improving stock set of legality predicates I get confused by a lot of the predicate names here, since I would assume they apply to vectors as well. Rename to reflect they only apply to scalars. Also add a few predicates AMDGPU uses that should be generally useful. Also add any() to complement all. I've wanted to use this a few times but then worked around it not being there.	2020-05-28 20:28:24 -04:00
Matt Arsenault	4859dd4170	AMDGPU: Handle rewriting ptrmask for more address spaces If this mask only clears bits in the low 32-bit half of a flat pointer, these bits are always preserved in the result address space. If the high bits are modified, they may need to be preserved for some kind of user pointer tagging.	2020-05-28 14:35:15 -04:00
Matt Arsenault	97f3f0bab0	AMDGPU: Add intrinsic for s_setreg This will be more useful with fenv access implemented.	2020-05-28 14:26:38 -04:00
alex-t	b726d071b4	[AMDGPU] Reject moving PHI to VALU if the only VGPR input originated from move immediate Summary: PHIs result register class is set to VGPR or SGPR depending on the cross block value divergence. In some cases uniform PHI need to be converted to return VGPR to prevent the odd number of moves values from VGPR to SGPR and back. PHI should certainly return VGPR if it has at least one VGPR input. This change adds the exception. We don't want to convert uniform PHI to VGPRs in case the only VGPR input is a VGPR to SGPR COPY and definition of the source VGPR in this COPY is move immediate. bb.0: %0:vgpr_32 = V_MOV_B32_e32 0, implicit $exec %2:sreg_32 = ..... bb.1: %3:sreg_32 = PHI %1, %bb.3, %2, %bb.1 S_BRANCH %bb.3 bb.3: %1:sreg_32 = COPY %0 S_BRANCH %bb.2 Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80434	2020-05-28 19:25:51 +03:00
Matt Arsenault	1a9e0d7092	AMDGPU: Make S_DENORM_MODE not be a scheduling boundary Now that the mode register uses/defs should be properly modeled, we don't need to treat the FP mode switch as an arbitrary side effect.	2020-05-28 10:39:33 -04:00
Matt Arsenault	d6671ee90c	InferAddressSpaces: Handle ptrmask intrinsic This one is slightly odd since it counts as an address expression, which previously could never fail. Allow the existing TTI hook to return the value to use, and re-use it for handling how to handle ptrmask. Handles the no-op addrspacecasts for AMDGPU. We could probably do something better based on analysis of the mask value based on the address space, but leave that for now.	2020-05-28 10:04:02 -04:00
Dmitry Preobrazhensky	f47e27e260	[AMDGPU][MC][GFX908] Corrected src0 of v_accvgpr_write to accept only VGPRs and inline constants. This change disables use of special SGPR registers like scc, vccz, execz, etc as operands of v_accvgpr_write. See bug 45414: https://bugs.llvm.org/show_bug.cgi?id=45414 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D80530	2020-05-28 15:10:55 +03:00
Dmitry Preobrazhensky	45251ef534	[AMDGPU][MC] Corrected v_writelane_b32 to fix a decoding bug Corrected vdst_in to match vdst operand type. See bug 45193: https://bugs.llvm.org/show_bug.cgi?id=45193 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D80636	2020-05-28 14:43:49 +03:00
Dmitry Preobrazhensky	bab5dadfcd	[AMDGPU][MC][DISASSEMBLER] Corrected decoder to consume each code fragment only once Summary: disabled disassembly of successfully decoded fragments of code. See detailed bug description: https://bugs.llvm.org/show_bug.cgi?id=46101 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D80637	2020-05-28 14:20:18 +03:00
Stanislav Mekhanoshin	7392bbc301	AMDGPU/GlobalISel: Fixed insert element for non-standard vectors Differential Revision: https://reviews.llvm.org/D80653	2020-05-27 16:26:22 -07:00
Matt Arsenault	5e007fe998	AMDGPU: Support non-entry block static sized allocas OpenMP emits these for some reason, so handle them. Assume these use 4096 bytes by default, with a flag to override this. Also change the related stack assumption for calls to have a flag.	2020-05-27 18:46:10 -04:00
Stanislav Mekhanoshin	8aa81aaebe	AMDGPU/GlobalISel: Fixed handling of non-standard vectors We do not have register classes for all possible vector sizes, so round it up for extract vector element. Also fixes selection of G_MERGE_VALUES when vectors are not a power of two. This required refactoring getRegSplitParts() so that it can handle more than just power-of-two vectors. Ideally we would like RegSplitParts to be generated by tablegen. Differential Revision: https://reviews.llvm.org/D80457	2020-05-27 15:44:09 -07:00
alex-t	eb1092ada3	[AMDGPU] Fix for the lost CarryOut/CarryIn register operands in S_ADD/SUB_CO_PSEUDO. Summary: This fixes the `5b898bddff` bug when the carry-in and carry-out registers became lost in lowering S_ADD/SUB_CO_PSEUDO. Reviewers: rampitec, arsenm Reviewed By: arsenm Subscribers: msearles, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80158	2020-05-27 22:41:04 +03:00
Matt Arsenault	4b4496312e	AMDGPU: Start adding MODE register uses to instructions This is the groundwork required to implement strictfp. For now, this should be NFC for regular instructions (many instructions just gain an extra use of a reserved register). Regalloc won't rematerialize instructions with reads of physical registers, but we were suffering from that anyway with the exec reads. Should add it for all the related FP uses (possibly with some extras). I did not add it to either the gpr index mode instructions (or every single VALU instruction) since it's a ridiculous feature already modeled as an arbitrary side effect. Also work towards marking instructions with FP exceptions. This doesn't actually set the bit yet since this would start to change codegen. It seems nofpexcept is currently not implied from the regular IR FP operations. Add it to some MIR tests where I think it might matter.	2020-05-27 14:47:00 -04:00
Matt Arsenault	d37ce53ad3	AMDGPU: Set StackPointerRegisterToSaveRestore This will enable selecting non-entry block allocas. Skip the SP write check in the base isSchedulingBoundary implementation to preserve the previous scheduling behavior and avoid test churn. It's apparently for compile time reasons, but if we were to use this more work would be needed since in some of the failing tests, we seem to incorrectly get hazard nops inserted.	2020-05-27 13:44:05 -04:00
Matt Arsenault	07cd19efa2	AMDGPU: Fix dropping MI flags when rewriting instructions All 3 passes that change instruction encodings were dropping MI flags. This avoids scheduling regressions caused by setting mayRaiseFPExceptions on FP instructions for non-strictfp functions.	2020-05-27 13:27:06 -04:00
Matt Arsenault	833996cef1	AMDGPU: Fix backwards s_cselect_* operands The vector equivalent has backwards operands, but the scalar version does not. The passes that use these hooks aren't enabled by default, so this doesn't really change anything.	2020-05-27 09:26:09 -04:00
Matt Arsenault	ef3e831226	GlobalISel: Basic legalization for G_PTRMASK	2020-05-26 21:20:30 -04:00
Stanislav Mekhanoshin	512e806a33	[AMDGPU] Bail alloca vectorization if GEP not found Differential Revision: https://reviews.llvm.org/D80587	2020-05-26 13:59:49 -07:00
Matt Arsenault	bb10fa3a53	AMDGPU: Fix wrong null value for private address space I'm guessing this was a holdover from when 0 was an invalid stack pointer, but surprised nobody has discovered this before. Also don't allow offset folding for -1 pointers, since it looks weird to partially fold this.	2020-05-26 16:35:13 -04:00
Matt Arsenault	e09064e97f	AMDGPU: Update store node checks for atomics Prepare to switch to using StoreSDNode for atomic stores.	2020-05-26 15:20:03 -04:00
Matt Arsenault	9786e7552d	Revert "[AMDGPU] NFC target dependent requiresUniformRegister refactored out" This reverts commit `fb38b98338`. This will regress compile time.	2020-05-26 12:58:18 -04:00
alex-t	fb38b98338	[AMDGPU] NFC target dependent requiresUniformRegister refactored out Summary: Target specific method encapsulated into the Target Lowering Info. Reviewers: rampitec, vpykhtin Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70085	2020-05-26 19:49:20 +03:00
Matt Arsenault	50d4b22ca0	AMDGPU/GlobalISel: Fix assert on 16-bit G_EXTRACT results I consider this to be a hack, since we probably should not mark any 16-bit extract as legal, and require all extracts to be done on multiples of 32. There are quite a few more battles to fight in the legalizer for sub-dword vectors, so just select this for now so we can pass OpenCL conformance without crashing. Also fix the same assert for G_INSERTs. Unlike G_EXTRACT there's not a trivial way to select this so just fail on it.	2020-05-26 12:14:08 -04:00
Matt Arsenault	8bc03d2168	GlobalISel: Merge G_PTR_MASK with llvm.ptrmask intrinsic Confusingly, these were unrelated and had different semantics. The G_PTR_MASK instruction predates the llvm.ptrmask intrinsic, but has a different format. G_PTR_MASK only allows clearing the low bits of a pointer, and only a constant number of bits. The ptrmask intrinsic allows an arbitrary mask. Replace G_PTR_MASK to match the intrinsic. Only selects the cases that look like the old instruction. More work is needed to select the general case. Also new legalization code is still needed to deal with the case where the incoming mask size does not match the pointer size, which has a specified behavior in the langref.	2020-05-26 11:48:13 -04:00
Matt Arsenault	2dd7714b8d	AMDGPU/GlobalISel: Don't select boolean phi by default This is currently missing most of the hard parts to lower correctly, so disable it for now. This fixes at least one OpenCL conformance test and allows it to pass with fallback. Hide this behind an option for now.	2020-05-26 11:01:21 -04:00
vpykhtin	92f3828dc5	[AMDGPU] Fix wait counts in the presence of 16bit subregisters Differential Revision: https://reviews.llvm.org/D80033	2020-05-26 12:19:27 +03:00
Sam Parker	871556a494	[CostModel] Unify Intrinsic Costs. Recommitting most of the remaining changes from `259eb619ff`, but excluding the call to getUserCost from getInstructionThroughput. Though there's still no test changes, I doubt that this is an NFC... With the two getIntrinsicInstrCosts folded into one, now fold in the scalar/code-size orientated getIntrinsicCost. The remaining scalar intrinsics were memcpy, cttz and ctlz which now have special handling in the BasicTTI implementation. This had required a change in the AMDGPU backend for fabs as it should always be 'free'. I've also changed the X86 backend to return the BaseT implementation when the CostKind isn't RecipThroughput. Differential Revision: https://reviews.llvm.org/D80012	2020-05-26 09:48:26 +01:00
Dmitry Preobrazhensky	77aec3b4c0	[AMDGPU][MC][GFX8+] Enabled clamp for v_add_u16, v_sub_u16 and v_subrev_u16 See https://bugs.llvm.org/show_bug.cgi?id=45926 Reviewers: arsenm, rampitec, vpykhtin Differential Revision: https://reviews.llvm.org/D80430	2020-05-25 19:55:38 +03:00
Dmitry Preobrazhensky	b087b91c91	[AMDGPU][CODEGEN] Added 'A' constraint for inline assembler Summary: 'A' constraint requires an immediate int or fp constant that can be inlined in an instruction encoding. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D78494	2020-05-25 14:23:34 +03:00
Simon Pilgrim	71bed8206b	AMDGPU.h - reduce TargetMachine.h include. NFC. Replace TargetMachine.h include with forward declaration and CodeGen.h include in AMDGPU.h. Exposes a couple of implicit dependencies that require additional forward declarations/includes.	2020-05-24 15:27:41 +01:00
Simon Pilgrim	b05b69e056	AMDGPUInstPrinter.cpp - add CommandLine.h include. NFC. Fixes implicit dependency that will be exposed by a future patch.	2020-05-24 14:17:04 +01:00
Simon Pilgrim	725b3463c5	AMDGPUTargetObjectFile.h - remove unnecessary includes. NFC. As we're inheriting from TargetLoweringObjectFileELF, TargetLoweringObjectFileImpl.h already declares all types we require in the overrides.	2020-05-24 13:57:02 +01:00
Simon Pilgrim	a650256062	AMDGPULibFunc - fix include order. NFC. Ensure AMDGPULibFunc.h module header is first, and fix exposed missing forward declaration.	2020-05-24 13:25:59 +01:00
Matt Arsenault	76e3dd0a49	AMDGPU: Implement isConstantPhysReg I don't think any of these registers are used in contexts where this would do anything yet.	2020-05-23 13:24:42 -04:00
Matt Arsenault	2e82667f60	AMDGPU: Define mode register This should eventually model FP mode constraints as well as the other special fields it tracks.	2020-05-23 13:24:42 -04:00
Stanislav Mekhanoshin	62fb3fa6d9	[AMDGPU] Define 6 dword subregs This prevents autogeneration of degenerate names for these. Differential Revision: https://reviews.llvm.org/D80451	2020-05-22 13:53:29 -07:00
Matt Arsenault	66fe60220c	AMDGPU/GlobalISel: Fix masked control flow with fallthrough blocks Unlike SelectionDAGBuilder, IRTranslator omits the unconditional branch in fallthrough cases. Confusingly, the control flow pseudos function in the opposite way the intrinsics are used, and the branch targets always need to be swapped. We're inverting the target blocks, so we need to figure out the old fallthrough block and insert a branch to the original unconditional branch target.	2020-05-22 10:31:44 -04:00
Dmitry Preobrazhensky	933ebc4078	[AMDGPU][MC][GFX8+] Enabled clamp for v_mul_i32_i24_e64 and v_mul_u32_u24_e64 See bug 45925: https://bugs.llvm.org/show_bug.cgi?id=45925 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D80287	2020-05-22 14:11:31 +03:00
Tim Renouf	d13a508820	[AMDGPU] Fixed incorrect PAL metadata register naming This only affects assembly and -filetype=asm codegen of PAL metadata. Differential Revision: https://reviews.llvm.org/D78860 Change-Id: I7b822e1917bf7b403486820d31afc483be207652	2020-05-21 22:13:19 +01:00
Stanislav Mekhanoshin	689e616ed0	[AMDGPU] Promote alloca to vector in opt Promote alloca to vector before SROA and loop unroll. If we manage to eliminate allocas before unroll we may choose to unroll less. Differential Revision: https://reviews.llvm.org/D80386	2020-05-21 13:49:51 -07:00
Stanislav Mekhanoshin	1dfd1b3e4b	[AMDGPU] Tune threshold for cmp/select vector lowering It was set in total vector size while the idea was to limit a number of instructions. Now it started to work with doubles and thresholds need to be updated. Differential Revision: https://reviews.llvm.org/D80322	2020-05-21 08:59:35 -07:00
Sam Parker	259eb619ff	Revert "[CostModel] Unify Intrinsic Costs." This reverts commit `de71def3f5`. This is causing some very large changes, so I'm first going to break this patch down and re-commit in parts.	2020-05-21 12:50:24 +01:00
Sam Parker	de71def3f5	[CostModel] Unify Intrinsic Costs. With the two getIntrinsicInstrCosts folded into one, now fold in the scalar/code-size orientated getIntrinsicCost. This involved sinking cost of the TTIImpl into the base implementation, as it performs no target checks. The opcodes remaining were memcpy, cttz and ctlz which now have special handling in the BasicTTI implementation. getInstructionThroughput can now directly return the result of getUserCost. This had required a change in the AMDGPU backend for fabs and its always 'free'. I've also changed the X86 backend to return '1' for any intrinsic when the CostKind isn't RecipThroughput. Though this intended to be a non-functional change, there are many paths being combined here so I would be very surprised if this didn't have an effect. Differential Revision: https://reviews.llvm.org/D80012	2020-05-21 07:38:25 +01:00
Stanislav Mekhanoshin	4eecf17164	[AMDGPU] Always expand ext/insertelement with divergent idx Even though series of cmp/cndmask can produce quite a lot of code that is still better than a loop. In case of doubles we would even produce two loops. Differential Revision: https://reviews.llvm.org/D80032	2020-05-20 15:51:29 -07:00
Matt Arsenault	e8f6b0e583	AMDGPU/GlobalISel: Fix splitting 64-bit extensions This was replicating the low bits into the high bits for G_ZEXT, rather than using 0.	2020-05-20 11:13:32 -04:00
Sam Parker	8cc911fa5b	[NFCI][CostModel] Refactor getIntrinsicInstrCost Combine the two API calls into one by introducing a structure to hold the relevant data. This has the added benefit of moving the boiler plate code for arguments and flags, into the constructors. This is intended to be a non-functional change, but the complicated web of logic involved here makes it very hard to guarantee. Differential Revision: https://reviews.llvm.org/D79941	2020-05-20 11:59:08 +01:00
Stanislav Mekhanoshin	677929e352	[AMDGPU] Process V_MOV_B32_indirect in SET_GPR_IDX optimization Differential Revision: https://reviews.llvm.org/D80256	2020-05-19 21:37:14 -07:00
QingShan Zhang	2b59e9f1bd	[DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression We have the getNegatibleCost/getNegatedExpression to evaluate the cost and negate the expression. However, during negating the expression, the cost might change as we are changing the DAG, and then, hit the assertion if we negated the wrong expression as the cost is not trustful anymore. This patch aims to remove the getNegatibleCost to avoid the out of sync with getNegatedExpression, and check the cost during negating the expression. It also reduces the duplicated code between getNegatibleCost and getNegatedExpression, and fixes the crash for the test in D76638 Reviewed By: RKSimon, spatel Differential Revision: https://reviews.llvm.org/D77319	2020-05-20 02:12:16 +00:00
Matt Arsenault	21d2884a9c	AMDGPU: Annotate functions that have stack objects Relying on any MachineFunction state in the MachineFunctionInfo constructor is hazardous, because the construction time is unclear and determined by the first use. The function may be only partially constructed, which is part of why we have many of these hacky string attributes to track what we need for ABI lowering. For SelectionDAG, all stack objects are created up-front before calling convention lowering so stack objects are visible at construction time. For GlobalISel, none of the IR function has been visited yet and the allocas haven't been added to the MachineFrameInfo yet. This should fix failing to set flat_scratch_init in GlobalISel when needed. This pass really needs to be turned into some kind of analysis, but I haven't found a nice way to use one here.	2020-05-19 18:51:00 -04:00
Matt Arsenault	074b802654	AMDGPU: Fix DAG divergence for implicit function arguments This should be directly implied from the register class, and there's no need to special case live ins here. This was getting the wrong answer for the queue ptr argument in callable functions, since it's not an explicit IR argument and is always uniform. Fixes not using scalar loads for the aperture in addrspacecast lowering, and any other places that use implicit SGPR arguments.	2020-05-19 18:11:34 -04:00
Matt Arsenault	61813b8069	AMDGPU: Use member initializers in MFI	2020-05-19 18:11:34 -04:00
Matt Arsenault	4dad4914f7	CodeGen: Use Register	2020-05-19 17:56:55 -04:00
Stanislav Mekhanoshin	50f3bb1329	[AMDGPU] Fixed selection error for 64 bit extract_subvector Differential Revision: https://reviews.llvm.org/D80155	2020-05-18 14:17:59 -07:00
Matt Arsenault	b27a538dda	AMDGPU: Fix illegally constant folding from V_MOV_B32_sdwa This was assumed to be a simple move, and interpreting the immediate modifier operand as a materialized immediate. Apparently the SDWA pass never produces these, but GlobalISel does emit these for some vector shuffles.	2020-05-18 15:34:33 -04:00
Matt Arsenault	bf527a1dc4	AMDGPU/GlobalISel: Fix f64 G_FDIV lowering This was using an integer multiply instead of FP.	2020-05-18 15:14:08 -04:00
Matt Arsenault	4c70074e54	AMDGPU/GlobalISel: Fix splitting wide VALU, non-vector loads	2020-05-18 12:06:53 -04:00
Matt Arsenault	681a161ff5	AMDGPU: Remove outdated comment	2020-05-18 12:06:16 -04:00
Dmitry Preobrazhensky	f997370d9c	[AMDGPU][MC] Corrected branch relocation handling to detect undefined labels Fixed ELF object writer to die gracefully when an undefined label is encountered in a branch instruction. See https://bugs.llvm.org/show_bug.cgi?id=41914. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D79943	2020-05-18 14:04:58 +03:00
Christudasan Devadasan	7c4e711ef8	[AMDGPU] Enable base pointer. When the callee requires a dynamic stack realignment, it is not possible to correctly access the incoming stack arguments using the stack pointer. We reserve a base pointer in such cases to access the function arguments inside the callee. The base pointer will hold the incoming stack pointer value before any kind of delta added to it. Reviewed By: arsenm, scott.linder Differential Revision: https://reviews.llvm.org/D78811	2020-05-17 16:13:55 +05:30
Eli Friedman	4f04db4b54	AllocaInst should store Align instead of MaybeAlign. Along the lines of D77454 and D79968. Unlike loads and stores, the default alignment is getPrefTypeAlign, to match the existing handling in various places, including SelectionDAG and InstCombine. Differential Revision: https://reviews.llvm.org/D80044	2020-05-16 14:53:16 -07:00
Carl Ritson	a065a01bf7	[AMDGPU] Allow use of StackPtrOffsetReg when building spills Summary: When spilling in the entry function we should be able to borrow StackPtrOffsetReg as a last resort. This restores behaviour removed in D75138, and fixes failures when shaders use all SGPRs, VGPRs and spill in the entry function. Reviewers: scott.linder, arsenm, tpr Reviewed By: scott.linder, arsenm Subscribers: qcolombet, foad, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79776	2020-05-16 11:54:43 +09:00
Mircea Trofin	08e2386dee	Revert "Revert "[llvm][NFC] Cleanup uses of std::function in Inlining-related APIs"" This reverts commit `454de99a6f`. The problem was that one of the ctor arguments of CallAnalyzer was left to be const std::function<>&. A function_ref was passed for it, and then the ctor stored the value in a function_ref field. So a std::function<> would be created as a temporary, and not survive past the ctor invocation, while the field would. Tested locally by following https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild Original Differential Revision: https://reviews.llvm.org/D79917	2020-05-15 12:29:16 -07:00
Eli Friedman	11aa3707e3	StoreInst should store Align, not MaybeAlign This is D77454, except for stores. All the infrastructure work was done for loads, so the remaining changes necessary are relatively small. Differential Revision: https://reviews.llvm.org/D79968	2020-05-15 12:26:58 -07:00
Jay Foad	10c10f2419	[AMDGPU] Fix assertion failure in SIInsertHardClauses This new pass failed an assertion whenever there were s_nops after the end of clause. Differential Revision: https://reviews.llvm.org/D80007	2020-05-15 15:49:52 +01:00
Mircea Trofin	454de99a6f	Revert "[llvm][NFC] Cleanup uses of std::function in Inlining-related APIs" This reverts commit `767db5be67`.	2020-05-14 22:32:44 -07:00
Mircea Trofin	767db5be67	[llvm][NFC] Cleanup uses of std::function in Inlining-related APIs Summary: Replacing uses of std::function pointers or refs, or Optional, to function_ref, since the usage pattern allows that. If the function is optional, using a default parameter value (nullptr). This led to a few parameter reshuffles, to push all optionals to the end of the parameter list. Reviewers: davidxl, dblaikie Subscribers: arsenm, jvesely, nhaehnle, eraman, hiraditya, haicheng, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79917	2020-05-14 22:13:53 -07:00
Stanislav Mekhanoshin	7d16a22eb0	[AMDGPU] Peephole adjacent equivalent S_SET_GPR_IDX_ON Differential Revision: https://reviews.llvm.org/D79907	2020-05-14 15:44:33 -07:00
Stanislav Mekhanoshin	9d4cf5bd42	[AMDGPU] Make v16f64/v16i64 legal This allows indirect VGPR addressing to work. Differential Revision: https://reviews.llvm.org/D79960	2020-05-14 14:46:55 -07:00
Stanislav Mekhanoshin	184b383457	Add v16f64 value type We need to use it to handle <16 x double> indirect indexes in the AMDGPU BE. The only visible change from adding it is in ARM cost model. To me it looks reasonable. With doubling a vector size it quadruples the cost up to the size 8 and then it did only double it. Now it also quadruples, which seems a logical progression to me. Actual AMDGPU code is to follow, this is a common part, plus load/store legalization in the AMDGPU BE not to break what works now. Differential Revision: https://reviews.llvm.org/D79952	2020-05-14 14:28:00 -07:00
Jay Foad	42a5560503	[AMDGPU] New SIInsertHardClauses pass Enable clausing of memory loads on gfx10 by adding a new pass to insert the s_clause instructions that mark the start of each hard clause. Differential Revision: https://reviews.llvm.org/D79792	2020-05-14 18:54:49 +01:00
Christopher Tetreault	3254a001fc	[SVE] Remove usages of VectorType::getNumElements() from AMDGPU Reviewers: efriedma, arsenm, david-arm, fpetrogalli Reviewed By: efriedma Subscribers: dmgreen, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, tschuett, hiraditya, rkruppe, psnobl, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79807	2020-05-13 15:57:55 -07:00
Stanislav Mekhanoshin	591b029f40	[AMDGPU] Optimized indirect multi-VGPR addressing SelectMOVRELOffset prevents peeling of a constant from an index if final base could be negative. isBaseWithConstantOffset() succeeds if a value is an "add" or "or" operator. In case of "or" it shall be an add-like "or" which never changes a sign of the sum given a non-negative offset. I.e. we can safely allow peeling if operator is an "or". Differential Revision: https://reviews.llvm.org/D79898	2020-05-13 14:53:16 -07:00
Matt Arsenault	704b539f65	AMDGPU: Use Register	2020-05-13 15:31:54 -04:00
Carl Ritson	195de442da	[AMDGPU] Strengthen export cluster ordering Summary: When removing barrier edges on exports then dependencies need to be propagated. Reviewers: foad Reviewed By: foad Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79855	2020-05-13 23:07:37 +09:00
Dmitry Preobrazhensky	18a5428e60	[AMDGPU][MC][GFX9+] Enabled clamp for v_add_i32 and v_sub_i32 See bug 45830: https://bugs.llvm.org/show_bug.cgi?id=45830 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D79585	2020-05-13 14:17:20 +03:00
Stanislav Mekhanoshin	71ed66d97f	[AMDGPU] Make v4i64/v4f64/v8i64/v8f64 legal We can produce such vectors in the Promote Alloca pass, but we are unable to use movrel to operate it and lower via scratch. Making it legal makes SI_INDIRECT patterns work. There is more work to do in subsequent changes: 1. We initialize m0 twice to access each dword. It shall be possible to only do it once and increment base register number instead. 2. We also need v16i64/v16f64 but these first need to be added to tablegen. Differential Revision: https://reviews.llvm.org/D79808	2020-05-12 16:05:12 -07:00
Austin Kerbow	9f0b736126	[AMDGPU] Add AGPRs to getRegClassForSizeOnBank Differential Revision: https://reviews.llvm.org/D79761	2020-05-12 10:14:00 -07:00
Carl Ritson	58f1417ebc	[AMDGPU] Order pos exports before param exports Summary: Modify export clustering DAG mutation to move position exports before other exports types. Reviewers: foad, arsenm, rampitec, nhaehnle Reviewed By: foad Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79670	2020-05-12 23:02:23 +09:00
Eric Christopher	59a299cbb3	Fix a release+noasserts werror for unused variable.	2020-05-11 20:03:23 -07:00
Austin Kerbow	1429e4c399	[AMDGPU][GlobalISel] Revise handling of wide loads in RegBankSelect When splitting loads in RegBankSelect G_EXTRACT_VECTOR_ELT were being added which could not be selected. Since invoking the legalizer will generate instructions that split and combine wide loads, we can remove the redundant repair instructions which are added by RegBankSelect. Differential Revision: https://reviews.llvm.org/D75547	2020-05-11 18:10:16 -07:00
Saiyedul Islam	117e5609e9	[AMDGPU] Reserving VGPR for future SGPR Spill Summary: One VGPR register is allocated to handle a future spill of SGPR if "--amdgpu-reserve-vgpr-for-sgpr-spill" option is used Reviewers: arsenm, rampitec, msearles, cdevadas Reviewed By: arsenm Subscribers: madhur13490, qcolombet, kerbowa, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #amdgpu, #llvm Differential Revision: https://reviews.llvm.org/D70379	2020-05-12 00:33:00 +00:00
Austin Kerbow	09253b608a	[AMDGPU] Allow spilling FP to memory If there are no available lanes in a reserved VGPR, no free SGPR, and no unused CSR VGPR when trying to save the FP it needs to be spilled to memory as a last resort. This can be done in the prolog/epilog if we manually add the spill and manage exec. Differential Revision: https://reviews.llvm.org/D79610	2020-05-11 16:42:59 -07:00
Stanislav Mekhanoshin	310d32cb80	[AMDGPU] Fix promote alloca which is already vector Just do not touch loads and stores which are already vector. Previously pass was just unable to see these loads and stores because these were hidden bitcasts. Differential Revision: https://reviews.llvm.org/D79738	2020-05-11 14:52:31 -07:00
Sam McCall	728cf6d86b	Revert "[DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression" This reverts commit `3c44c441db`. Causes infloops on some inputs, see https://reviews.llvm.org/D77319 for repro	2020-05-11 16:44:01 +02:00
QingShan Zhang	3c44c441db	[DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression We have getNegatibleCost/getNegatedExpression to evaluate the cost and negate the expression. However, while negating the expression, the cost might change because we are changing the DAG, and we then hit the assertion if we negated the wrong expression, as the cost is no longer trustworthy. This patch removes getNegatibleCost to avoid getting out of sync with getNegatedExpression, and checks the cost while negating the expression. It also reduces the duplicated code between getNegatibleCost and getNegatedExpression, and fixes the crash for the test in D76638. Reviewed By: RKSimon, spatel Differential Revision: https://reviews.llvm.org/D77319	2020-05-11 02:41:10 +00:00
Matt Arsenault	3af85fa8f0	GlobalISel: Handle more cases in lowerUnmergeValues Handle scalar sources, as well as vectors.	2020-05-09 19:33:32 -04:00
Matt Arsenault	69999605ee	GlobalISel: Move code into lowering for G_MERGE_VALUES Currently this code exists in widenScalar for G_MERGE_VALUE sources. I'm not sure if the existing expansion in widenScalar should be removed or not. The widenScalar variant tries to extend to the requested size, but this just uses the original bitwidth.	2020-05-09 16:39:37 -04:00
Matt Arsenault	beda9d04c2	AMDGPU: Skip GetUnderlyingObject check in pointsToConstantMemory Check the address space first before searching for the object definition to save compile time. As an added bonus, this will now treat casts to constant addrspace as constant. We also seemed to be missing targeted tests for this, so add a few missing other cases too.	2020-05-09 16:00:08 -04:00
Stanislav Mekhanoshin	db7dea2b6f	[AMDGPU] Vectorize alloca thru bitcast This is mostly useful if alloca element type is not integer and then casted to an integer for load or store. We now can vectorize an [i32] alloca but cannot do so for [float]. There is also a separate patch needed to properly lower 64 bit types after they are vectorized. At the moment these are lowered via scratch anyway. Differential Revision: https://reviews.llvm.org/D79641	2020-05-08 15:11:38 -07:00
Matt Arsenault	78a43f10c7	AMDGPU: Don't assert on unknown address spaces Assume unknown address spaces behave like some flavor of global memory.	2020-05-08 12:57:27 -04:00
Matt Arsenault	fda0c8df28	AMDGPU: Lower addrspacecast to 32-bit constant Somehow this was missing from the DAG path, but not global isel.	2020-05-08 10:46:00 -04:00
Nikita Popov	5fa87ec004	[AMDGPU] Try to determine sign bit during div/rem expansion This is preparation for D79294, which removes an expensive InstSimplify optimization, on the assumption that it will be picked up by InstCombine instead. Of course, this does not hold up if a backend performs non-trivial IR expansions without running a canonicalization pipeline afterwards, which turned up as an issue in the context of AMDGPU div/rem expansion. This patch mitigates the issue by explicitly performing a known bits calculation where it matters. No test changes, as those would only be visible after the other patch lands. Differential Revision: https://reviews.llvm.org/D79596	2020-05-08 10:11:26 +02:00
Carl Ritson	e3ffe7269b	[AMDGPU] Cluster shader exports Summary: Add DAG scheduling mutation to cluster export instructions. This avoids unnecessary waitcnts being added when computation ends up interspersed with exports. Reviewers: foad, arsenm, rampitec, nhaehnle Reviewed By: foad Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79481	2020-05-07 19:05:38 +09:00
Michael Liao	4ee5a04187	[amdgpu] Fix check of VCC. Summary: - Need to include checking on the new 16-bit subregs. Reviewers: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79498	2020-05-06 14:16:37 -04:00
Stanislav Mekhanoshin	54d6dfe996	[AMDGPU] Drop 16 bit subreg suffixes on print We do not want to break asm syntax. These suffixes are quite useful for debugging, so add an option to print them. Right now it is NFC. Differential Revision: https://reviews.llvm.org/D79435	2020-05-06 08:14:10 -07:00
Jay Foad	29067aac46	[AMDGPU] Don't implement GCNHazardRecognizer::PreEmitNoops(SUnit *) When called from the post-RA scheduler, hazards have already been handled by getHazardType returning NoopHazard, so PreEmitNoops always returns zero. Remove it. NFC. Historical note: PreEmitNoops was added to the hazard recognizer interface as an optional feature to support dispatch group formation on the POWER target: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20131202/197470.html So it seems right that we shouldn't need to implement it. We do still implement the other overload PreEmitNoops(MachineInstr *) because that is used by the PostRAHazardRecognizer pass. Differential Revision: https://reviews.llvm.org/D79476	2020-05-06 16:11:19 +01:00
Ram Nalamothu	f7060f4f88	For PAL, make sure Scratch Buffer Descriptor do not clobber GIT pointer Since SRSRC has alignment requirements, first find non GIT pointer clobbered registers for SRSRC and then if those registers clobber preloaded Scratch Wave Offset register, copy the Scratch Wave Offset register to a free SGPR.	2020-05-06 10:31:15 -04:00
Matt Arsenault	074c371a48	AMDGPU: Insert kernarg code after allocas This produces more normal looking IR by keeping all the allocas clustered at the start of the block.	2020-05-06 10:19:56 -04:00
Dmitry Preobrazhensky	5998baccb9	[AMDGPU][MC][GFX9+] Enabled 21-bit signed offsets for SMEM instructions Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D79288	2020-05-06 14:13:10 +03:00
David Blaikie	025cd300cd	Collapse variable into assert to remove non-assert unused variable	2020-05-05 11:04:43 -07:00
Christudasan Devadasan	375cec4b6c	[AMDGPU] Introduce more scratch registers in the ABI. The AMDGPU target has a convention that defined all VGPRs (except the initial 32 argument registers) as callee-saved. This convention is not always efficient, esp. when a callee requiring more registers ends up emitting a large number of spills, even though its caller requires only a few. This patch revises the ABI by introducing more scratch registers that a callee can freely use. The 256 vgpr registers now become: 32 argument registers, 112 scratch registers, and 112 callee saved registers. The scratch registers and the CSRs are intermixed at regular intervals (a split boundary of 8) to obtain a better occupancy. Reviewers: arsenm, t-tye, rampitec, b-sumner, mjbedy, tpr Reviewed By: arsenm, t-tye Differential Revision: https://reviews.llvm.org/D76356	2020-05-05 23:02:58 +05:30
Stanislav Mekhanoshin	9ef166e657	[AMDGPU] Fix FoldImmediate for 16 bit operand Differential Revision: https://reviews.llvm.org/D79362	2020-05-05 10:19:14 -07:00
Jay Foad	3d76824b7f	[AMDGPU] Better support for VMEM soft clauses in GCNHazardRecognizer VMEM soft clauses only contain VMEM and FLAT instructions. Teaching GCNHazardRecognizer::checkSoftClauseHazards that other kinds of instructions will naturally break the clause means there are far fewer cases where it has to insert an s_nop instruction to forcibly break the clause. Differential Revision: https://reviews.llvm.org/D79353	2020-05-05 15:49:09 +01:00
Sebastian Neubauer	1de4e56933	[AMDGPU] Don't mark the .note section as ALLOC Marking a section as ALLOC tells the ELF loader to load the section into memory. As we do not want to load the notes into VRAM, the flag should not be there. On AMDHSA, .note is still marked as ALLOC, apparently this is currently needed for OpenCL (see https://reviews.llvm.org/D74995). Differential Revision: https://reviews.llvm.org/D76278	2020-05-05 14:21:45 +02:00
Sam Parker	40574fefe9	[NFC][CostModel] Add TargetCostKind to relevant APIs Make the kind of cost explicit throughout the cost model which, apart from making the cost clear, will allow the generic parts to calculate better costs. It will also allow some backends to approximate and correlate the different costs if they wish. Another benefit is that it will also help simplify the cost model around immediate and intrinsic costs, where we currently have multiple APIs. RFC thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/141263.html Differential Revision: https://reviews.llvm.org/D79002	2020-05-05 10:35:54 +01:00
Stanislav Mekhanoshin	c85eda74b8	[AMDGPU] fix copies between 32 and 16 bit This is a hack to fix illegal 32 to 16 bit copies. The problem is when we make 16 bit subregs legal it creates a huge amount of failures which can only be resolved at once without a temporary hack like this. The next step is to change operands, instruction definitions and patterns until this hack is not needed. Differential Revision: https://reviews.llvm.org/D79119	2020-05-04 08:54:22 -07:00
alex-t	5b898bddff	[AMDGPU] Enable carry out ADD/SUB operations divergence driven instruction selection. Summary: This change enables all kind of carry out ISD opcodes to be selected according to the node divergence. Reviewers: rampitec, arsenm, vpykhtin Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78091	2020-05-04 16:42:25 +03:00
Jay Foad	5f7ea85e78	[AMDGPU] Remove unnecessary s_waitcnt between VMEM loads VMEM loads of the same type (sampler vs no sampler) are guaranteed to write their result registers in order, so there is no need for an s_waitcnt even if they write to overlapping vgprs. Differential Revision: https://reviews.llvm.org/D79176	2020-05-01 10:10:23 +01:00
Jay Foad	1bf7ccb706	[AMDGPU] Use int and unsigned instead of other 32-bit integer types. NFC.	2020-04-30 15:21:36 +01:00
Jay Foad	462b960de8	Fix silly mistake in `31c09d03a1` [AMDGPU] Remove WaitcntBrackets::MixedPendingEvents[]. NFC.	2020-04-30 11:41:14 +01:00
Jay Foad	86545bf72d	[AMDGPU] Simplify loops in SIInsertWaitcnts::generateWaitcntInstBefore The loops over use operands and def operands were mostly identical. Combine them, and likewise for load memoperands and store memoperands. NFC.	2020-04-30 08:53:12 +01:00
Jay Foad	9f59d1931c	[AMDGPU] Remove Def argument from WaitcntBrackets::getRegInterval. NFC. It's cleaner to check this in the callers instead.	2020-04-30 08:53:12 +01:00
Jay Foad	31c09d03a1	[AMDGPU] Remove WaitcntBrackets::MixedPendingEvents[]. NFC. It's trivial to derive this information from other state.	2020-04-29 19:58:19 +01:00
Jay Foad	120572072e	[AMDGPU] Initialize gpr upper bounds to -1. NFC. These upper bounds are inclusive, so -1 (rather than 0) is the natural way to express an empty range.	2020-04-29 19:58:06 +01:00
Jay Foad	777f91f47e	[AMDGPU] Simplify MergeInfo calculations. NFC. This makes the definition and uses of NewUB more symmetrical, and makes it clear that ScoreLBs[T] does not change.	2020-04-29 19:58:06 +01:00
Jay Foad	4649da119a	[AMDGPU] Use a MapVector instead of a DenseMap and a std::vector. NFC.	2020-04-29 16:02:24 +01:00
Jay Foad	2a10957f62	[AMDGPU] Minor cleanups. NFC.	2020-04-29 16:02:24 +01:00
Jay Foad	3c1f21cdf6	[AMDGPU] Remove some redundant variables. NFC.	2020-04-29 09:24:41 +01:00
Dmitri Gribenko	1a9cc47f94	Fixed a -Wunused-variable warning in no-assert builds	2020-04-29 09:12:47 +02:00
Stanislav Mekhanoshin	26777ad7a0	[AMDGPU] Adapt GCNRegBankReassign for 16 bit subregs It allows it not to crash and analyze 16 bit subregs if those appear in the instructions. At the same time it does not attempt to reassign these. It still can correctly identify register banks to let larger registers to be reassigned. More work will be needed here when real instructions will use these registers and more tests as well. Differential Revision: https://reviews.llvm.org/D78772	2020-04-28 16:16:04 -07:00
Stanislav Mekhanoshin	8a30460697	[AMDGPU] Define AGPR subregs These are only needed as VGPR counterpart. Differential Revision: https://reviews.llvm.org/D78597	2020-04-28 15:30:43 -07:00
Stanislav Mekhanoshin	46a75436f8	[AMDGPU] Define special SGPR subregs These are used in SReg_32 and when we start to use SGPR_LO16 there will be complaints that not all registers in RC support all subreg indexes. For now it is NFC. Unused regunits are reserved so that verifier does not complain about missing phys reg live-ins. Differential Revision: https://reviews.llvm.org/D78591	2020-04-28 14:57:46 -07:00
Stanislav Mekhanoshin	395d93358e	Revert "[AMDGPU] Define special SGPR subregs" This reverts commit `1baaa080e0`.	2020-04-28 13:53:15 -07:00
Stanislav Mekhanoshin	1baaa080e0	[AMDGPU] Define special SGPR subregs These are used in SReg_32 and when we start to use SGPR_LO16 there will be complaints that not all registers in RC support all subreg indexes. For now it is NFC. Unused regunits are reserved so that verifier does not complain about missing phys reg live-ins. Differential Revision: https://reviews.llvm.org/D78591	2020-04-28 13:34:24 -07:00
Sam Parker	e9c9329aa4	[TTI] Add TargetCostKind argument to getUserCost There are several different types of cost that TTI tries to provide explicit information for: throughput, latency, code size along with a vague 'intersection of code-size cost and execution cost'. The vectorizer is a keen user of RecipThroughput and there's at least 'getInstructionThroughput' and 'getArithmeticInstrCost' designed to help with this cost. The latency cost has a single use and a single implementation. The intersection cost appears to cover most of the rest of the API. getUserCost is explicitly called from within TTI when the user has been explicit in wanting the code size (also only one use) as well as a few passes which are concerned with a mixture of size and/or a relative cost. In many cases these costs are closely related, such as when multiple instructions are required, but one evident diverging cost in this function is for div/rem. This patch adds an argument so that the cost required is explicit, so that we can make the important distinction when necessary. Differential Revision: https://reviews.llvm.org/D78635	2020-04-28 08:57:45 +01:00
Craig Topper	a58b62b4a2	[IR] Replace all uses of CallBase::getCalledValue() with getCalledOperand(). This method has been commented as deprecated for a while. Remove it and replace all uses with the equivalent getCalledOperand(). I also made a few cleanups in here. For example, this removes the use of getElementType on a pointer when we could just use getFunctionType from the call. Differential Revision: https://reviews.llvm.org/D78882	2020-04-27 22:17:03 -07:00
Jay Foad	498795829b	[AMDGPU] Remove odd blank line in debug output.	2020-04-27 17:10:36 +01:00
Simon Pilgrim	43d6f9a876	AMDGPU/Utils - cleanup include and forward declarations. NFC. Remove unused includes + forward declarations. Reduce unnecessary StringRef.h includes to StringRef forward declaration.	2020-04-26 12:12:21 +01:00
Fangrui Song	2cb48d620f	[TableGen] Drop deprecated leading # operation (NOP) and replace ## with #	2020-04-25 16:26:45 -07:00
Matt Arsenault	35e6a9c839	AMDGPU: Break read2/write2 search range on a memory fence This is to fix performance regressions introduced by `86c944d790`. The old search would collect all potentially mergeable instructions in the entire block. In this case, the same address is written in multiple places in the block on the other side of a fence. When sorted by offset, the two unmergeable, identical addresses would be next to each other and the merge would give up. Break the search space when we encounter an instruction we won't be able to merge across. This will keep the identical addresses in different merge attempts. This may also improve compile time by reducing the merge list size.	2020-04-24 15:53:30 -04:00
Simon Pilgrim	091f7f0103	AMDGPUArgumentUsageInfo.h - cleanup includes and forward declarations. NFC. Reduce Function.h include to (already existing) forward declaration. Remove unused GCNSubtarget/TargetMachine forward declarations.	2020-04-24 16:21:37 +01:00
Simon Pilgrim	d04059778e	SIRegisterInfo.h - remove unnecessary MachineRegisterInfo forward declaration. NFC. We already need to include MachineRegisterInfo.h	2020-04-24 13:27:57 +01:00
Piotr Sobczak	7631af3af2	[AMDGPU] Skip generating cache invalidating instructions on AMDPAL Summary: Frontend guarantees that coherent accesses have corresponding cache policy bits set (glc, dlc). Therefore there is no need for extra instructions that invalidate cache. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78800	2020-04-24 13:53:44 +02:00
Christudasan Devadasan	207cd5f68f	[AMDGPU] Add the SGPR used for FP copy to block livein lists. The temporary register used for FP copy should be live throughout the function.	2020-04-24 11:47:38 +05:30
Matt Arsenault	6bffd0df78	AMDGPU: Fix redundant members	2020-04-23 23:14:01 -04:00
Matt Arsenault	50128f8a33	AMDGPU: Use Register	2020-04-23 22:25:36 -04:00
Matt Arsenault	156afb2253	AMDGPU: Fix inlining logic for denormals This was backwards from intended and missing a test. We perhaps should just ignore the FP mode here, since it shouldn't be legal to mix code with different default modes in the absence of strictfp.	2020-04-23 15:30:48 -04:00
Matt Arsenault	89c8c80bd5	AMDGPU: Change pre-gfx9 implementation of fcanonicalize to mul If f32 denormals were enabled pre-gfx9, we would still try to implement this with v_max_f32. Pre-gfx9, these instructions ignored the denormal mode and did not flush. Switch to the multiply form for f32 as a workaround which should always work in any case. This fixes conformance failures when the library implementation of fmin/fmax were accidentally not inlined, forcing the assumption of no flushing on targets where denormals are not enabled by default. This is a workaround, since really we should not be mixing code with different FP mode expectations, but prefer the lowering that will work in any mode. Now this will always use max to implement canonicalize on gfx9+. This is only really beneficial for f64. For f32/f16 it's a neutral choice (and worse in terms of code size in 1 case), but possibly worse for the compiler since it does add an extra register use operand. Leave this change for later.	2020-04-23 15:24:13 -04:00
Jay Foad	cca6bc42d9	[AMDGPU] Use RegClass helper functions in getRegForInlineAsmConstraint. This avoids more long lists of register classes that have to be updated every time we add a new one. NFC. Differential Revision: https://reviews.llvm.org/D78570	2020-04-23 12:26:52 +01:00
Jay Foad	0337017a9f	[AMDGPU] Use SGPR instead of SReg classes `12994a70cf` did this for 128-bit classes: SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good. This patch extends it to all classes > 64 bits, for consistency. Differential Revision: https://reviews.llvm.org/D78622	2020-04-23 11:45:22 +01:00
Kazuaki Ishizaki	0312b9f550	[llvm] NFC: Fix trivial typo in rst and td files Differential Revision: https://reviews.llvm.org/D77469	2020-04-23 14:26:32 +09:00
Christopher Tetreault	2dea3f1298	[SVE] Add new VectorType subclasses Summary: Introduce new types for fixed width and scalable vectors. Does not remove getNumElements yet so as to not break code during transition period. Reviewers: deadalnix, efriedma, sdesmalen, craig.topper, huntergr Reviewed By: sdesmalen Subscribers: jholewinski, arsenm, jvesely, nhaehnle, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, csigg, arpith-jacob, mgester, lucyrfox, liufengdb, kerbowa, Joonsoo, grosul1, frgossen, lldb-commits, tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm, #lldb Differential Revision: https://reviews.llvm.org/D77587	2020-04-22 08:59:01 -07:00
Jay Foad	dbdffe3ee9	[AMDGPU] Add 192-bit register classes Differential Revision: https://reviews.llvm.org/D78312	2020-04-22 13:10:37 +01:00
Jay Foad	d625b4b081	[AMDGPU] Add missing AReg classes Add 96-bit, 160-bit and 256-bit AReg classes to match VReg and SReg. NFC as far as I know, but it may avoid weird legalization problems. Differential Revision: https://reviews.llvm.org/D78348	2020-04-22 13:10:37 +01:00
Jay Foad	7318625674	[AMDGPU] Remove obsolete special case for 1024-bit vector types. NFC.	2020-04-22 09:05:24 +01:00
Jay Foad	2fa17cdd7a	[AMDGPU] Simplify definition of VReg and AReg classes. NFC. Differential Revision: https://reviews.llvm.org/D78553	2020-04-22 08:59:28 +01:00
Matt Arsenault	7dece2fde3	AMDGPU: Use Register	2020-04-21 15:19:35 -04:00
Jay Foad	658f33dcea	[AMDGPU] Remove selectSGPRVectorRegClassID. NFC. This was yet another function that had to be updated whenever you added a new register class. Remove it by refactoring its only caller to use standard helper functions from SIRegisterInfo. Differential Revision: https://reviews.llvm.org/D78557	2020-04-21 16:29:21 +01:00
Shengchen Kan	8bb059ab63	[MC][Bugfix] Remove redundant parameter for relaxInstruction Summary: Before this patch, `relaxInstruction` takes three arguments, the first argument refers to the instruction before relaxation and the third argument is the output instruction after relaxation. There are two quite strange things: 1) The first argument's type is `const MCInst &`, the third argument's type is `MCInst &`, but they may be aliased to the same variable 2) The backends of ARM, AMDGPU, RISC-V, Hexagon assume that the third argument is a fresh uninitialized `MCInst` even if `relaxInstruction` may be called like `relaxInstruction(Relaxed, STI, Relaxed)` in a loop. In this patch, we drop the third argument, and let `relaxInstruction` directly modify the given instruction. Also, this patch fixes the bug https://bugs.llvm.org/show_bug.cgi?id=45580, which is introduced by D77851, and breaks the assumption of ARM, AMDGPU, RISC-V, Hexagon. Reviewers: Razer6, MaskRay, jyknight, asb, luismarques, enderby, rtaylor, colinl, bcain Reviewed By: Razer6, MaskRay, bcain Subscribers: bcain, nickdesaulniers, nathanchance, wuzish, annita.zhang, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, tpr, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78364	2020-04-21 11:06:55 +08:00
Piotr Sobczak	c48ceaf37b	Revert "[AMDGPU] Set the CostPerUse value for vgpr registers." This reverts commit `728b878de6`. D76417 has caused vgpr count to go up significantly in real-world graphics content.	2020-04-20 22:47:31 +02:00
Sam Parker	e3056ae9a0	[NFC][TTI] Explicit use of VectorType The API for shuffles and reductions uses generic Type parameters, instead of VectorType, and so assertions and casts are used a lot. This patch makes those types explicit, which means that the clients can't be lazy, but results in less ambiguity, and that can only be a good thing. Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45562 Differential Revision: https://reviews.llvm.org/D78357	2020-04-20 09:16:52 +01:00
Craig Topper	744eaa7a3f	[CallSite removal][AMDGPU] Use CallBase instead of CallSite in AMDGPUFixFunctionBitcasts. NFC	2020-04-19 15:21:02 -07:00
Florian Hahn	a7aaadc135	[TTI] Clean up includes (NFC). Remove some unnecessary includes, replace some with forward declarations. This also exposed a few places that were missing some includes.	2020-04-19 20:11:59 +01:00
Matt Arsenault	f463792506	AMDGPU: Remove custom node for RSQ_LEGACY Directly select from the intrinsic. This wasn't getting much value from the custom node.	2020-04-17 19:50:36 -04:00
Stanislav Mekhanoshin	992fbce4e9	[AMDGPU] copyPhysReg() for 16 bit SGPR subregs Differential Revision: https://reviews.llvm.org/D78255	2020-04-17 11:59:39 -07:00
Stanislav Mekhanoshin	fde2aefa22	[AMDGPU] Use SDWA for 16 bit subreg copy This simplifies the logic and allows to use it on GFX8. Differential Revision: https://reviews.llvm.org/D78150	2020-04-17 11:45:44 -07:00
Dominik Montada	55e3a7c6b2	[GlobalISel][AMDGPU] add legalization for G_FREEZE Summary: Copy the legalization rules from SelectionDAG: -widenScalar using anyext -narrowScalar using intermediate merges -scalarize/fewerElements using unmerge -moreElements using G_IMPLICIT_DEF and insert Add G_FREEZE legalization actions to AMDGPULegalizerInfo. Use the same legalization actions as G_IMPLICIT_DEF. Depends on D77795. Reviewers: dsanders, arsenm, aqjune, aditya_nandakumar, t.p.northover, lebedev.ri, paquette, aemerson Reviewed By: arsenm Subscribers: kzhuravl, yaxunl, dstuttard, tpr, t-tye, jvesely, nhaehnle, kerbowa, wdng, rovka, hiraditya, volkan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78092	2020-04-17 16:44:46 +02:00
Jay Foad	96b61571d0	[AMDGPU] New helper functions to get a register class of a given width Summary: Introduce new helper functions getVGPRClassForBitWidth, getAGPRClassForBitWidth, getSGPRClassForBitWidth and use them to refactor various other functions that all contained their own lists of valid register class widths. NFC. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78311	2020-04-17 15:16:57 +01:00
Jay Foad	96712d6ef2	[AMDGPU] Simplify SIRegisterInfo::getRegSplitParts Summary: Use more logic and fewer tables. This reduces the line count and reduces the effort required to introduce more register classes of different sizes in future. Reviewers: arsenm, rampitec, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78351	2020-04-17 14:37:11 +01:00
Jay Foad	858d8db470	AMDGPU/GlobalISel: Work around another selector crash This does for G_EXTRACT_VECTOR_ELT what `588bd7be36` did for G_TRUNC. Ideally types without a corresponding register class wouldn't reach here, but we're currently missing some (in particular a 192-bit class is missing).	2020-04-17 12:07:54 +01:00
Fraser Cormack	c819ef9653	Provide operand indices to adjustSchedDependency This allows targets to know exactly which operands are contributing to the dependency, which is required for targets with per-operand scheduling models. Differential Revision: https://reviews.llvm.org/D77135	2020-04-17 11:08:44 +01:00
Simon Pilgrim	bcd7f77713	MCObjectWriter.h - remove Endian.h/EndianStream.h/raw_ostream.h includes. NFC Push these includes down to the writers that actually need them, a number of which were implicitly relying on the MCObjectWriter.h.	2020-04-17 10:44:08 +01:00
Stanislav Mekhanoshin	2e94a64b57	[AMDGPU] Define 16 bit SGPR subregs These are needed as a counterpart for VGPR subregs even though there are no scalar instructions which can operate 16 bit values. When we are materializing a constant that is done into an SGPR and that SGPR may/will be copied into a 16 bit VGPR subreg. Such copy is illegal. There are also similar problems if a source operand of a 16 bit VALU instruction is an SGPR. In addition we need to get a register with a lo16 subregister of an SGPR RC during selection and this fails as well. All of that makes me believe we need these subregisters as a syntactic glue. Differential Revision: https://reviews.llvm.org/D78250	2020-04-16 10:31:39 -07:00
Matt Arsenault	588bd7be36	AMDGPU/GlobalISel: Work around a selector crash Ideally types without a corresponding register class wouldn't reach here, but we're currently missing some (in particular a 192-bit class is missing).	2020-04-15 14:38:50 -04:00
Benjamin Kramer	cc035d475f	Upgrade users of 'new ShuffleVectorInst' to pass indices as an int array No functionality change intended.	2020-04-15 14:29:43 +02:00
Sameer Sahasrabuddhe	8c11bc0cd0	Introduce fix-irreducible pass An irreducible SCC is one which has multiple "header" blocks, i.e., blocks with control-flow edges incident from outside the SCC. This pass converts an irreducible SCC into a natural loop by introducing a single new header block and redirecting all the edges on the original headers to this new block. This is a useful workaround for a limitation in the structurizer, which produces incorrect control flow in the presence of irreducible regions. The AMDGPU backend provides an option to enable this pass before the structurizer, which may eventually be enabled by default. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D77198 This restores commit `2ada8e2525`. Originally reverted with commit `44e09b59b8`.	2020-04-15 15:05:51 +05:30
Sameer Sahasrabuddhe	44e09b59b8	Revert "Introduce fix-irreducible pass" This reverts commit `2ada8e2525`. Buildbots produced compilation errors which I was not able to quickly reproduce locally. Need more time to investigate.	2020-04-15 12:19:50 +05:30
Sameer Sahasrabuddhe	2ada8e2525	Introduce fix-irreducible pass An irreducible SCC is one which has multiple "header" blocks, i.e., blocks with control-flow edges incident from outside the SCC. This pass converts an irreducible SCC into a natural loop by introducing a single new header block and redirecting all the edges on the original headers to this new block. This is a useful workaround for a limitation in the structurizer, which produces incorrect control flow in the presence of irreducible regions. The AMDGPU backend provides an option to enable this pass before the structurizer, which may eventually be enabled by default. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D77198	2020-04-15 11:29:19 +05:30
Matt Arsenault	cc149172da	AMDGPU/GlobalISel: Fix selection of scalar f64 G_FABS This wasn't covered by existing tablegen patterns, but also suffers the same issues as G_FNEG. Workaround them by manually selecting, like G_FNEG.	2020-04-14 22:05:22 -04:00
Mircea Trofin	447e2c3067	[llvm][NFC][CallSite] Remove Implementation uses of CallSite Reviewers: dblaikie, davidxl, craig.topper Subscribers: arsenm, dschuff, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78142	2020-04-14 14:49:47 -07:00
Mircea Trofin	4aae4e3f48	[llvm][NFC] CallSite removal from inliner-related files Summary: This removes CallSite from inliner files. Some dependencies were thus affected. Reviewers: dblaikie, davidxl, craig.topper Subscribers: arsenm, jvesely, nhaehnle, eraman, hiraditya, aheejin, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77991	2020-04-13 21:28:58 -07:00
Matt Arsenault	f48fe2c36e	GlobalISel: Fix casted unmerge of G_CONCAT_VECTORS This was assuming a scalarizing unmerge, and would fail assert if the unmerge was to smaller vector types.	2020-04-13 22:03:05 -04:00
Matt Arsenault	0ba40d4ccf	AMDGPU/GlobalISel: Combines for V_CVT_F32_UBYTE[0-3] Ports the existing DAG combines, minus the simplify demanded bits which seems to have no equivalent now. Without these, this isn't particularly helpful in most of the IR sample cases.	2020-04-13 19:18:19 -04:00
Austin Kerbow	a69b3e010c	[AMDGPU][GlobalISel] Fix div_scale in FDIV lowering Differential Revision: https://reviews.llvm.org/D78004	2020-04-13 15:54:49 -07:00
Craig Topper	113f37a1f9	[CallSite removal][TargetLowering] Replace ImmutableCallSite with CallBase Differential Revision: https://reviews.llvm.org/D77995	2020-04-13 13:50:15 -07:00
Matt Arsenault	e6605a209c	DAG: Fix wrong legality check for ISD::FMAD Since `1725f28841`, this should check isFMADLegalForFAddFSub rather than the plain isOperationLegal. This would assert in a subset of cases due to an oddity in how FMAD is selected. We will allow FMA formation pre-legalize, but not FMAD even in cases where it would be valid. The current hook requires passing in the root fadd/fsub. However, in this distributed case, this would be far more complicated to pass in the relevant operand. AMDGPU doesn't get any value from the node, and only needs the type and is the only implementor, so I'm not sure why we have this complexity. Just rename and expand the assert to avoid the more complicated checks spread through the distribution logic.	2020-04-13 10:25:39 -07:00
Austin Kerbow	eab9a4f119	[AMDGPU] Don't assert on partial exec copy After Machine CSE and coalescing we can end up with copies of exec to subregister SGPRs. Differential Revision: https://reviews.llvm.org/D77992	2020-04-12 21:14:36 -07:00
Craig Topper	95192f548d	[CallSite removal][TargetLowering] Use CallBase instead of CallSite in TargetLowering::ParseConstraints interface. Differential Revision: https://reviews.llvm.org/D77929	2020-04-12 11:26:25 -07:00
Mircea Trofin	d2f1cd5d97	[llvm][NFC] Refactor uses of CallSite to CallBase - call promotion Summary: Updated CallPromotionUtils and impacted sites. Parameters that are expected to be non-null, and return values that are guaranteed non-null, were replaced with CallBase references rather than pointers. Left FIXME in places where more changes are facilitated by CallBase, but aren't CallSites: Instruction* parameters or return values, for example, where the contract that they are actually CallBase values. Reviewers: davidxl, dblaikie, wmi Reviewed By: dblaikie Subscribers: arsenm, jvesely, nhaehnle, eraman, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77930	2020-04-12 08:27:29 -07:00
Matt Arsenault	96819011ca	AMDGPU/GlobalISel: Fix RegBankSelect for v2s16 shifts These need to be promoted and scalarized for the SALU.	2020-04-11 20:55:33 -04:00
Matt Arsenault	ac8d51a3c6	AMDGPU/GlobalISel: Legalize 16-bit shift amounts to s16 The current selector depends on 16-bit shifts using 16-bit shift amount types, but really it should accept either for all types.	2020-04-11 18:12:26 -04:00
Matt Arsenault	c5497e5399	AMDGPU/GlobalISel: Fix legalizing <3 x s16> vselects	2020-04-11 15:59:51 -04:00
Fangrui Song	d2e5157c1f	[MC] Add UseIntegratedAssembler = false. NFC	2020-04-11 10:13:49 -07:00
Matt Arsenault	cf29333f40	AMDGPU/GlobalISel: Work around forming illegal zextload after legalize Selection would fail after the post legalize combiner put an illegal zextload back together. The base combiner has parameter to only allow legal operations, but they appear to not be used. I also don't see a nice way to remove a single entry from all_combines, so just hack around this.	2020-04-11 10:52:58 -04:00
Stanislav Mekhanoshin	44920e8566	[AMDGPU] Disable sub-dword scalar loads IR widening These will be widened in the DAG. In the meanwhile early widening prevents otherwise possible vectorization of such loads. Differential Revision: https://reviews.llvm.org/D77835	2020-04-10 08:20:49 -07:00
Michael Liao	b54b4ecac3	Fix `-Wextra` warning. NFC.	2020-04-10 03:22:02 -04:00
Christopher Tetreault	e634f482ea	Clean up usages of asserting vector getters in Type Summary: Remove usages of asserting vector getters in Type in preparation for the VectorType refactor. The existence of these functions complicates the refactor while adding little value. Reviewers: arsenm, efriedma, sdesmalen Reviewed By: arsenm Subscribers: wdng, arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77268	2020-04-09 13:11:37 -07:00
Jay Foad	4970a1deca	[AMDGPU] Remove outdated comment	2020-04-09 10:36:00 +01:00
Matt Arsenault	0aa0d70067	MIR: Use Register	2020-04-08 22:07:26 -04:00
Stanislav Mekhanoshin	f96810ff34	[AMDGPU] Expand vector trunc stores from i16 to i8 Differential Revision: https://reviews.llvm.org/D77693	2020-04-07 21:47:45 -07:00
Matt Arsenault	6011627f51	CodeGen: More conversions to use Register	2020-04-07 18:54:36 -04:00
Stanislav Mekhanoshin	96e51ed005	[AMDGPU] Implement copyPhysReg for 16 bit subregs Differential Revision: https://reviews.llvm.org/D74937	2020-04-07 14:22:46 -07:00
Matt Arsenault	2481f26ac3	CodeGen: Use Register in TargetFrameLowering	2020-04-07 17:07:44 -04:00
Graham Sellers	a19a56f6a1	[AMDGPU] Extend constant folding for logical operations This patch extends existing constant folding in logical operations to handle S_XNOR, S_NAND, S_NOR, S_ANDN2, S_ORN2, V_LSHL_ADD_U32 and V_AND_OR_B32. Also added a couple of tests for existing folds.	2020-04-07 14:37:16 -04:00
Matt Arsenault	f596ab4066	AMDGPU: Use early return	2020-04-07 13:48:00 -04:00
Stanislav Mekhanoshin	12a324393d	[AMDGPU] Limit endcf-collapase to simple if We can only collapse adjacent SI_END_CF if outer statement belongs to a simple SI_IF, otherwise correct mask is not in the register we expect, but is an argument of an S_XOR instruction. Even if SI_IF is simple it might be lowered using S_XOR because lowering is dependent on a basic block layout. It is not considered simple if instruction consuming its output is not an SI_END_CF. Since that SI_END_CF might have already been lowered to an S_OR isSimpleIf() check may return false. This situation is an opportunity for a further optimization of SI_IF lowering, but that is a separate optimization. In the meanwhile move SI_END_CF post the lowering when we already know how the rest of the CFG was lowered since a non-simple SI_IF case still needs to be handled. Differential Revision: https://reviews.llvm.org/D77610	2020-04-07 10:27:23 -07:00
Eli Friedman	68b03aee1a	Remove SequentialType from the type hierarchy. Now that we have scalable vectors, there's a distinction that isn't getting captured in the original SequentialType: some vectors don't have a known element count, so counting the number of elements doesn't make sense. In some cases, there's a better way to express the commonality using other methods. If we're dealing with GEPs, there's GEP methods; if we're dealing with a ConstantDataSequential, we can query its element type directly. In the relatively few remaining cases, I just decided to write out the type checks. We're talking about relatively few places, and I think the abstraction doesn't really carry its weight. (See thread "[RFC] Refactor class hierarchy of VectorType in the IR" on llvmdev.) Differential Revision: https://reviews.llvm.org/D75661	2020-04-06 17:03:49 -07:00
Konstantin Pyzhov	72e8754916	[AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU. Reviewers: sameerds, dstuttard Differential Revision: https://reviews.llvm.org/D77228	2020-04-06 09:05:58 -04:00
Matt Arsenault	869f05c834	AMDGPU: Remove dead paths for requiresUniformRegister The extracts from control flow intrinsics are already properly handled by divergence analysis. The inline asm case isn't dead, but has also never really worked correctly so leave it as-is for now.	2020-04-06 16:15:10 -04:00
Konstantin Pyzhov	51dc028314	Revert `e1730cfeb3`	2020-04-06 05:56:11 -04:00
Konstantin Pyzhov	e1730cfeb3	[AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU. Reviewers: sameerds, dstuttard Differential Revision: https://reviews.llvm.org/D77228	2020-04-06 05:10:37 -04:00
Matt Arsenault	8a5f0dafd4	AMDGPU/GlobalISel: Select llvm.amdgcn.div.scale	2020-04-06 11:50:19 -04:00
Matt Arsenault	e87ec66762	AMDGPU/GlobalISel: Fix llvm.amdgcn.div.fmas.ll	2020-04-06 11:50:16 -04:00
Jay Foad	ddd2f4b96f	[AMDGPU] Fix inaccurate comments	2020-04-06 16:44:08 +01:00
Matt Arsenault	cbf719b568	AMDGPU: Use DAG patterns for div_fmas	2020-04-06 09:28:30 -04:00
Matt Arsenault	79b29d6df7	AMDGPU: Remove DisableInst feature I'm not sure why these were bothering to check the instruction profile, since those profiles should only be used with these instruction classes.	2020-04-06 09:27:44 -04:00
Apelete Seketeli	8aadb442d1	[scan-build] fix dead store warnings emitted on LLVM AMDGPU code base This fixes dead store warnings of the type "dead assignment" reported by Clang Static Analyzer.	2020-04-05 11:19:03 -04:00
Matt Arsenault	6bfe28e92f	AMDGPU: Fix annotate kernel features through casted calls I thought I was testing this before, but the workitem id x case isn't great since it's mandatory in the parent kernel.	2020-04-04 20:44:44 -04:00
Matt Arsenault	221890d709	AMDGPU: Add feature for fast f32 denormals	2020-04-04 20:01:24 -04:00
Matt Arsenault	30ebafaa56	CodeGen: Convert some TII hooks to use Register	2020-04-03 14:52:54 -04:00
Matt Arsenault	178050c3ba	AMDGPU: Use Register in more places	2020-04-03 14:52:54 -04:00
Matt Arsenault	e8dcb6d05e	AMDGPU: Remove redundant virtual	2020-04-03 14:52:53 -04:00
Stanislav Mekhanoshin	0462795095	[AMDGPU] Propagate AGPR RC from PHI to its PHI operands We can fix register class of PHI based on its all AGPR uses. That leaves behind all PHIs which were already processed earlier. Propagate RC back to PHI operands of a PHI. Differential Revision: https://reviews.llvm.org/D77344	2020-04-03 11:23:02 -07:00
Austin Kerbow	30f18ed387	[AMDGPU] Handle SMRD signed offset immediate Summary: This fixes a few issues related to SMRD offsets. On gfx9 and gfx10 we have a signed byte offset immediate, however we can overflow into a negative since we treat it as unsigned. Also, the SMRD SOFFSET sgpr is an unsigned offset on all subtargets. We sometimes tried to use negative values here. Third, S_BUFFER instructions should never use a signed offset immediate. Differential Revision: https://reviews.llvm.org/D77082	2020-04-02 17:41:52 -07:00
Matt Arsenault	f68cc2a7ed	AMDGPU: Use 128-bit DS operations by default	2020-04-02 17:17:47 -04:00
Matt Arsenault	5660bb6bc9	AMDGPU: Remove denormal subtarget features Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.	2020-04-02 17:17:12 -04:00
Matt Arsenault	75cf30918f	AMDGPU: Assume f32 denormals are enabled by default This will likely introduce catastrophic performance regressions on older subtargets, but should be correct. A follow up change will remove the old fp32-denormals subtarget features, and switch to using the new denormal-fp-math/denormal-fp-math-f32 attributes. Frontends should be making sure to add the denormal-fp-math-f32 attribute when appropriate to avoid performance regressions.	2020-04-02 17:17:12 -04:00
Matt Arsenault	c3d3c22a58	AMDGPU: Hack out noinline on functions using LDS globals This is a workaround for clang adding noinline to all functions at -O0. Previously, we would just add alwaysinline, and the verifier would complain about having both noinline and alwaysinline. We currently can't truly codegen this case as a freestanding function, so override the user forcing noinline.	2020-04-02 14:12:07 -04:00
Stanislav Mekhanoshin	f2334a7ef2	[AMDGPU] Fix crash in SILoadStoreOptimizer SILoadStoreOptimizer::checkAndPrepareMerge() expects base and paired instruction to come in order and scans MBB from base to the paired instruction. An original order can be changed if there were a dependent instruction in between and base instruction was moved. Fixed by bailing the optimization. In theory it might be possible still to perform a merge by swapping instructions, but in practice it bails anyway because it finds dependency on that same instruction which has resulted in the base move. Differential Revision: https://reviews.llvm.org/D77245	2020-04-02 10:26:47 -07:00
Guillaume Chatelet	189d2e215f	[Alignment][NFC] Use more Align versions of various functions Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: MatzeB, qcolombet, arsenm, sdardis, jvesely, nhaehnle, hiraditya, jrtc27, atanasyan, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77291	2020-04-02 09:00:53 +00:00
Matt Arsenault	5e4e8d0388	AMDGPU/GlobalISel: Change intrinsic ID for _L to _LZ opt Still should handle the other case changes the opcode this way.	2020-04-01 13:03:02 -04:00
Guillaume Chatelet	1dffa2550b	[Alignment][NFC] Transition to MachineFrameInfo::getObjectAlign() Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77215	2020-04-01 14:08:28 +00:00
Simon Pilgrim	be7a233e93	Fix operator precedence warning. NFCI.	2020-04-01 14:36:52 +01:00
Simon Pilgrim	552e46ea1e	Fix unused variable warnings. NFCI.	2020-04-01 14:36:51 +01:00
Matt Arsenault	43e576593e	AMDGPU/GlobalISel: Fix insert point when lowering G_FMAD	2020-03-31 19:57:06 -04:00
Eli Friedman	1ee6ec2bf3	Remove "mask" operand from shufflevector. Instead, represent the mask as out-of-line data in the instruction. This should be more efficient in the places that currently use getShuffleVector(), and paves the way for further changes to add new shuffles for scalable vectors. This doesn't change the syntax in textual IR. And I don't currently plan to change the bitcode encoding in this patch, although we'll probably need to do something once we extend shufflevector for scalable types. I expect that once this is finished, we can then replace the raw "mask" with something more appropriate for scalable vectors. Not sure exactly what this looks like at the moment, but there are a few different ways we could handle it. Maybe we could try to describe specific shuffles. Or maybe we could define it in terms of a function to convert a fixed-length array into an appropriate scalable vector, using a "step", or something like that. Differential Revision: https://reviews.llvm.org/D72467	2020-03-31 13:08:59 -07:00
Stanislav Mekhanoshin	08682dcc86	[AMDGPU] Define 16 bit VGPR subregs We have loads preserving low and high 16 bits of their destinations. However, we always use a whole 32 bit register for these. The same happens with 16 bit stores, we have to use full 32 bit register so if high bits are clobbered the register needs to be copied. One example of such code is added to the load-hi16.ll. The proper solution to the problem is to define 16 bit subregs and use them in the operations which do not read another half of a VGPR or preserve it if the VGPR is written. This patch simply defines subregisters and register classes. At the moment there should be no difference in code generation. A lot more work is needed to actually use these new register classes. Therefore, there are no new tests at this time. Register weight calculation has changed with new subregs so appropriate changes were made to keep all calculations just as they are now, especially calculations of register pressure. Differential Revision: https://reviews.llvm.org/D74873	2020-03-31 11:49:06 -07:00
Guillaume Chatelet	c9d5c19597	[Alignment][NFC] Transitioning more getMachineMemOperand call sites Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, Jim, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77121	2020-03-31 08:36:18 +00:00
Sebastian Neubauer	5d3a69feca	[AMDGPU] New llvm.amdgcn.ballot intrinsic Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function in GLSL and other shader languages. It returns a bitfield containing the result of its boolean argument in all active lanes, and zero in all inactive lanes. This is intended to replace the existing llvm.amdgcn.icmp and llvm.amdgcn.fcmp intrinsics after a suitable transition period. Use the new intrinsic in the atomic optimizer pass. Differential Revision: https://reviews.llvm.org/D65088	2020-03-31 10:35:39 +02:00
Guillaume Chatelet	0de874adfb	[Alignment][NFC] Transition to inferAlignFromPtrInfo Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77120	2020-03-31 08:06:49 +00:00
Matt Arsenault	d0dd24a381	AMDGPU/GlobalISel: Fix crashing on weird G_INSERT sources No test since these cases shouldn't really be getting through the legalizer.	2020-03-30 18:14:04 -04:00
Matt Arsenault	db9f0d1ce5	AMDGPU: Form v_cvt_ubyte* with f16 results We get 2 conversion instructions anyway. Previously we would get a conversion with SDWA reading from a byte source, which has a larger encoding.	2020-03-30 17:59:49 -04:00
Matt Arsenault	b27d255e1e	AMDGPU/GlobalISel: Form CVT_F32_UBYTE0	2020-03-30 17:45:55 -04:00
Matt Arsenault	bcb643c8af	AMDGPU/GlobalISel: Handle image atomics	2020-03-30 17:41:04 -04:00
Matt Arsenault	48eda37282	AMDGPU/GlobalISel: Start selecting image intrinsics Does not handle atomics yet.	2020-03-30 17:33:04 -04:00
Matt Arsenault	570a578e46	AMDGPU: Account for dmask when computing image mem size Only the number of elements in the dmask will really be accessed.	2020-03-30 17:30:58 -04:00
Jay Foad	cee65d51fe	AMDGPU: Implement getMemcpyLoopLoweringType Summary: Based on a patch by Matt Arsenault. Reviewers: rampitec, kerbowa, nhaehnle, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77057	2020-03-30 22:21:01 +01:00
Matt Arsenault	2641ba52a9	AMDGPU/GlobalISel: Round up image operations with 5, 6 or 7 addresses The instruction definitions are missing for these register types, so round up to 8 like the DAG.	2020-03-30 17:02:47 -04:00
Matt Arsenault	42d5609809	AMDGPU/GlobalISel: Start handling _L to _LZ optimization We currently don't have a way to map to the equivalent intrinsic opcode, so track immediate 0s in place of the address for the selection to know to change the final opcode.	2020-03-30 17:02:30 -04:00
Matt Arsenault	4919f2e1c5	AMDGPU/GlobalISel: Basic legalize rules for G_FSHR Only handles easy 32-bit cases.	2020-03-30 11:53:01 -07:00
Jakub Kuderski	77ce2e21a8	[AMDGPU] Add Relocation Constant Support Summary: This change adds amdgcn.reloc.constant intrinsic to the amdgpu backend, which will compile into a relocation entry in the resulting elf. The intrinsics takes a MetadataNode (String) as its only argument, which specifies the symbol name of the relocation entry. `SelectionDAGBuilder::getValueImpl` is changed to allow metadata operands passed through to ISel. Author: csyonghe <yonghe@google.com> Reviewers: tpr, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76440	2020-03-30 13:49:20 -04:00
Sameer Sahasrabuddhe	3cbbded68c	Introduce unify-loop-exits pass. For each natural loop with multiple exit blocks, this pass creates a new block N such that all exiting blocks now branch to N, and then control flow is redistributed to all the original exit blocks. The bulk of the transformation is a new function introduced in BasicBlockUtils that can redirect control flow from a set of incoming blocks to a set of outgoing blocks via a common "hub". This is a useful workaround for a limitation in the structurizer which incorrectly orders blocks when processing a nest of loops. This pass bypasses that issue by ensuring that each natural loop is recognized as a separate region. Since the structurizer is a region pass, it no longer sees a nest of loops in a single region, and instead processes each "level" in the nesting as a separate region. The AMDGPU backend provides a new option to enable this pass before the structurizer, which may eventually be enabled by default. Reviewers: madhur13490, arsenm, nhaehnle Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D75865	2020-03-30 13:23:56 -04:00
Matt Arsenault	bb009498c2	AMDGPU/GlobalISel: Hack to fix i24 argument lowering I still think the call lowering type legalization logic split between the generic code and target is too confusing, but largely induced by the reliance on the DAG infrastructure.	2020-03-30 11:00:45 -04:00
Matt Arsenault	90a36bbd7c	AMDGPU/GlobalISel: Legalize 64-bit G_UDIV/G_UREM Mostly ported from the DAG version. This results in much worse code than the DAG version, largely due to a much worse expansion for G_UMULH.	2020-03-30 10:57:37 -04:00
Florian Hahn	c3b03f3d0c	[AMDGPU] Drop const for value that is copied (NFC). This fixes warning: loop variable 'Def' of type 'const llvm::Register' creates a copy from type 'const llvm::Register' [-Wrange-loop-analysis] llvm::Register just contains a single unsigned and should be copied. Reviewers: rampitec Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D77011	2020-03-30 10:59:59 +01:00
Matt Arsenault	d15723ef06	AMDGPU/GlobalISel: Remove redundant virtual	2020-03-29 14:03:07 -04:00

5260 Commits