llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	c8a9872259	AMDGPU/GlobalISel: Look through copies in getPtrBaseWithConstantOffset We may have an SGPR->VGPR copy if a totally uniform pointer calculation is used for a VGPR pointer operand. Also hack around a bug in MUBUF matching which would incorrectly use MUBUF for global when flat was requested. This should really be a predicate on the parent pattern, but the DAG always checked this manually inside the complex pattern.	2020-08-17 12:31:38 -04:00
Steven Perron	eed6476a87	Reset PAL metadata when AMDGPU traget stream finishes If the same stream object is used for multiple compiles, the PAL metadata from eariler compilations will leak into later one. See https://github.com/GPUOpen-Drivers/llpc/issues/882 for how this is happening in LLPC. No tests were added because multiple compiles will have to happen using the same pass manager, and I do not see a setup for that on the LLVM side. Let me know if there is a good way to test this. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D85667	2020-08-17 10:56:11 -04:00
Matt Arsenault	c7b9cd31bf	AMDGPU/GlobalISel: Fix missing 256-bit AGPR mapping	2020-08-17 09:53:26 -04:00
Matt Arsenault	af162ac785	AMDGPU/GlobalISel: Fix using readfirstlane with ballot intrinsics This should use the default mapping and insert a copy to the vcc bank, and not try to insert a readfirstlane.	2020-08-17 09:53:25 -04:00
Matt Arsenault	da3f357de6	AMDGPU: Don't look at dbg users for foldable operands These would have always failed to fold, so checking them or adding them to the fold candidates is useless.	2020-08-17 09:53:25 -04:00
Matt Arsenault	66ffa0e91f	AMDGPU/GlobalISel: Fix using post-legal combiner without LegalizerInfo	2020-08-17 09:19:22 -04:00
Matt Arsenault	e0375dbcb3	AMDGPU: Fix using wrong offsets for global atomic fadd intrinsics Global instructions have the signed offsets.	2020-08-17 09:19:15 -04:00
Matt Arsenault	f0af434b79	AMDGPU: Remove register class params from flat memory patterns	2020-08-15 12:12:33 -04:00
Matt Arsenault	a7455652c0	AMDGPU: Fix global atomic saddr operand class	2020-08-15 12:12:28 -04:00
Matt Arsenault	625db2fe5b	AMDGPU: Remove slc from flat offset complex patterns This was always set to 0. Use a default value of 0 in this context to satisfy the instruction definition patterns. We can't unconditionally use SLC with a default value of 0 due to limitations in TableGen's handling of defaulted operands when followed by non-default operands.	2020-08-15 12:12:24 -04:00
Matt Arsenault	e5077b5c2a	AMDGPU: Fix matching wrong offsets for global atomic loads These used signed offsets with a different size.	2020-08-15 12:12:17 -04:00
Matt Arsenault	8cb022982a	AMDGPU: Remove redundant FLAT complex patterns These were identical to the non-atomic cases. I'm not sure why these were ever separated.	2020-08-15 12:12:01 -04:00
Matt Arsenault	47af1ac69a	AMDGPU: Correct definitions for global saddr instructions The VGPR component is a 32-bit offset, not 64-bits. I'm not sure what the correct syntax is for this. This maintains the vaddr position and leaves saddr in the end "off" position. This is particularly terrible for stores, since the operand order is now <vgpr offset>, <data>, <sgpr base>, splitting the pointer operands. I suppose this is a logical consequence from the mistake of not putting the data operand first. I'm not sure what sp3 does.	2020-08-15 12:11:57 -04:00
Matt Arsenault	79298a5067	AMDGPU: Remove SIFixupVectorISel pass This was only used for matching the saddr addressing mode of global instructions, but this was not implemented correctly. The instruction definitions aren't even correct, and are defined as using a 64-bit VGPR component. Eliminate this pass to enable correcting the instruction definitions. A new matching implementation can work in GlobalISel or relying on DAG divergence information for the base address.	2020-08-15 12:11:51 -04:00
Stanislav Mekhanoshin	43a38dc251	[AMDGPU] Fix MAI ld/st hazard handling It did not process hazard for ds_permute because it does not load or store even though it is DS. Differential Revision: https://reviews.llvm.org/D86003	2020-08-14 17:07:37 -07:00
Craig Topper	c7a0b2684f	[X86][MC][Target] Initial backend support a tune CPU to support -mtune This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line. This patch adds MC layer support a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables . These features lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all target as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned. One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU. I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning. Differential Revision: https://reviews.llvm.org/D85165	2020-08-14 15:31:50 -07:00
Matt Arsenault	40a142fa57	AMDGPU/GlobalISel: Match andn2/orn2 for more types Unfortunately this ends up not working as expected on targets with 16-bit operations due to AMDGPUCodeGenPrepare's promotion of uniform 16-bit ops to i32. The vector case annoyingly requires switching the checked opcode, since constants for vectors aren't directly handled. I also need to think more carefully about whether this is valid for i1.	2020-08-14 13:18:03 -04:00
Sebastian Neubauer	9aa0ff77bd	[AMDGPU] Enable .rodata for amdpal os PAL recently got support for multiple ELF sections and relocations, therefore we can now use .rodata sections instead of forcing constants into .text. Differential Revision: https://reviews.llvm.org/D85895	2020-08-14 09:05:48 +02:00
Austin Kerbow	7d1cb187fb	[AMDGPU] Fix FP/BP spills when MUBUF constant offset exceeded If we need a scratch register for the spill don't use the same scratch register that is being used for the MBUF offset. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D85772	2020-08-13 14:12:00 -07:00
Stanislav Mekhanoshin	0462aef5f3	[AMDGPU] Inhibit SDWA if target instruction has FI Differential Revision: https://reviews.llvm.org/D85918	2020-08-13 11:34:28 -07:00
Stanislav Mekhanoshin	d25cb5a8a2	[AMDGPU] Fix misleading SDWA verifier error. NFC. The old error from GFX9 shall be updated to GFX9+.	2020-08-13 11:32:17 -07:00
Carl Ritson	d538c5837a	[AMDGPU] Fix missed SI_RETURN_TO_EPILOG in pre-emit peephole SIPreEmitPeephole does not process all terminators, which means it can fail to handle SI_RETURN_TO_EPILOG if immediately preceeded by a branch to the early exit block. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D85872	2020-08-13 21:52:41 +09:00
Ruiling Song	18b1e67523	[AMDGPU] Fix crash when dag-combining bitcast From the code after the 'break', they are processing 64bit scalar and vector bitcast. So I think the break-condition should be (cond1 \|\| cond2) This means we only execute following code if (64bit and dest-is-vector). Also remove a previous fix which is not needed with this new fix. (introduced in: `1349a04ef5`) Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D85804	2020-08-13 10:23:13 +08:00
Matt Arsenault	e14474a39a	AMDGPU/GlobalISel: Select llvm.amdgcn.global.atomic.fadd Remove the intermediate transform in the DAG path. I believe this is the last non-deprecated intrinsic that needs handling.	2020-08-12 10:04:53 -04:00
Matt Arsenault	701228c411	AMDGPU: Handle intrinsics in performMemSDNodeCombine This avoids a possible regression in a future patch	2020-08-12 10:04:53 -04:00
Matt Arsenault	0dc4c36d3a	AMDGPU/GlobalISel: Manually select llvm.amdgcn.writelane Fixup the special case constant bus handling pre-gfx10.	2020-08-11 11:56:16 -04:00
Matt Arsenault	076305568c	AMDGPU/GlobalISel: Prepare for more custom load lowerings Slight restructuring of the code to avoid formatting changes when more cases are handled here.	2020-08-11 11:09:05 -04:00
Matt Arsenault	e2f1b48f86	GlobalISel: Implement bitcast action for G_INSERT_VECTOR_ELT This mirrors the support for the equivalent extracts. This also creates a huge mess that would be greatly improved if we had any bit operation combines.	2020-08-11 10:39:14 -04:00
Matt Arsenault	53f21e0fb7	TableGen/GlobalISel: Hack the operand order for atomic_store ISD::ATOMIC_STORE arbitrarily has the operands in the opposite order from regular ISD::STORE, which always introduced an annoying duplication of patterns to handle both cases. Since in GlobalISel there's just the one G_STORE, we need to swap the operands to correctly emit the type check for the pointer operand. Some work started in `20aafa3156` to migrate SelectionDAG to use ISD::STORE for atomics, but that work seems to have stalled. Since this is the pretty much the last operation which matters which isn't supported for AMDGPU, use this compatibility hack to unblock declaring it functionally complete. Not sure what's going on with the pending_phis AArch64 test. It seems it didn't always use atomics, and I'm not sure what it was originally testing matters anymore.	2020-08-11 10:22:44 -04:00
Kerry McLaughlin	85c7e89f3b	[CodeGen] Refactor getMemBasePlusOffset & getObjectPtrOffset to accept a TypeSize Changes the Offset arguments to both functions from int64_t to TypeSize & updates all uses of the functions to create the offset using TypeSize::Fixed() Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85220	2020-08-11 12:17:10 +01:00
Matt Arsenault	6fe6b29c29	AMDGPU: Fix assertion in performSHLPtrCombine for 64-bit pointers	2020-08-10 13:46:52 -04:00
Matt Arsenault	68fab44acf	AMDGPU: Fix visiting physreg dest users when folding immediate copies This can fold the immediate into the physical destination, but this should not look for further users of the register. Fixes regression introduced by `766cb615a3`.	2020-08-10 13:46:51 -04:00
Matt Arsenault	40188f807d	AMDGPU/GlobalISel: Don't try to handle undef source operand This is now illegal MIR	2020-08-10 08:49:43 -04:00
Matt Arsenault	a0ec81f70d	AMDGPU/GlobalISel: Merge load/store select cases	2020-08-10 08:46:26 -04:00
Matt Arsenault	c8b17874e5	AMDGPU/GlobalISel: Fix typo	2020-08-10 08:41:17 -04:00
Matt Arsenault	9533f0ea68	AMDGPU/GlobalISel: Use nicer form of buildInstr	2020-08-10 08:41:07 -04:00
Petar Avramovic	0d58d9e8fb	AMDGPU/GlobalISel: Lower G_FREM Add custom lower for G_FREM. Differential Revision: https://reviews.llvm.org/D84324	2020-08-10 10:10:46 +02:00
Piotr Sobczak	62d8b8a225	Fix 64-bit copy to SCC Fix 64-bit copy to SCC by restricting the pattern resulting in such a copy to subtargets supporting 64-bit scalar compare, and mapping the copy to S_CMP_LG_U64. Before introducing the S_CSELECT pattern with explicit SCC (`0045786f14`), there was no need for handling 64-bit copy to SCC ($scc = COPY sreg_64). The proposed handling to read only the low bits was however based on a false premise that it is only one bit that matters, while in fact the copy source might be a vector of booleans and all bits need to be considered. The practical problem of mapping the 64-bit copy to SCC is that the natural instruction to use (S_CMP_LG_U64) is not available on old hardware. Fix it by restricting the problematic pattern to subtargets supporting the instruction (hasScalarCompareEq64). Differential Revision: https://reviews.llvm.org/D85207	2020-08-09 20:50:30 +02:00
Matt Arsenault	3c0597a9e4	AMDGPU: Avoid explicitly listing all the memory nodes	2020-08-07 19:22:46 -04:00
Vang Thao	04bd5b5286	[AMDGPU] Fix not rescheduling without clustering Regions are sometimes skipped which should be rescheduled without memory op clustering. RegionIdx is not incremented when iterating over regions that are flagged to be skipped, causing the index to be incorrect. Thanks to Vang Thao for discovering this bug! Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D85498	2020-08-07 11:15:58 -07:00
Bevin Hansson	5de6c56f7e	[Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. Summary: This patch adds two intrinsics, llvm.sshl.sat and llvm.ushl.sat, which perform signed and unsigned saturating left shift, respectively. These are useful for implementing the Embedded-C fixed point support in Clang, originally discussed in http://lists.llvm.org/pipermail/llvm-dev/2018-August/125433.html and http://lists.llvm.org/pipermail/cfe-dev/2018-May/058019.html Reviewers: leonardchan, craig.topper, bjope, jdoerfert Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83216	2020-08-07 15:09:24 +02:00
Matt Arsenault	87b2af8140	AMDGPU/GlobalISel: Enable s_{and\|or}n2_{b32\|b64} patterns	2020-08-06 18:00:38 -04:00
dfukalov	4ccc38813e	[AMDGPU][CostModel] Add f16, f64 and contract cases to fused costs estimation. Add cases of fused fmul+fadd/fsub with f16 and f64 operands to cost model. Also added operations with contract attribute. Fixed line endings in test. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D84995	2020-08-06 21:43:27 +03:00
Matt Arsenault	e00201539f	GlobalISel: Implement fewerElementsVector for G_EXTRACT_VECTOR_ELT Use the same basic strategy as LegalizeVectorTypes. Try to index into smaller pieces if there's a constant index, and otherwise fall back to a stack temporary.	2020-08-06 14:33:16 -04:00
Matt Arsenault	1a0c0944c6	AMDGPU: Define raw/struct variants of buffer atomic fadd Somehow the new FP atomic buffer intrinsics ended up using the legacy style for buffer intrinsics.	2020-08-06 13:36:19 -04:00
Matt Arsenault	90eb7d5283	AMDGPU: Fix spilling of 96-bit AGPRs	2020-08-06 12:42:07 -04:00
Matt Arsenault	56270d1d42	AMDGPU/GlobalISel: Start trying to handle AGPR bank Try to use AGPR banks for the various merge/unmerge type operations. Previously these would introduce copies to VGPR.	2020-08-06 12:39:50 -04:00
Matt Arsenault	34040a4f61	GlobalISel: Define InvalidRegBankID enum value	2020-08-06 12:39:49 -04:00
Matt Arsenault	63cdc9a49f	AMDGPU/GlobalISel: Handle llvm.amdgcn.ds.{fadd\|fmin\|fmax} These intrinsics are missing mangling for both the pointer and data type.	2020-08-06 11:09:08 -04:00
Matt Arsenault	63c4be53cf	AMDGPU/GlobalISel: Try to promote to use packed saturating add/sub This produces worse results right now for i8 vectors, but that should be addressed when we actually try to optimize packed vectors.	2020-08-06 11:08:45 -04:00
Matt Arsenault	dcf3ffb0a8	AMDGPU/GlobalISel: Move frame index selection to patterns Doesn't really save any code until global value is handled too.	2020-08-06 10:42:15 -04:00
Matt Arsenault	d188a608bd	AMDGPU: Fix code duplication between the selectors Not sure this is the right place for this helper.	2020-08-06 10:42:15 -04:00
Matt Arsenault	5a503521e7	AMDGPU/GlobalISel: Implement expansion for rsq.clamp Not sure why we handle this removed instruction on newer subtargets for this one and no others, but maintain compatibility with the DAG.	2020-08-06 10:23:25 -04:00
Matt Arsenault	c015cbc68b	AMDGPU/GlobalISel: Fix trying to widen <3 x s1> boolean ops	2020-08-06 10:07:22 -04:00
Matt Arsenault	28124a0a63	AMDGPU/GlobalISel: Stop using G_EXTRACT in argument lowering We really need to put this undef padding stuff into a helper somewhere, but leave that for when this is moved to generic code.	2020-08-06 09:55:35 -04:00
Matt Arsenault	6c7f640bf7	AMDGPU/GlobalISel: Implement LLT version of allowsMisalignedMemoryAccesses	2020-08-06 09:50:36 -04:00
Matt Arsenault	37894ba661	AMDGPU/GlobalISel: Make s16 phi legal If we were to have an operation with an s16 def that needs to be executed in a waterfall loop, not having s16 legal would place an avoidable burden on RegBankSelect to widen it.	2020-08-06 09:41:14 -04:00
Matt Arsenault	5316256709	AMDGPU/GlobalISel: Fix assert on copy to vcc This was trying to constrain a physical register. By the verifier's understanding, it's impossible to have a 1-bit copy to vcc/vcc_lo so don't try to handle physregs.	2020-08-06 09:41:14 -04:00
Matt Arsenault	0ee1eba581	AMDGPU: Remove ATOMIC_PK_FADD The f32 and v2f16 cases should be handled the same way.	2020-08-05 22:00:52 -04:00
Ruiling Song	5ddc8b49ba	[AMDGPU] add buffer_atomic_swap for float The functionality is used when calling imageAtomicExhange() on float type imageBuffer in Graphics shaders. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D85187	2020-08-06 09:45:48 +08:00
Stanislav Mekhanoshin	0bcda1a261	[AMDGPU] Scavenge temp reg for AGPR spill Differential Revision: https://reviews.llvm.org/D85234	2020-08-05 13:29:19 -07:00
Matt Arsenault	ec8c172d01	AMDGPU: Correct prolog SP initialization logic Having callees that will read SP is not the only reason we need to reference the stack pointer.	2020-08-05 15:47:53 -04:00
Stanislav Mekhanoshin	ea7d0e2996	[AMDGPU] gfx1031 target Differential Revision: https://reviews.llvm.org/D85337	2020-08-05 12:36:26 -07:00
Matt Arsenault	83eaf5d55d	AMDGPU: Eliminate BUFFER_ATOMIC_PK_ADD_F16 node This is redundant with the other no return buffer atomic node, and we don't really need a separate type profile for it.	2020-08-05 15:16:51 -04:00
Matt Arsenault	43c0c9252a	AMDGPU: Refactor buffer atomic intrinsic lowering Move raw/struct buffer atomic lowering to separate functions. This avoids a long nested switch, and simplifies a future patch.	2020-08-05 14:44:55 -04:00
Matt Arsenault	3e52667433	AMDGPU: Fix verifier error with undef source producing s_bitset* This needs to preserve the undef flag.	2020-08-05 14:42:20 -04:00
Jay Foad	8cbf4a17ac	[AMDGPU] Propagate fast math flags in frem lowering Differential Revision: https://reviews.llvm.org/D84518	2020-08-05 09:09:38 +01:00
Jay Foad	04cf4a5a65	[AMDGPU] Lower frem f16 Without this it would fail to select on subtargets that have 16-bit instructions. Differential Revision: https://reviews.llvm.org/D84517	2020-08-05 09:08:40 +01:00
Matt Arsenault	486e84dfa4	AMDGPU/GlobalISel: Use live in helper function for returnaddress	2020-08-04 17:36:01 -04:00
Matt Arsenault	89011fc3c9	AMDGPU/GlobalISel: Select llvm.returnaddress	2020-08-04 17:14:38 -04:00
Matt Arsenault	f8fb7835d6	GlobalISel: Add utilty for getting function argument live ins Get the argument register and ensure there's a copy to the virtual register. AMDGPU and AArch64 have similarish code to get the livein value, and I also want to use this in multiple places. This is a bit more aggressive about setting the register class than the original function, but that's probably OK. I think we're missing a few verifier checks for function live ins. I noticed AArch64's calling convention code is not actually adding liveins to functions, only the entry block (which apparently might not matter that much?). There should probably be a verifier check that entry block live ins are also live into the function. We also might need a verifier check that the copy to the livein virtual register is in the entry block.	2020-08-04 16:55:55 -04:00
Matt Arsenault	0de547ed4a	AMDGPU/GlobalISel: Ensure subreg is valid when selecting G_UNMERGE_VALUES Fixes verifier error with SGPR unmerges with 96-bit result types.	2020-08-04 12:27:34 -04:00
Jay Foad	8ec8ad868d	[AMDGPU] Use fma for lowering frem This gives shorter f64 code and perhaps better accuracy. Differential Revision: https://reviews.llvm.org/D84516	2020-08-04 16:18:23 +01:00
Carl Ritson	57899934ea	[AMDGPU] Make GCNRegBankReassign assign based on subreg banks When scavenging consider the sub-register of the source operand to determine the bank of a candidate register (not just sub0). Without this it is possible to introduce an infinite loop, e.g. $sgpr15_sgpr16_sgpr17 can be assigned for a conflict between $sgpr0 and SGPR_96:sub1. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D84910	2020-08-04 12:54:44 +09:00
Christopher Tetreault	3b92db4c84	[SVE] Remove bad call to VectorType::getNumElements() from AMDGPU Differential Revision: https://reviews.llvm.org/D85151	2020-08-03 15:56:10 -07:00
Cameron McInally	31c7a2fd5c	[FPEnv] Don't transform FSUB(-0,X)->FNEG(X) in SelectionDAGBuilder. This patch stops unconditionally transforming FSUB(-0,X) into an FNEG(X) while building the DAG. There is also one small change to handle the new FSUB(-0,X) similarly to FNEG(X) in the AMDGPU backend. Differential Revision: https://reviews.llvm.org/D84056	2020-08-03 10:22:25 -05:00
Matt Arsenault	2414bab5d7	AMDGPU/GlobalISel: Remove old hacks for boolean selection There were various hacks used to try to avoid making s1 SGPR vs. s1 VCC ambiguous after constraining the register before we had a strategy to deal with this. This also attempted to handle undef operands, which are now illegal gMIR.	2020-08-03 09:04:14 -04:00
Matt Arsenault	fd63e46941	AMDGPU/GlobalISel: Apply load bitcast to s.buffer.load intrinsic Should also apply this to the non-scalar buffer loads.	2020-08-03 08:54:29 -04:00
Matt Arsenault	d8ef1d1251	AMDGPU/GlobalISel: Fix selecting broken copies for s32->s64 anyext These should probably not be legal in the first place, but that might also be a pain.	2020-08-03 08:36:41 -04:00
Matt Arsenault	212570abcf	GlobalISel: Implement bitcast action for G_EXTRACT_VECTOR_ELEMENT For AMDGPU, vectors with elements < 32 bits should be indexed in 32-bit elements and the desired bits extracted from there. For elements > 64-bits, these should be reduce to 64/32 elements to enable the normal dynamic indexing paths. In the dynamic index cases, this produces shorter code most of the time. This does immediately regress the constant index cases, but this should be fixed once we have the most basic of shift combines. The element size > 64 case is pretty much ported from the exisiting DAG implementation for extract element promote. The increasing element size case is new.	2020-08-02 10:42:07 -04:00
Benjamin Kramer	c6f08b14d4	Hide some internal symbols. NFC.	2020-07-31 17:28:02 +02:00
Matt Arsenault	57bd64ff84	Support addrspacecast initializers with isNoopAddrSpaceCast Moves isNoopAddrSpaceCast to the TargetMachine. It logically belongs with the DataLayout.	2020-07-31 10:42:43 -04:00
Vitaly Buka	b0eb40ca39	[NFC] Remove unused GetUnderlyingObject paramenter Depends on D84617. Differential Revision: https://reviews.llvm.org/D84621	2020-07-31 02:10:03 -07:00
Vitaly Buka	89051ebace	[NFC] GetUnderlyingObject -> getUnderlyingObject I am going to touch them in the next patch anyway	2020-07-30 21:08:24 -07:00
Matt Arsenault	e56e9022bc	AMDGPU: Fix liveness errors when copying AGPR tuples Avoid recursively calling copyPhysReg for AGPR handling. This was dropping the necessary super register implicit defs to avoid liveness verifier errors.	2020-07-30 18:13:04 -04:00
Changpeng Fang	243376cdc7	AMDGPU: Put inexpensive ops first in AMDGPUAnnotateUniformValues::visitLoadInst Summary: This is in response to the review of https://reviews.llvm.org/D84873: The expensive check should be reordered last Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D84890	2020-07-30 14:37:06 -07:00
Stanislav Mekhanoshin	5b32518f96	[AMDGPU] Do not use undef on indirect source We are using undef on the indirect move source subreg and then using implicit super-reg. This creates a problem in RA when Greedy decides to split the register. It reassigns the implicit super-reg but does not bother to change undef source because it is really does not matter. The fix is to stop lying to RA and drop undef flag. This has also hit a problem in SIFoldOperands as it can fold immediate into an indirect move since there is no undef flag anymore. That results in multiple test failures, so added the check for this case. Differential Revision: https://reviews.llvm.org/D84899	2020-07-30 10:41:59 -07:00
hsmahesha	33fd4a18e7	[AMDGPU/MemOpsCluster] Clean-up fixme's around mem ops clustering logic Get rid of all fixmes and base heuristic on `num-clustered-dwords`. The main intuition behind this is as follows. The existing heuristic roughly summarizes as below: * Assume, all the mem ops instructions participating in the clustering process, loads/stores same num bytes * If num bytes loaded by each mem op is 4 bytes, then cluster at max 5 mem ops, that is at max 20 bytes * If num bytes loaded by each mem op is 8 bytes, then cluster at max 3 mem ops, that is at max 24 bytes * If num bytes loaded by each mem op is 16 bytes, then cluster at max 2 mem ops, that is at max 32 bytes So, we need to make sure that the new heuristic do not completey deviate away from the above one, and it properly handles both the sub-word loads and the wide loads. Reviewed By: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D84354	2020-07-30 21:41:13 +05:30
Matt Arsenault	0da582d9b6	GlobalISel: Handle llvm.roundeven I still think it's highly questionable that we have two intrinsics with identical behavior and only vary by the name of the libcall used if it happens to be lowered that way, but try to reduce the feature delta between SDAG and GlobalISel for recently added intrinsics. I'm not sure which opcode should be considered the canonical one, but lower roundeven back to round.	2020-07-29 20:01:12 -04:00
Stanislav Mekhanoshin	decfdb8ce3	[AMDGPU] Fixed formatting in GCNHazardRecognizer.cpp. NFC.	2020-07-29 12:21:28 -07:00
Stanislav Mekhanoshin	13b63be472	[AMDGPU] prefer non-mfma in post-RA schedule MFMA instructions shall not be scheduled back to back to avoid MAI SIMD stall. Tell post-RA schedule we would prefer some other instruction instead. Differential Revision: https://reviews.llvm.org/D84883	2020-07-29 12:17:50 -07:00
Matt Arsenault	59fac51ff2	AMDGPU/GlobalISel: Handle llvm.amdgcn.reloc.constant	2020-07-29 14:24:21 -04:00
Matt Arsenault	0b7de7966f	GlobalISel: Implement lower for G_EXTRACT_VECTOR_ELT Use the basic store to stack and reload.	2020-07-29 14:16:28 -04:00
Matt Arsenault	766cb615a3	AMDGPU: Relax restriction on folding immediates into physregs I never completed the work on the patches referenced by `f8bf7d7f42`, but this was intended to avoid folding immediate writes into m0 which the coalescer doesn't understand very well. Relax this to allow simple SGPR immediates to fold directly into VGPR copies. This pattern shows up routinely in current GlobalISel code since nothing is smart enough to emit VGPR constants yet.	2020-07-29 14:01:53 -04:00
Matt Arsenault	d42c7b2211	AMDGPU: Account for the size of LDS globals used through constant expressions. Also "fix" the longstanding bug where the computed size depends on the order of the visitation. We could try to predict the allocation order used by legalization, but it would never be 100% perfect. Until we start fixing the addresses somehow (or have a more reliable allocation scheme later), just try to compute the size based on the worst case padding.	2020-07-29 11:40:42 -04:00
Matt Arsenault	200bb5191a	AMDGPU/GlobalISel: Refactor special argument management	2020-07-29 08:27:31 -04:00
Matt Arsenault	c230965ccf	AMDGPU: Make saturating add/sub legal for DAG path	2020-07-29 08:27:31 -04:00
Matt Arsenault	cdd45d5f9c	AMDGPU/GlobalISel: Select llvm.amdgcn.global.atomic.csub Remove the custom node boilerplate. Not sure why this tried to handle the LDS atomic stuff.	2020-07-29 08:27:31 -04:00
Matt Arsenault	44211f20a8	AMDGPU: Optimize copies to exec with other insts after exec def It's possible to have terminator instructions after a write to exec, so skip over them to find it.	2020-07-28 21:34:50 -04:00
Matt Arsenault	b6ebc77326	AMDGPU/GlobalISel: Fix selecting llvm.amdgcn.s.getreg This introduces the same bug llvm.amdgcn.s.setreg has where if the user specified an immediate outside of the valid 16-bit range, it will select into a verifier error.	2020-07-28 21:34:50 -04:00
Matt Arsenault	6a7b6dd54b	AMDGPU: Don't assert in canInsertSelect Currently GlobalISel doesn't force all VGPR phi operands to VGPRs, so this hit a case where it was queried with a VGPR and SGPR. This could arguably be a verifier error, but it's currently not.	2020-07-28 21:01:06 -04:00
Matt Arsenault	068808d102	AMDGPU: Don't assume call targets are registers GlobalISel let through a call to null, which would then fold into the source operand like any other inline immediate. The SelectionDAG lowering deletes calls to null and undef as a workaround from before calls were supported. We should probably drop the special handling case in the DAG lowering now, since the middle end optimizers delete null calls anyway.	2020-07-28 20:46:06 -04:00
Matt Arsenault	8860daf0ed	AMDGPU: Handle a few missing cases in getAddrModeArguments	2020-07-28 20:22:38 -04:00
Matt Arsenault	b3e63aa8a4	AMDGPU: Don't assume there is only one terminator copy This would stop on the first in reverse order, failing the verifier if there were more earlier in the block.	2020-07-28 20:22:38 -04:00
Matt Arsenault	592f2e8d1c	AMDGPU: Fix verifier error on spilling partially defined SGPRs This needs an implicit def of the super-register in case one of the lanes isn't defined, similar to copyPhysReg (or the not-VGPR spill case below). This showed up in GlobalISel testing since it currently doesn't fold out many undef instructions.	2020-07-28 20:01:57 -04:00
Matt Arsenault	66d60e06cb	AMDGPU: Serialize MFI spill fields These should probably be inferred from the function on parse, but the target specific infrastructure currently does not give you a way to do this. SILowerSGPRSpills early exits without this reporting spills, which makes it difficult to write a MIR test for.	2020-07-28 20:01:57 -04:00
Matt Arsenault	e87356b498	GlobalISel: Don't assert on operations with no type indices Fix not marking G_FENCE as legal on AMDGPU This was apparently defaulting to legal using the "legacy" rules, whatever those are.	2020-07-28 16:49:55 -04:00
Matt Arsenault	5174e7b443	GlobalISel: Add typeIsNot LegalityPredicate This allows sorting the legal/custom rules first as is recommended	2020-07-28 16:49:55 -04:00
Matt Arsenault	9731ef3ec5	AMDGPU/GlobalISel: Add SReg_96 to SGPRRegBank	2020-07-28 16:49:55 -04:00
Matt Arsenault	e9b236f411	AMDGPU: Check for other defs when folding conditions into s_andn2_b64 We can't fold the masked compare value through the select if the select condition is re-defed after the and instruction. Fixes a verifier error and trying to use the outgoing value defined in the block. I'm not sure why this pass is bothering to handle physregs. It's making this more complex and forces extra liveness computation.	2020-07-28 16:36:23 -04:00
Austin Kerbow	adeeac9d5a	[AMDGPU] Spill CSR VGPR which is reserved for SGPR spills Update logic for reserving VGPR for SGPR spills. A CSR VGPR being reserved for SGPR spills could be clobbered if there were no free lower VGPR's available. Create a stack object so that it will be spilled in the prologue. Also adds more tests. Differential Revision: https://reviews.llvm.org/D83730	2020-07-28 11:53:02 -07:00
Matt Arsenault	16bcd54570	AMDGPU/GlobalISel: Mark GlobalISel classes as final	2020-07-28 11:42:17 -04:00
Matt Arsenault	bb23b5cfe0	AMDGPU/GlobalISel: Merge identical select cases	2020-07-28 11:42:17 -04:00
Matt Arsenault	a4edc04693	AMDGPU/GlobalISel: Use clamp modifier for [us]addsat/[us]subsat We also have never handled this for SelectionDAG, which needs additional work.	2020-07-28 11:18:05 -04:00
Matt Arsenault	ce944af33c	AMDGPU/GlobalISel: Mark G_ATOMICRMW_{NAND\|FSUB} as lower These aren't implemented and we're still relying on the AtomicExpand pass, but mark these as lower to eliminate a few of the few remaining no rules defined cases.	2020-07-27 18:47:40 -04:00
Matt Arsenault	8b81d0633f	AMDGPU: global_atomic_csub is not always dereferenceable	2020-07-27 18:47:39 -04:00
Kazu Hirata	902cbcd59e	Use llvm::is_contained where appropriate (NFC) Summary: This patch replaces std::find with llvm::is_contained where appropriate. Reviewers: efriedma, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, jvesely, nhaehnle, hiraditya, rogfer01, kerbowa, llvm-commits, vkmr Tags: #llvm Differential Revision: https://reviews.llvm.org/D84489	2020-07-27 10:20:44 -07:00
Piotr Sobczak	590dd73c6e	[AMDGPU] Make generating cache invalidating instructions optional Summary: D78800 skipped generating cache invalidating instrucions altogether on AMDPAL. However, this is sometimes too restrictive - we want a more flexible option to be able to toggle this behaviour on and off while we work towards developing a correct implementation of the alternative memory model. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, dexonsmith, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D84448	2020-07-27 09:24:11 +02:00
Matt Arsenault	e97aa5609f	AMDGPU/GlobalISel: Don't assert in LegalizerInfo constructor We don't really need these asserts. The LegalizerInfo is also overly-aggressivly constructed, even when not in use. It needs to not assert on dummy targets that have manually specified, unrelated features.	2020-07-26 23:01:28 -04:00
Matt Arsenault	d35e2c101d	AMDGPU/GlobalISel: Fix not constraining ds_append/consume operands	2020-07-26 10:17:36 -04:00
Matt Arsenault	f6176f8a5f	GlobalISel: Handle G_PTR_ADD in narrowScalar	2020-07-26 10:08:17 -04:00
Matt Arsenault	7c09c173a2	AMDGPU/GlobalISel: Reorder G_CONSTANT legality rules The legal cases should be the first rules.	2020-07-26 10:05:05 -04:00
Matt Arsenault	bcf5184a68	AMDGPU/GlobalISel: Make sure <2 x s1> phis are scalarized	2020-07-26 10:04:47 -04:00
Matt Arsenault	6f961a1e7e	AMDGPU/GlobalISel: Legalize GDS atomics I noticed these don't use the _gfx9, non-m0 reading variants but not sure if that's a bug or not. It's the same in the DAG.	2020-07-26 10:03:34 -04:00
Matt Arsenault	5819159995	AMDGPU/GlobalISel: Pack constant G_BUILD_VECTOR_TRUNCs when selecting	2020-07-26 09:55:34 -04:00
Matt Arsenault	4033aa1467	AMDGPU/GlobalISel: Sign extend integer constants This matches the DAG behavior and fixes immediate folding	2020-07-26 09:30:14 -04:00
Matt Arsenault	392b969c32	AMDGPU/GlobalISel: Don't assert on G_INSERT > 128-bits Just fallback for now. Really tablegen needs to generate all of the subregister index handling we need.	2020-07-25 10:05:44 -04:00
Matt Arsenault	2bd72abef0	AMDGPU: Skip other terminators before inserting s_cbranch_exec[n]z PHIElimination/createPHISourceCopy inserts non-branch terminators after the control flow pseudo if a successor phi reads a register defined by the control flow pseudo. If this happens, we need to split the expansion of the control flow pseudo to ensure all the branches are after all of the other mask management instructions. GlobalISel hit this in testscases that happened to be tail duplicated. The original testcase still does not work, since the same problem appears to be present in a later pass.	2020-07-24 16:51:59 -04:00
madhur13490	4a577c3a22	[AMDGPU] Fix incorrect arch assert while setting up FlatScratchInit Reviewers: arsenm, foad, rampitec, scott.linder Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D84391	2020-07-24 18:19:04 +00:00
Dmitry Preobrazhensky	6b8948922c	[AMDGPU][MC] Added support of SP3 syntax for MTBUF format modifier Currently supported LLVM MTBUF syntax is shown below. It is not compatible with SP3. op dst, addr, rsrc, FORMAT, soffset This change adds support for SP3 syntax: op dst, addr, rsrc, soffset SP3FORMAT In addition to being compatible with SP3, this syntax allows using symbolic names for data, numeric and unified formats. Below is a list of added syntax variants. format:<expression> format:[<numeric-format-name>,<data-format-name>] format:[<data-format-name>,<numeric-format-name>] format:[<data-format-name>] format:[<numeric-format-name>] format:[<unified-format-name>] The last syntax variant is supported for GFX10 only. See llvm bug 37738 Reviewers: arsenm, rampitec, vpykhtin Differential Revision: https://reviews.llvm.org/D84026	2020-07-24 16:41:03 +03:00
Petar Avramovic	47bd41d099	AMDGPU/GlobalISel: Select set.inactive intrinsic Differential Revision: https://reviews.llvm.org/D84407	2020-07-24 10:14:14 +02:00
Matt Arsenault	891759db73	GlobalISel: Add scalarSameSizeAs LegalizeRule Widen or narrow a type to a type with the same scalar size as another. This can be used to force G_PTR_ADD/G_PTRMASK's scalar operand to match the bitwidth of the pointer type. Use this to disallow narrower types for G_PTRMASK.	2020-07-23 21:17:31 -04:00
Matt Arsenault	b9c644ec61	AMDGPU: Fix failures from overflowing uint8_t number of operands If the operand index exceeded the limit of unsigned char, it wrapped and would point to the wrong operand. Increase the size of the operand index field to avoid this, and also don't bother trying to fold into implicit operands.	2020-07-23 15:39:33 -04:00
Matt Arsenault	d2b8fcff34	AMDGPU/GlobalISel: Handle call return values The only case that I know doesn't work is the implicit sret case when the return type doesn't fit in the return registers.	2020-07-23 14:29:35 -04:00
Sebastian Neubauer	896679733d	[AMDGPU] Fix typo. NFC	2020-07-23 17:01:12 +02:00
Jay Foad	b35833b84e	[GlobalISel][AMDGPU] Legalize saturating add/subtract Add support in LegalizerHelper for lowering G_SADDSAT etc. either using add/subtract-with-overflow or using max/min instructions. Enable this lowering for AMDGPU so it can be tested. The legalization rules are still approximate and skips out on using the clamp bit to treat these as legal, which has never been used before. This also doesn't yet try to deal with expanding SALU cases.	2020-07-23 09:06:42 -04:00
Matt Arsenault	0c92bfa4b8	GlobalISel: Don't use virtual for distinguishing arg handlers There's no reason to involve the hassle of a virtual method targets have to override for a simple boolean. Not sure exactly what's going on with Mips, but it seems to define its own totally separate handler classes.	2020-07-22 14:14:43 -04:00
Matt Arsenault	6f437117af	AMDGPU: Don't assert on f16 inv2pi immediates pre-gfx8 v_cvt_f32_f16 can still accept this value as a literal constant. This showed up in GlobalISel since it doesn't have constant folding for G_FPEXT.	2020-07-22 13:59:03 -04:00
Matt Arsenault	b98f902f18	GlobalISel: Restructure argument lowering loop in handleAssignments This was structured in a way that implied every split argument is in memory, or in registers. It is possible to pass an original argument partially in registers, and partially in memory. Transpose the logic here to only consider a single piece at a time. Every individual CCValAssign should be treated independently, and any merge to original value needs to be handled later. This is in preparation for merging some preprocessing hacks in the AMDGPU calling convention lowering into the generic code. I'm also not sure what the correct behavior for memlocs where the promoted size is larger than the original value. I've opted to clamp the memory access size to not exceed the value register to avoid the explicit trunc/extend/vector widen/vector extract instruction. This happens for AMDGPU for i8 arguments that end up stack passed, which are promoted to i16 (I think this is a preexisting DAG bug though, and they should not really be promoted when in memory).	2020-07-22 13:31:11 -04:00
Matt Arsenault	1fd1beea18	AMDGPU/GlobalISel: Fix translation of indirect calls	2020-07-22 13:13:21 -04:00
Dmitry Preobrazhensky	0b8fd77ad9	[AMDGPU][MC] Corrected decoding of 16-bit literals 16-bit literals are encoded as 32-bit values. If high 16-bits of the value is 0xFFFF, the decoded instruction cannot be reassembled. For example, the following code 0xff,0x04,0x04,0x52,0xcd,0xab,0xff,0xff was decoded as v_mul_lo_u16_e32 v2, 0xffffabcd, v2 However this literal is actually a 64-bit constant 0x00000000ffffabcd which violates requirements described in the documentation - the truncation is not safe. This change corrects decoding to make reassembly possible. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D84098	2020-07-22 17:20:43 +03:00
Sebastian Neubauer	2a6c871596	[InstCombine] Move target-specific inst combining For a long time, the InstCombine pass handled target specific intrinsics. Having target specific code in general passes was noted as an area for improvement for a long time. D81728 moves most target specific code out of the InstCombine pass. Applying the target specific combinations in an extra pass would probably result in inferior optimizations compared to the current fixed-point iteration, therefore the InstCombine pass resorts to newly introduced functions in the TargetTransformInfo when it encounters unknown intrinsics. The patch should not have any effect on generated code (under the assumption that code never uses intrinsics from a foreign target). This introduces three new functions: TargetTransformInfo::instCombineIntrinsic TargetTransformInfo::simplifyDemandedUseBitsIntrinsic TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic A few target specific parts are left in the InstCombine folder, where it makes sense to share code. The largest left-over part in InstCombineCalls.cpp is the code shared between arm and aarch64. This allows to move about 3000 lines out from InstCombine to the targets. Differential Revision: https://reviews.llvm.org/D81728	2020-07-22 15:59:49 +02:00
Petar Avramovic	44967fc604	AMDGPU: Simplify f16 to i64 custom lowering Range that f16 can represent fits into i32. Lower as f16->i32->i64 instead of f16->f32->i64 since f32->i64 has long expansion. Differential Revision: https://reviews.llvm.org/D84166	2020-07-22 10:32:14 +02:00
Matt Arsenault	b258920095	AMDGPU/GlobalISel: Fix not erasing inst when lowering G_FRINT	2020-07-21 18:29:41 -04:00
Matt Arsenault	7cd8a0256d	GlobalISel: Legalize G_FPOWI	2020-07-21 18:13:04 -04:00
Matt Arsenault	1168119c2f	AMDGPU: Start interpreting byref on kernel arguments These are treated identically to value aggregates placed in the kernel argument list. A %struct.foo or %struct.foo addrspace(4)* byref(sizeof(%struct.foo)) align(alignof(%struct.foo)) argument should produce the same offsets and argument metadata. This handles all 3 kernel ABI implementations, and the two HSA metadata emission paths.	2020-07-21 18:11:22 -04:00
Matt Arsenault	2fe0ea8261	DAG: Handle expanding strict_fsub into fneg and strict_fadd The AMDGPU handling of f16 vectors is terrible still since it gets scalarized even when the vector operation is legal. The code is is essentially duplicated between the non-strict and strict case. Apparently no other expansions are currently trying to do this. This is mostly because I found the behavior of getStrictFPOperationAction to be confusing. In the ARM case, it would expand strict_fsub even though it shouldn't due to the later check. At that point, the logic required to check for legality was more complex than just duplicating the 2 instruction expansion.	2020-07-21 16:17:10 -04:00
Matt Arsenault	107c954c13	AMDGPU/GlobalISel: Remove unnecessary parameter	2020-07-20 20:53:01 -04:00
Matt Arsenault	ce76d15a70	AMDGPU: Use MCRegister for preloaded arguments Attempt to fix build error with ancient GCC	2020-07-20 13:34:28 -04:00
Matt Arsenault	21ef01b7e3	AMDGPU: Remove outdated fixme	2020-07-20 11:41:41 -04:00

1 2 3 4 5 ...

5304 Commits