llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	5a15f6628e	GlobalISel: Implement fewerElementsVector for G_INSERT_VECTOR_ELT Add unit tests since AMDGPU will only trigger this for gigantic vectors, and won't use the annoying odd sized breakdown case.	2020-08-18 13:51:19 -04:00
Matt Arsenault	2f5f5febf3	AMDGPU/GlobalISel: Select llvm.amdgcn.groupstaticsize Previously, it would successfully select and assert if not HSA or PAL when expanding the pseudoinstruction. We don't need the pseudoinstruction anymore since we know the total size after legalization.	2020-08-18 09:28:01 -04:00
Matt Arsenault	3ba7777b94	AMDGPU/GlobalISel: Fix selection of s1/s16 G_[F]CONSTANT The code to determine the value size was overcomplicated and only correct in the case where the result register already had a register class assigned. We can always take the size directly from the register's type.	2020-08-18 09:28:01 -04:00
Matt Arsenault	a9ee0589a8	AMDGPU/GlobalISel: Match global saddr addressing mode	2020-08-17 15:48:06 -04:00
Matt Arsenault	e1a2f4713c	AMDGPU: Match global saddr addressing mode The previous implementation was incorrect, and based off incorrect instruction definitions. Unfortunately we can't match natural addressing in a lot of cases due to the shift/scale applied in getelementptrs. This relies on reducing the 64-bit shift to 32-bits.	2020-08-17 15:28:14 -04:00
Stanislav Mekhanoshin	24182f14b6	[AMDGPU] Define spill opcodes for all AGPR sizes Since we have defined all these sizes I believe we shall be able to spill these as well. Differential Revision: https://reviews.llvm.org/D86098	2020-08-17 12:17:23 -07:00
Matt Arsenault	c8a9872259	AMDGPU/GlobalISel: Look through copies in getPtrBaseWithConstantOffset We may have an SGPR->VGPR copy if a totally uniform pointer calculation is used for a VGPR pointer operand. Also hack around a bug in MUBUF matching which would incorrectly use MUBUF for global when flat was requested. This should really be a predicate on the parent pattern, but the DAG always checked this manually inside the complex pattern.	2020-08-17 12:31:38 -04:00
Steven Perron	eed6476a87	Reset PAL metadata when AMDGPU target stream finishes If the same stream object is used for multiple compiles, the PAL metadata from earlier compilations will leak into later ones. See https://github.com/GPUOpen-Drivers/llpc/issues/882 for how this is happening in LLPC. No tests were added because multiple compiles will have to happen using the same pass manager, and I do not see a setup for that on the LLVM side. Let me know if there is a good way to test this. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D85667	2020-08-17 10:56:11 -04:00
Matt Arsenault	c7b9cd31bf	AMDGPU/GlobalISel: Fix missing 256-bit AGPR mapping	2020-08-17 09:53:26 -04:00
Matt Arsenault	af162ac785	AMDGPU/GlobalISel: Fix using readfirstlane with ballot intrinsics This should use the default mapping and insert a copy to the vcc bank, and not try to insert a readfirstlane.	2020-08-17 09:53:25 -04:00
Matt Arsenault	da3f357de6	AMDGPU: Don't look at dbg users for foldable operands These would have always failed to fold, so checking them or adding them to the fold candidates is useless.	2020-08-17 09:53:25 -04:00
Matt Arsenault	66ffa0e91f	AMDGPU/GlobalISel: Fix using post-legal combiner without LegalizerInfo	2020-08-17 09:19:22 -04:00
Matt Arsenault	e0375dbcb3	AMDGPU: Fix using wrong offsets for global atomic fadd intrinsics Global instructions have the signed offsets.	2020-08-17 09:19:15 -04:00
Matt Arsenault	f0af434b79	AMDGPU: Remove register class params from flat memory patterns	2020-08-15 12:12:33 -04:00
Matt Arsenault	a7455652c0	AMDGPU: Fix global atomic saddr operand class	2020-08-15 12:12:28 -04:00
Matt Arsenault	625db2fe5b	AMDGPU: Remove slc from flat offset complex patterns This was always set to 0. Use a default value of 0 in this context to satisfy the instruction definition patterns. We can't unconditionally use SLC with a default value of 0 due to limitations in TableGen's handling of defaulted operands when followed by non-default operands.	2020-08-15 12:12:24 -04:00
Matt Arsenault	e5077b5c2a	AMDGPU: Fix matching wrong offsets for global atomic loads These used signed offsets with a different size.	2020-08-15 12:12:17 -04:00
Matt Arsenault	8cb022982a	AMDGPU: Remove redundant FLAT complex patterns These were identical to the non-atomic cases. I'm not sure why these were ever separated.	2020-08-15 12:12:01 -04:00
Matt Arsenault	47af1ac69a	AMDGPU: Correct definitions for global saddr instructions The VGPR component is a 32-bit offset, not 64-bits. I'm not sure what the correct syntax is for this. This maintains the vaddr position and leaves saddr in the end "off" position. This is particularly terrible for stores, since the operand order is now <vgpr offset>, <data>, <sgpr base>, splitting the pointer operands. I suppose this is a logical consequence from the mistake of not putting the data operand first. I'm not sure what sp3 does.	2020-08-15 12:11:57 -04:00
Matt Arsenault	79298a5067	AMDGPU: Remove SIFixupVectorISel pass This was only used for matching the saddr addressing mode of global instructions, but this was not implemented correctly. The instruction definitions aren't even correct, and are defined as using a 64-bit VGPR component. Eliminate this pass to enable correcting the instruction definitions. A new matching implementation can work in GlobalISel or relying on DAG divergence information for the base address.	2020-08-15 12:11:51 -04:00
Stanislav Mekhanoshin	43a38dc251	[AMDGPU] Fix MAI ld/st hazard handling It did not process hazard for ds_permute because it does not load or store even though it is DS. Differential Revision: https://reviews.llvm.org/D86003	2020-08-14 17:07:37 -07:00
Craig Topper	c7a0b2684f	[X86][MC][Target] Initial backend support for a tune CPU to support -mtune This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line. This patch adds MC layer support for a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables. These feature lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all targets as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned. One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU. I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning. Differential Revision: https://reviews.llvm.org/D85165	2020-08-14 15:31:50 -07:00
Matt Arsenault	40a142fa57	AMDGPU/GlobalISel: Match andn2/orn2 for more types Unfortunately this ends up not working as expected on targets with 16-bit operations due to AMDGPUCodeGenPrepare's promotion of uniform 16-bit ops to i32. The vector case annoyingly requires switching the checked opcode, since constants for vectors aren't directly handled. I also need to think more carefully about whether this is valid for i1.	2020-08-14 13:18:03 -04:00
Sebastian Neubauer	9aa0ff77bd	[AMDGPU] Enable .rodata for amdpal os PAL recently got support for multiple ELF sections and relocations, therefore we can now use .rodata sections instead of forcing constants into .text. Differential Revision: https://reviews.llvm.org/D85895	2020-08-14 09:05:48 +02:00
Austin Kerbow	7d1cb187fb	[AMDGPU] Fix FP/BP spills when MUBUF constant offset exceeded If we need a scratch register for the spill don't use the same scratch register that is being used for the MUBUF offset. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D85772	2020-08-13 14:12:00 -07:00
Stanislav Mekhanoshin	0462aef5f3	[AMDGPU] Inhibit SDWA if target instruction has FI Differential Revision: https://reviews.llvm.org/D85918	2020-08-13 11:34:28 -07:00
Stanislav Mekhanoshin	d25cb5a8a2	[AMDGPU] Fix misleading SDWA verifier error. NFC. The old error from GFX9 shall be updated to GFX9+.	2020-08-13 11:32:17 -07:00
Carl Ritson	d538c5837a	[AMDGPU] Fix missed SI_RETURN_TO_EPILOG in pre-emit peephole SIPreEmitPeephole does not process all terminators, which means it can fail to handle SI_RETURN_TO_EPILOG if immediately preceded by a branch to the early exit block. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D85872	2020-08-13 21:52:41 +09:00
Ruiling Song	18b1e67523	[AMDGPU] Fix crash when dag-combining bitcast From the code after the 'break', they are processing 64bit scalar and vector bitcast. So I think the break-condition should be (cond1 || cond2). This means we only execute the following code if (64bit and dest-is-vector). Also remove a previous fix which is not needed with this new fix. (introduced in: `1349a04ef5`) Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D85804	2020-08-13 10:23:13 +08:00
Matt Arsenault	e14474a39a	AMDGPU/GlobalISel: Select llvm.amdgcn.global.atomic.fadd Remove the intermediate transform in the DAG path. I believe this is the last non-deprecated intrinsic that needs handling.	2020-08-12 10:04:53 -04:00
Matt Arsenault	701228c411	AMDGPU: Handle intrinsics in performMemSDNodeCombine This avoids a possible regression in a future patch	2020-08-12 10:04:53 -04:00
Matt Arsenault	0dc4c36d3a	AMDGPU/GlobalISel: Manually select llvm.amdgcn.writelane Fixup the special case constant bus handling pre-gfx10.	2020-08-11 11:56:16 -04:00
Matt Arsenault	076305568c	AMDGPU/GlobalISel: Prepare for more custom load lowerings Slight restructuring of the code to avoid formatting changes when more cases are handled here.	2020-08-11 11:09:05 -04:00
Matt Arsenault	e2f1b48f86	GlobalISel: Implement bitcast action for G_INSERT_VECTOR_ELT This mirrors the support for the equivalent extracts. This also creates a huge mess that would be greatly improved if we had any bit operation combines.	2020-08-11 10:39:14 -04:00
Matt Arsenault	53f21e0fb7	TableGen/GlobalISel: Hack the operand order for atomic_store ISD::ATOMIC_STORE arbitrarily has the operands in the opposite order from regular ISD::STORE, which always introduced an annoying duplication of patterns to handle both cases. Since in GlobalISel there's just the one G_STORE, we need to swap the operands to correctly emit the type check for the pointer operand. Some work started in `20aafa3156` to migrate SelectionDAG to use ISD::STORE for atomics, but that work seems to have stalled. Since this is pretty much the last operation which matters which isn't supported for AMDGPU, use this compatibility hack to unblock declaring it functionally complete. Not sure what's going on with the pending_phis AArch64 test. It seems it didn't always use atomics, and I'm not sure that what it was originally testing matters anymore.	2020-08-11 10:22:44 -04:00
Kerry McLaughlin	85c7e89f3b	[CodeGen] Refactor getMemBasePlusOffset & getObjectPtrOffset to accept a TypeSize Changes the Offset arguments to both functions from int64_t to TypeSize & updates all uses of the functions to create the offset using TypeSize::Fixed() Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85220	2020-08-11 12:17:10 +01:00
Matt Arsenault	6fe6b29c29	AMDGPU: Fix assertion in performSHLPtrCombine for 64-bit pointers	2020-08-10 13:46:52 -04:00
Matt Arsenault	68fab44acf	AMDGPU: Fix visiting physreg dest users when folding immediate copies This can fold the immediate into the physical destination, but this should not look for further users of the register. Fixes regression introduced by `766cb615a3`.	2020-08-10 13:46:51 -04:00
Matt Arsenault	40188f807d	AMDGPU/GlobalISel: Don't try to handle undef source operand This is now illegal MIR	2020-08-10 08:49:43 -04:00
Matt Arsenault	a0ec81f70d	AMDGPU/GlobalISel: Merge load/store select cases	2020-08-10 08:46:26 -04:00
Matt Arsenault	c8b17874e5	AMDGPU/GlobalISel: Fix typo	2020-08-10 08:41:17 -04:00
Matt Arsenault	9533f0ea68	AMDGPU/GlobalISel: Use nicer form of buildInstr	2020-08-10 08:41:07 -04:00
Petar Avramovic	0d58d9e8fb	AMDGPU/GlobalISel: Lower G_FREM Add custom lower for G_FREM. Differential Revision: https://reviews.llvm.org/D84324	2020-08-10 10:10:46 +02:00
Piotr Sobczak	62d8b8a225	Fix 64-bit copy to SCC Fix 64-bit copy to SCC by restricting the pattern resulting in such a copy to subtargets supporting 64-bit scalar compare, and mapping the copy to S_CMP_LG_U64. Before introducing the S_CSELECT pattern with explicit SCC (`0045786f14`), there was no need for handling 64-bit copy to SCC ($scc = COPY sreg_64). The proposed handling to read only the low bits was however based on a false premise that it is only one bit that matters, while in fact the copy source might be a vector of booleans and all bits need to be considered. The practical problem of mapping the 64-bit copy to SCC is that the natural instruction to use (S_CMP_LG_U64) is not available on old hardware. Fix it by restricting the problematic pattern to subtargets supporting the instruction (hasScalarCompareEq64). Differential Revision: https://reviews.llvm.org/D85207	2020-08-09 20:50:30 +02:00
Matt Arsenault	3c0597a9e4	AMDGPU: Avoid explicitly listing all the memory nodes	2020-08-07 19:22:46 -04:00
Vang Thao	04bd5b5286	[AMDGPU] Fix not rescheduling without clustering Regions are sometimes skipped which should be rescheduled without memory op clustering. RegionIdx is not incremented when iterating over regions that are flagged to be skipped, causing the index to be incorrect. Thanks to Vang Thao for discovering this bug! Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D85498	2020-08-07 11:15:58 -07:00
Bevin Hansson	5de6c56f7e	[Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. Summary: This patch adds two intrinsics, llvm.sshl.sat and llvm.ushl.sat, which perform signed and unsigned saturating left shift, respectively. These are useful for implementing the Embedded-C fixed point support in Clang, originally discussed in http://lists.llvm.org/pipermail/llvm-dev/2018-August/125433.html and http://lists.llvm.org/pipermail/cfe-dev/2018-May/058019.html Reviewers: leonardchan, craig.topper, bjope, jdoerfert Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83216	2020-08-07 15:09:24 +02:00
Matt Arsenault	87b2af8140	AMDGPU/GlobalISel: Enable s_{and|or}n2_{b32|b64} patterns	2020-08-06 18:00:38 -04:00
dfukalov	4ccc38813e	[AMDGPU][CostModel] Add f16, f64 and contract cases to fused costs estimation. Add cases of fused fmul+fadd/fsub with f16 and f64 operands to cost model. Also added operations with contract attribute. Fixed line endings in test. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D84995	2020-08-06 21:43:27 +03:00
Matt Arsenault	e00201539f	GlobalISel: Implement fewerElementsVector for G_EXTRACT_VECTOR_ELT Use the same basic strategy as LegalizeVectorTypes. Try to index into smaller pieces if there's a constant index, and otherwise fall back to a stack temporary.	2020-08-06 14:33:16 -04:00
Matt Arsenault	1a0c0944c6	AMDGPU: Define raw/struct variants of buffer atomic fadd Somehow the new FP atomic buffer intrinsics ended up using the legacy style for buffer intrinsics.	2020-08-06 13:36:19 -04:00
Matt Arsenault	90eb7d5283	AMDGPU: Fix spilling of 96-bit AGPRs	2020-08-06 12:42:07 -04:00
Matt Arsenault	56270d1d42	AMDGPU/GlobalISel: Start trying to handle AGPR bank Try to use AGPR banks for the various merge/unmerge type operations. Previously these would introduce copies to VGPR.	2020-08-06 12:39:50 -04:00
Matt Arsenault	34040a4f61	GlobalISel: Define InvalidRegBankID enum value	2020-08-06 12:39:49 -04:00
Matt Arsenault	63cdc9a49f	AMDGPU/GlobalISel: Handle llvm.amdgcn.ds.{fadd|fmin|fmax} These intrinsics are missing mangling for both the pointer and data type.	2020-08-06 11:09:08 -04:00
Matt Arsenault	63c4be53cf	AMDGPU/GlobalISel: Try to promote to use packed saturating add/sub This produces worse results right now for i8 vectors, but that should be addressed when we actually try to optimize packed vectors.	2020-08-06 11:08:45 -04:00
Matt Arsenault	dcf3ffb0a8	AMDGPU/GlobalISel: Move frame index selection to patterns Doesn't really save any code until global value is handled too.	2020-08-06 10:42:15 -04:00
Matt Arsenault	d188a608bd	AMDGPU: Fix code duplication between the selectors Not sure this is the right place for this helper.	2020-08-06 10:42:15 -04:00
Matt Arsenault	5a503521e7	AMDGPU/GlobalISel: Implement expansion for rsq.clamp Not sure why we handle this removed instruction on newer subtargets for this one and no others, but maintain compatibility with the DAG.	2020-08-06 10:23:25 -04:00
Matt Arsenault	c015cbc68b	AMDGPU/GlobalISel: Fix trying to widen <3 x s1> boolean ops	2020-08-06 10:07:22 -04:00
Matt Arsenault	28124a0a63	AMDGPU/GlobalISel: Stop using G_EXTRACT in argument lowering We really need to put this undef padding stuff into a helper somewhere, but leave that for when this is moved to generic code.	2020-08-06 09:55:35 -04:00
Matt Arsenault	6c7f640bf7	AMDGPU/GlobalISel: Implement LLT version of allowsMisalignedMemoryAccesses	2020-08-06 09:50:36 -04:00
Matt Arsenault	37894ba661	AMDGPU/GlobalISel: Make s16 phi legal If we were to have an operation with an s16 def that needs to be executed in a waterfall loop, not having s16 legal would place an avoidable burden on RegBankSelect to widen it.	2020-08-06 09:41:14 -04:00
Matt Arsenault	5316256709	AMDGPU/GlobalISel: Fix assert on copy to vcc This was trying to constrain a physical register. By the verifier's understanding, it's impossible to have a 1-bit copy to vcc/vcc_lo so don't try to handle physregs.	2020-08-06 09:41:14 -04:00
Matt Arsenault	0ee1eba581	AMDGPU: Remove ATOMIC_PK_FADD The f32 and v2f16 cases should be handled the same way.	2020-08-05 22:00:52 -04:00
Ruiling Song	5ddc8b49ba	[AMDGPU] add buffer_atomic_swap for float The functionality is used when calling imageAtomicExchange() on float type imageBuffer in Graphics shaders. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D85187	2020-08-06 09:45:48 +08:00
Stanislav Mekhanoshin	0bcda1a261	[AMDGPU] Scavenge temp reg for AGPR spill Differential Revision: https://reviews.llvm.org/D85234	2020-08-05 13:29:19 -07:00
Matt Arsenault	ec8c172d01	AMDGPU: Correct prolog SP initialization logic Having callees that will read SP is not the only reason we need to reference the stack pointer.	2020-08-05 15:47:53 -04:00
Stanislav Mekhanoshin	ea7d0e2996	[AMDGPU] gfx1031 target Differential Revision: https://reviews.llvm.org/D85337	2020-08-05 12:36:26 -07:00
Matt Arsenault	83eaf5d55d	AMDGPU: Eliminate BUFFER_ATOMIC_PK_ADD_F16 node This is redundant with the other no return buffer atomic node, and we don't really need a separate type profile for it.	2020-08-05 15:16:51 -04:00
Matt Arsenault	43c0c9252a	AMDGPU: Refactor buffer atomic intrinsic lowering Move raw/struct buffer atomic lowering to separate functions. This avoids a long nested switch, and simplifies a future patch.	2020-08-05 14:44:55 -04:00
Matt Arsenault	3e52667433	AMDGPU: Fix verifier error with undef source producing s_bitset* This needs to preserve the undef flag.	2020-08-05 14:42:20 -04:00
Jay Foad	8cbf4a17ac	[AMDGPU] Propagate fast math flags in frem lowering Differential Revision: https://reviews.llvm.org/D84518	2020-08-05 09:09:38 +01:00
Jay Foad	04cf4a5a65	[AMDGPU] Lower frem f16 Without this it would fail to select on subtargets that have 16-bit instructions. Differential Revision: https://reviews.llvm.org/D84517	2020-08-05 09:08:40 +01:00
Matt Arsenault	486e84dfa4	AMDGPU/GlobalISel: Use live in helper function for returnaddress	2020-08-04 17:36:01 -04:00
Matt Arsenault	89011fc3c9	AMDGPU/GlobalISel: Select llvm.returnaddress	2020-08-04 17:14:38 -04:00
Matt Arsenault	f8fb7835d6	GlobalISel: Add utility for getting function argument live ins Get the argument register and ensure there's a copy to the virtual register. AMDGPU and AArch64 have similarish code to get the livein value, and I also want to use this in multiple places. This is a bit more aggressive about setting the register class than the original function, but that's probably OK. I think we're missing a few verifier checks for function live ins. I noticed AArch64's calling convention code is not actually adding liveins to functions, only the entry block (which apparently might not matter that much?). There should probably be a verifier check that entry block live ins are also live into the function. We also might need a verifier check that the copy to the livein virtual register is in the entry block.	2020-08-04 16:55:55 -04:00
Matt Arsenault	0de547ed4a	AMDGPU/GlobalISel: Ensure subreg is valid when selecting G_UNMERGE_VALUES Fixes verifier error with SGPR unmerges with 96-bit result types.	2020-08-04 12:27:34 -04:00
Jay Foad	8ec8ad868d	[AMDGPU] Use fma for lowering frem This gives shorter f64 code and perhaps better accuracy. Differential Revision: https://reviews.llvm.org/D84516	2020-08-04 16:18:23 +01:00
Carl Ritson	57899934ea	[AMDGPU] Make GCNRegBankReassign assign based on subreg banks When scavenging consider the sub-register of the source operand to determine the bank of a candidate register (not just sub0). Without this it is possible to introduce an infinite loop, e.g. $sgpr15_sgpr16_sgpr17 can be assigned for a conflict between $sgpr0 and SGPR_96:sub1. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D84910	2020-08-04 12:54:44 +09:00
Christopher Tetreault	3b92db4c84	[SVE] Remove bad call to VectorType::getNumElements() from AMDGPU Differential Revision: https://reviews.llvm.org/D85151	2020-08-03 15:56:10 -07:00
Cameron McInally	31c7a2fd5c	[FPEnv] Don't transform FSUB(-0,X)->FNEG(X) in SelectionDAGBuilder. This patch stops unconditionally transforming FSUB(-0,X) into an FNEG(X) while building the DAG. There is also one small change to handle the new FSUB(-0,X) similarly to FNEG(X) in the AMDGPU backend. Differential Revision: https://reviews.llvm.org/D84056	2020-08-03 10:22:25 -05:00
Matt Arsenault	2414bab5d7	AMDGPU/GlobalISel: Remove old hacks for boolean selection There were various hacks used to try to avoid making s1 SGPR vs. s1 VCC ambiguous after constraining the register before we had a strategy to deal with this. This also attempted to handle undef operands, which are now illegal gMIR.	2020-08-03 09:04:14 -04:00
Matt Arsenault	fd63e46941	AMDGPU/GlobalISel: Apply load bitcast to s.buffer.load intrinsic Should also apply this to the non-scalar buffer loads.	2020-08-03 08:54:29 -04:00
Matt Arsenault	d8ef1d1251	AMDGPU/GlobalISel: Fix selecting broken copies for s32->s64 anyext These should probably not be legal in the first place, but that might also be a pain.	2020-08-03 08:36:41 -04:00
Matt Arsenault	212570abcf	GlobalISel: Implement bitcast action for G_EXTRACT_VECTOR_ELEMENT For AMDGPU, vectors with elements < 32 bits should be indexed in 32-bit elements and the desired bits extracted from there. For elements > 64-bits, these should be reduced to 64/32 elements to enable the normal dynamic indexing paths. In the dynamic index cases, this produces shorter code most of the time. This does immediately regress the constant index cases, but this should be fixed once we have the most basic of shift combines. The element size > 64 case is pretty much ported from the existing DAG implementation for extract element promote. The increasing element size case is new.	2020-08-02 10:42:07 -04:00
Benjamin Kramer	c6f08b14d4	Hide some internal symbols. NFC.	2020-07-31 17:28:02 +02:00
Matt Arsenault	57bd64ff84	Support addrspacecast initializers with isNoopAddrSpaceCast Moves isNoopAddrSpaceCast to the TargetMachine. It logically belongs with the DataLayout.	2020-07-31 10:42:43 -04:00
Vitaly Buka	b0eb40ca39	[NFC] Remove unused GetUnderlyingObject parameter Depends on D84617. Differential Revision: https://reviews.llvm.org/D84621	2020-07-31 02:10:03 -07:00
Vitaly Buka	89051ebace	[NFC] GetUnderlyingObject -> getUnderlyingObject I am going to touch them in the next patch anyway	2020-07-30 21:08:24 -07:00
Matt Arsenault	e56e9022bc	AMDGPU: Fix liveness errors when copying AGPR tuples Avoid recursively calling copyPhysReg for AGPR handling. This was dropping the necessary super register implicit defs to avoid liveness verifier errors.	2020-07-30 18:13:04 -04:00
Changpeng Fang	243376cdc7	AMDGPU: Put inexpensive ops first in AMDGPUAnnotateUniformValues::visitLoadInst Summary: This is in response to the review of https://reviews.llvm.org/D84873: The expensive check should be reordered last Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D84890	2020-07-30 14:37:06 -07:00
Stanislav Mekhanoshin	5b32518f96	[AMDGPU] Do not use undef on indirect source We are using undef on the indirect move source subreg and then using implicit super-reg. This creates a problem in RA when Greedy decides to split the register. It reassigns the implicit super-reg but does not bother to change undef source because it really does not matter. The fix is to stop lying to RA and drop undef flag. This has also hit a problem in SIFoldOperands as it can fold immediate into an indirect move since there is no undef flag anymore. That results in multiple test failures, so added the check for this case. Differential Revision: https://reviews.llvm.org/D84899	2020-07-30 10:41:59 -07:00
hsmahesha	33fd4a18e7	[AMDGPU/MemOpsCluster] Clean-up fixme's around mem ops clustering logic Get rid of all fixmes and base heuristic on `num-clustered-dwords`. The main intuition behind this is as follows. The existing heuristic roughly summarizes as below: * Assume, all the mem ops instructions participating in the clustering process, loads/stores same num bytes * If num bytes loaded by each mem op is 4 bytes, then cluster at max 5 mem ops, that is at max 20 bytes * If num bytes loaded by each mem op is 8 bytes, then cluster at max 3 mem ops, that is at max 24 bytes * If num bytes loaded by each mem op is 16 bytes, then cluster at max 2 mem ops, that is at max 32 bytes So, we need to make sure that the new heuristic does not completely deviate away from the above one, and it properly handles both the sub-word loads and the wide loads. Reviewed By: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D84354	2020-07-30 21:41:13 +05:30
Matt Arsenault	0da582d9b6	GlobalISel: Handle llvm.roundeven I still think it's highly questionable that we have two intrinsics with identical behavior and only vary by the name of the libcall used if it happens to be lowered that way, but try to reduce the feature delta between SDAG and GlobalISel for recently added intrinsics. I'm not sure which opcode should be considered the canonical one, but lower roundeven back to round.	2020-07-29 20:01:12 -04:00
Stanislav Mekhanoshin	decfdb8ce3	[AMDGPU] Fixed formatting in GCNHazardRecognizer.cpp. NFC.	2020-07-29 12:21:28 -07:00
Stanislav Mekhanoshin	13b63be472	[AMDGPU] prefer non-mfma in post-RA schedule MFMA instructions shall not be scheduled back to back to avoid MAI SIMD stall. Tell post-RA schedule we would prefer some other instruction instead. Differential Revision: https://reviews.llvm.org/D84883	2020-07-29 12:17:50 -07:00
Matt Arsenault	59fac51ff2	AMDGPU/GlobalISel: Handle llvm.amdgcn.reloc.constant	2020-07-29 14:24:21 -04:00
Matt Arsenault	0b7de7966f	GlobalISel: Implement lower for G_EXTRACT_VECTOR_ELT Use the basic store to stack and reload.	2020-07-29 14:16:28 -04:00
Matt Arsenault	766cb615a3	AMDGPU: Relax restriction on folding immediates into physregs I never completed the work on the patches referenced by `f8bf7d7f42`, but this was intended to avoid folding immediate writes into m0 which the coalescer doesn't understand very well. Relax this to allow simple SGPR immediates to fold directly into VGPR copies. This pattern shows up routinely in current GlobalISel code since nothing is smart enough to emit VGPR constants yet.	2020-07-29 14:01:53 -04:00


5260 Commits