llvm-project

Commit Graph

Author	SHA1	Message	Date
Amara Emerson	2b067e3335	Change TargetLowering::canMergeStoresTo() to take a MF instead of DAG. DAG is unnecessary and we need this hook to implement store merging on GlobalISel too.	2021-08-06 12:57:53 -07:00
Jay Foad	3ed29f960c	[AMDGPU] NFC refactoring in isel for buffer access intrinsics Rename getBufferOffsetForMMO to updateBufferMMO and pass in the MMO to be updated, in preparation for the bug fix in D106284. Call updateBufferMMO consistently for all buffer intrinsics, even the ones that use setBufferOffsets to decompose a combined offset expression. Add a getIdxEn helper function. Differential Revision: https://reviews.llvm.org/D106354	2021-07-21 11:12:49 +01:00
Daniil Fukalov	3489c2d7b1	[TTI] NFC: Change getTypeLegalizationCost to return InstructionCost. This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: sdesmalen, kparzysz Differential Revision: https://reviews.llvm.org/D101533	2021-04-30 22:51:51 +03:00
Petar Avramovic	fb7be0d912	AMDGPU/GlobalISel: Remove redundant G_FCANONICALIZE Add basic version of isCanonicalized for global-isel. Copied from sdag. Add post legalizer combine that deletes G_FCANONICALIZE when its input is already Canonicalized. Differential Revision: https://reviews.llvm.org/D96605	2021-04-27 12:26:37 +02:00
Jay Foad	4af6251cea	[AMDGPU][SDag] Add IMG init in AdjustInstrPostInstrSelection Doing this in a post-isel hook avoids the cost of running SIAddIMGInit which is yet another pass over the MIR. Differential Revision: https://reviews.llvm.org/D99747	2021-04-01 18:13:17 +01:00
Konstantin Zhuravlyov	f4ace63737	AMDGPU: Add target id and code object v4 support - Add target id support (https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id) - Add code object v4 support (https://llvm.org/docs/AMDGPUUsage.html#elf-code-object) - Add kernarg_size to kernel descriptor - Change trap handler ABI to no longer move queue pointer into s[0:1] - Cleanup ELF definitions - Add V2, V3, V4 suffixes to make a clear distinction for code object version - Consolidate note names Differential Revision: https://reviews.llvm.org/D95638	2021-03-24 11:54:05 -04:00
Craig Topper	11ef356d9e	[TargetLowering] Use Align in allowsMisalignedMemoryAccesses. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96097	2021-02-04 19:22:06 -08:00
Matt Arsenault	2a0db8d70e	AMDGPU: Use more accurate fast f64 fdiv A raw v_rcp_f64 isn't accurate enough, so start applying correction.	2021-01-21 10:51:36 -05:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Rodrigo Dominguez	f71f5f39f6	[AMDGPU] Implement hardware bug workaround for image instructions Summary: This implements a workaround for a hardware bug in gfx8 and gfx9, where register usage is not estimated correctly for image_store and image_gather4 instructions when D16 is used. Change-Id: I4e30744da6796acac53a9b5ad37ac1c2035c8899 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81172	2020-10-07 07:39:52 -04:00
Sebastian Neubauer	6a089ce0e4	[AMDGPU] Use tablegen for argument indices Use tablegen generic tables to get the index of image intrinsic arguments. Before, the computation of which image intrinsic argument is at which index was scattered in a few places, tablegen, the SDag instruction selection and GlobalISel. This patch changes that, so only tablegen contains code to compute indices and the ImageDimIntrinsicInfo table provides this information. Differential Revision: https://reviews.llvm.org/D86270	2020-10-05 11:50:52 +02:00
Simon Pilgrim	8adf92e2d1	[AMDGPU] Remove orphan SITargetLowering::LowerINT_TO_FP declaration. NFCI. Method implementation no longer exists.	2020-09-17 10:45:53 +01:00
Matt Arsenault	70cd9f5b77	AMDGPU/GlobalISel: Start implementing computeKnownBitsForTargetInstr Handle workitem intrinsics. There isn't really a way to adequately test this right now, since none of the known bits users are fine grained enough to test the edge conditions. This triggers a number of instances of the new 64-bit to 32-bit shift combine in the existing tests.	2020-08-24 09:53:27 -04:00
Matt Arsenault	e1644a3779	GlobalISel: Reduce G_SHL width if source is extension shl ([sza]ext x, y) => zext (shl x, y). Turns expensive 64 bit shifts into 32 bit if it does not overflow the source type: This is a port of an AMDGPU DAG combine added in `5fa289f0d8`. InstCombine does this already, but we need to do it again here to apply it to shifts introduced for lowered getelementptrs. This will help matching addressing modes that use 32-bit offsets in a future patch. TableGen annoyingly assumes only a single match data operand, so introduce a reusable struct. However, this still requires defining a separate GIMatchData for every combine which is still annoying. Adds a morally equivalent function to the existing getShiftAmountTy. Without this, we would have to do try to repeatedly query the legalizer info and guess at what type to use for the shift.	2020-08-24 09:42:40 -04:00
Matt Arsenault	6c7f640bf7	AMDGPU/GlobalISel: Implement LLT version of allowsMisalignedMemoryAccesses	2020-08-06 09:50:36 -04:00
Matt Arsenault	43c0c9252a	AMDGPU: Refactor buffer atomic intrinsic lowering Move raw/struct buffer atomic lowering to separate functions. This avoids a long nested switch, and simplifies a future patch.	2020-08-05 14:44:55 -04:00
Matt Arsenault	57bd64ff84	Support addrspacecast initializers with isNoopAddrSpaceCast Moves isNoopAddrSpaceCast to the TargetMachine. It logically belongs with the DataLayout.	2020-07-31 10:42:43 -04:00
Stanislav Mekhanoshin	f7a7efbf88	[AMDGPU] Tweak getTypeLegalizationCost() Even though wide vectors are legal they still cost more as we will have to eventually split them. Not all operations can be uniformly done on vector types. Conservatively add the cost of splitting at least to 8 dwords, which is our widest possible load. We are more or less lying to the cost model with this change, but this can prevent the vectorizer from creating wide vectors, which results in RA problems for us. Differential Revision: https://reviews.llvm.org/D83078	2020-07-06 14:07:48 -07:00
Dmitry Preobrazhensky	1c9d681092	[AMDGPU][CODEGEN] Added support of new inline assembler constraints Added support for constraints 'I', 'J', 'B', 'C', 'DA', 'DB'. See https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D81651	2020-07-02 17:20:15 +03:00
Guillaume Chatelet	52911428ef	[Alignment][NFC] Migrate AMDGPU backend to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82743	2020-06-29 11:56:06 +00:00
Matt Arsenault	95605b784b	AMDGPU/GlobalISel: Implement computeKnownAlignForTargetInstr We probably need to move where intrinsics are lowered to copies to make this useful.	2020-06-18 17:28:00 -04:00
Stanislav Mekhanoshin	295d1fe733	[AMDGPU] Custom lowering of i64 umulo/smulo Differential Revision: https://reviews.llvm.org/D81430	2020-06-08 23:14:19 -07:00
Stanislav Mekhanoshin	5d62606f90	AMDGPU/GlobalISel: cmp/select method for extract element Differential Revision: https://reviews.llvm.org/D80749	2020-06-05 12:57:40 -07:00
Matt Arsenault	af867b7850	DAG: Change computeKnownBitsForFrameIndex to be usable by GISel This wasn't getting much value from the DAG or depth arguments, since it's only called on the frame index root nodes. FrameIndexes can also only return a scalar value, so it also didn't need DemandedElts.	2020-06-04 10:50:26 -04:00
Matt Arsenault	5e007fe998	AMDGPU: Support non-entry block static sized allocas OpenMP emits these for some reason, so handle them. Assume these use 4096 bytes by default, with a flag to override this. Also change the related stack assumption for calls to have a flag.	2020-05-27 18:46:10 -04:00
Matt Arsenault	9786e7552d	Revert "[AMDGPU] NFC target dependent requiresUniformRegister refactored out" This reverts commit `fb38b98338`. This will regress compile time.	2020-05-26 12:58:18 -04:00
alex-t	fb38b98338	[AMDGPU] NFC target dependent requiresUniformRegister refactored out Summary: Target specific method encapsulated into the Target Lowering Info. Reviewers: rampitec, vpykhtin Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70085	2020-05-26 19:49:20 +03:00
Dmitry Preobrazhensky	b087b91c91	[AMDGPU][CODEGEN] Added 'A' constraint for inline assembler Summary: 'A' constraint requires an immediate int or fp constant that can be inlined in an instruction encoding. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D78494	2020-05-25 14:23:34 +03:00
Matt Arsenault	e6605a209c	DAG: Fix wrong legality check for ISD::FMAD Since `1725f28841`, this should check isFMADLegalForFAddFSub rather than the plain isOperationLegal. This would assert in a subset of cases due to an oddity in how FMAD is selected. We will allow FMA formation pre-legalize, but not FMAD even in cases where it would be valid. The current hook requires passing in the root fadd/fsub. However, in this distributed case, this would be far more complicated to pass in the relevant operand. AMDGPU doesn't get any value from the node, and only needs the type and is the only implementor, so I'm not sure why we have this complexity. Just rename and expand the assert to avoid the more complicated checks spread through the distribution logic.	2020-04-13 10:25:39 -07:00
Matt Arsenault	ab7a41069e	AMDGPU: Fix using wrong instruction for FP conversion This was never actually hit, but FTRUNC was clearly not the intent here.	2020-03-29 14:03:07 -04:00
Matt Arsenault	015b640be4	AMDGPU: Add flag to use fixed function ABI Pass all arguments to every function, rather than only passing the minimum set of inputs needed for the call graph.	2020-03-13 13:27:05 -07:00
Matt Arsenault	e240b27d6d	AMDGPU/GlobalISel: Allow arbitrary global values Treat unknown address spaces as global	2020-02-17 11:32:28 -08:00
Guillaume Chatelet	3c89b75f23	[NFC] Introduce a type to model memory operation Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code. Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73785	2020-01-31 17:29:01 +01:00
Matt Arsenault	97a1d4bc02	AMDGPU: Don't use separate cache arguments for s_buffer_load node There's not much value to this separate node from the intrinsic. Make the operand structure the same as the intrinsic, so we can reuse the same pattern for GlobalISel.	2020-01-30 14:15:26 -08:00
Matt Arsenault	96352e0a1b	AMDGPU/GlobalISel: Handle LDS with relocations case	2020-01-29 08:18:55 -08:00
Matt Arsenault	255cc5a760	CodeGen: Use LLT instead of EVT in getRegisterByName Only PPC seems to be using it, and only checks some simple cases and doesn't distinguish between FP. Just switch to using LLT to simplify use from GlobalISel.	2020-01-09 17:37:52 -05:00
Matt Arsenault	db0ed3e429	AMDGPU: Refactor treatment of denormal mode Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and be moved into a separate function attribute. This patch is still NFC. The denormal mode remains as a subtarget feature for now, but make the necessary changes to switch to using an attribute.	2019-11-19 19:55:43 +05:30
Matt Arsenault	b696b9dba7	DAG: Add function context to isFMAFasterThanFMulAndFAdd AMDGPU needs to know the FP mode for the function to answer this correctly when this is removed from the subtarget. AArch64 had to make this more complicated by using this from an IR hook, so add an IR typed overload.	2019-11-19 19:25:26 +05:30
Matt Arsenault	6221767055	DAG: Add DAG argument to isFPExtFoldable For AMDGPU this is dependent on the FP mode, which should eventually not be a property of the subtarget.	2019-10-31 22:32:45 -07:00
Matt Arsenault	1725f28841	DAG: Add new control for ISD::FMAD formation For AMDGPU this depends on whether denormals are enabled in the default FP mode for the function. Currently this is treated as a subtarget feature, so FMAD is selectively legal based on that. I want to move this out of the subtarget features so this can be controlled with a denormal mode attribute. Additionally, this will allow folding based on a future ftz fast math flag.	2019-10-31 07:51:38 -07:00
Matt Arsenault	171cf5302f	AMDGPU/GlobalISel: Handle flat/global G_ATOMIC_CMPXCHG Custom lower this to a target instruction with the merge operands. I think it might be better to directly select this and emit a REG_SEQUENCE, but this would be more work since it would require splitting the tablegen patterns for these cases from the other atomics.	2019-10-25 13:11:09 -07:00
Alexander Timofeev	c4d256a590	[AMDGPU] Come back patch for the 'Assign register class for cross block values according to the divergence.' Detailed description: After https://reviews.llvm.org/D59990 submit several issues were discovered. Changes in common code were preserved but AMDGPU specific part was reverted to keep the backend working correctly. Discovered issues were addressed in the following commits: https://reviews.llvm.org/D67662 https://reviews.llvm.org/D67101 https://reviews.llvm.org/D63953 https://reviews.llvm.org/D63731 This change brings back AMDGPU specific changes. Reviewed by: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D68635 llvm-svn: 374767	2019-10-14 12:01:10 +00:00
Tom Stellard	3a8d80944b	AMDGPU: Add offsets to MMO when lowering buffer intrinsics Summary: Without offsets on the MachineMemOperands (MMOs), MachineInstr::mayAlias() will return true for all reads and writes to the same resource descriptor. This leads to O(N^2) complexity in the MachineScheduler when analyzing dependencies of buffer loads and stores. It also limits the SILoadStoreOptimizer from merging more instructions. This patch reduces the compile time of one pathological compute shader from 12 seconds to 1 second. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65097 llvm-svn: 374087	2019-10-08 17:04:51 +00:00
Matt Arsenault	f24ac13aaa	TLI: Remove DAG argument from getRegisterByName Replace with the MachineFunction. X86 is the only user, and only uses it for the function. This removes one obstacle from using this in GlobalISel. The other is the more tolerable EVT argument. The X86 use of the function seems questionable to me. It checks hasFP, before frame lowering. llvm-svn: 373292	2019-10-01 01:44:39 +00:00
Matt Arsenault	77ac400117	AMDGPU/GlobalISel: Legalize G_GLOBAL_VALUE Handle other cases besides LDS. Mostly a straight port of the existing handling, without the intermediate custom nodes. llvm-svn: 373286	2019-10-01 01:06:43 +00:00
Guillaume Chatelet	18f805a7ea	[Alignment][NFC] Remove unneeded llvm:: scoping on Align types llvm-svn: 373081	2019-09-27 12:54:21 +00:00
Matt Arsenault	c0ceca5883	AMDGPU/GlobalISel: First pass at attempting to legalize load/stores There's still a lot more to do, but this handles decomposing due to alignment. I've gotten it to the point where nothing crashes or infinite loops the legalizer. llvm-svn: 371533	2019-09-10 16:20:14 +00:00
Guillaume Chatelet	3729b17cff	[Alignment][NFC] Use llvm::Align for TargetLowering::getPrefLoopAlignment Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Reviewed By: courbet Subscribers: wuzish, arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67386 llvm-svn: 371511	2019-09-10 12:00:43 +00:00
Guillaume Chatelet	aff45e4b23	[LLVM][Alignment] Make functions using log of alignment explicit Summary: This patch renames functions that take or return alignment as log2; this will help with the transition to llvm::Align. The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment. A few renames uncovered dubious assignments: - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using log2(align). This patch fixes it and updates the documentation. - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments; internally these values are interpreted as log2(align). This patch updates the documentation. - `MachineFunction` exposes `align-all-functions`, also interpreted as power of two alignment; internally this value is interpreted as log2(align). This patch updates the documentation. Reviewers: lattner, thegameg, courbet Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet Tags: #llvm Differential Revision: https://reviews.llvm.org/D65945 llvm-svn: 371045	2019-09-05 10:00:22 +00:00
Matt Arsenault	3922392969	AMDGPU: Correct behavior of f16 buffer loads Don't assume format loads for f16. Also fixes support for targets without i16. llvm-svn: 367879	2019-08-05 15:59:07 +00:00

189 Commits