llvm-project

Commit Graph

Author	SHA1	Message	Date
Nicolai Haehnle	db6911a6f9	AMDGPU: Remove old-style image intrinsics Summary: This also removes the need for atomic pseudo instructions, since we select the correct encoding directly in SITargetLowering::lowerImage for dimension-aware image intrinsics. Mesa uses dimension-aware image intrinsics since commit a9a7993441. Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a Reviewers: arsenm, rampitec, mareko, tpr, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48167 llvm-svn: 335231	2018-06-21 13:37:45 +00:00
Nicolai Haehnle	7a9c03f484	AMDGPU: Select MIMG instructions manually in SITargetLowering Summary: Having TableGen patterns for image intrinsics is hitting limitations: for D16 we already have to manually pre-lower the packing of data values, and we will have to do the same for A16 eventually. Since there is already some custom C++ code anyway, it is arguably easier to just do everything in C++, now that we can use the beefed-up generic tables backend of TableGen to provide all the required metadata and map intrinsics to corresponding opcodes. With this approach, all image intrinsic lowering happens in SITargetLowering::lowerImage. That code is dense due to all the cases that it handles, but it should still be easier to follow than what we had before, by virtue of it all being done in a single location, and by virtue of not relying on the TableGen pattern magic that very few people really understand. This means that we will have MachineSDNodes with MIMG instructions during DAG combining, but that seems alright: previously we had intrinsic nodes instead, but those are similarly opaque to the generic CodeGen infrastructure, and the final pattern matching just did a 1:1 translation to machine instructions anyway. If anything, the fact that we now merge the address words into a vector before DAG combine should be an advantage. Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6 Reviewers: arsenm, rampitec, rtaylor, tstellar Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48017 llvm-svn: 335228	2018-06-21 13:36:57 +00:00
Nicolai Haehnle	0ab200b6c9	AMDGPU: Refactor MIMG instruction TableGen using generic tables Summary: This allows us to access rich information about MIMG opcodes from C++ code. Simplifying the mapping between equivalent opcodes of different data size becomes quite natural. This also flattens the MIMG-related class and multiclass hierarchy a little, and collapses together some of the scaffolding for sample and gather4 opcodes. Change-Id: I1a2549fdc1e881ff100e5393d2d87e73729a0ccd Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48016 llvm-svn: 335227	2018-06-21 13:36:44 +00:00
Nicolai Haehnle	e741d7e0fd	AMDGPU: Use generic tables instead of SearchableTable Summary: Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48014 Change-Id: Ibb43f90d955275571aff17d0c3ecfb5e5b299641 llvm-svn: 335226	2018-06-21 13:36:33 +00:00
Nicolai Haehnle	f267431901	AMDGPU: Turn D16 for MIMG instructions into a regular operand Summary: This allows us to reduce the number of different machine instruction opcodes, which reduces the table sizes and helps flatten the TableGen multiclass hierarchies. We can do this because for each hardware MIMG opcode, we have a full set of IMAGE_xxx_Vn_Vm machine instructions for all required sizes of vdata and vaddr registers. Instead of having separate D16 machine instructions, a packed D16 instructions loading e.g. 4 components can simply use the same V2 opcode variant that non-D16 instructions use. We still require a TSFlag for D16 buffer instructions, because the D16-ness of buffer instructions is part of the opcode. Renaming the flag should help avoid future confusion. The one non-obvious code change is that for gather4 instructions, the disassembler can no longer automatically decide whether to use a V2 or a V4 variant. The existing logic which choose the correct variant for other MIMG instruction is extended to cover gather4 as well. As a bonus, some of the assembler error messages are now more helpful (e.g., complaining about a wrong data size instead of a non-existing instruction). While we're at it, delete a whole bunch of dead legacy TableGen code. Change-Id: I89b02c2841c06f95e662541433e597f5d4553978 Reviewers: arsenm, rampitec, kzhuravl, artem.tamazov, dp, rtaylor Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47434 llvm-svn: 335222	2018-06-21 13:36:01 +00:00
Stanislav Mekhanoshin	3b11794dbf	[AMDGPU] setcc (select cc, CT, CF), CF, eq \| ne -> xor cc, -1 \| cc This is the common case in the BE when we serialize condition and then rematerialize it. Use either original or inverted condition. Differential Revision: https://reviews.llvm.org/D48246 llvm-svn: 334882	2018-06-16 03:46:59 +00:00
Matt Arsenault	63bc0e3cb9	AMDGPU: Add combine for short vector extract_vector_elts Try to access pieces 4 bytes at a time. This helps various hasOneUse extract_vector_elt combines, such as load width reductions. Avoids test regressions in a future commit. llvm-svn: 334836	2018-06-15 15:31:36 +00:00
Matt Arsenault	02dc7e19e2	AMDGPU: Make v4i16/v4f16 legal Some image loads return these, and it's awkward working around them not being legal. llvm-svn: 334835	2018-06-15 15:15:46 +00:00
Tom Stellard	264c171f36	AMDGPU: Move isSDNodeSourceOfDivergence() implementation to SITargetLowering Summary: The code that handles ISD:Register and ISD::CopyFromReg assumes the target is amdgcn, so this is broken on r600. We don't need this analysis on r600 anyway so we can safely move it to SITargetLowering. Reviewers: alex-t, arsenm, nhaehnle Reviewed By: arsenm Subscribers: msearles, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46298 llvm-svn: 334607	2018-06-13 15:06:37 +00:00
Stanislav Mekhanoshin	8fd3c4e431	[AMDGPU] DAG combine to produce V_PERM_B32 Differential Revision: https://reviews.llvm.org/D48099 llvm-svn: 334559	2018-06-12 23:50:37 +00:00
Daniil Fukalov	c9a098b314	[AMDGPU] Inline asm - added i16, half and i128 types support AMDGPU inline assembler support i16, half and i128 typed variables in constraints, but they were reported as error. Needed to fix https://github.com/RadeonOpenCompute/ROCm/issues/341, e.g. to be able to load with global_load_dwordx4 to a 128bit integer variable Differential Revision: https://reviews.llvm.org/D44920 llvm-svn: 334301	2018-06-08 16:29:04 +00:00
Matt Arsenault	90083d3088	AMDGPU: Try a lot harder to emit scalar loads This has two main components. First, widen widen short constant loads in DAG when they have the correct alignment. This is already done a bit in AMDGPUCodeGenPrepare, since that has access to DivergenceAnalysis. This can't help kernarg loads created in the DAG. Start to use DAG divergence analysis to help this case. The second part is to avoid kernel argument lowering breaking the alignment of short vector elements because calling convention lowering wants to split everything into legal register types. When loading a split type, load the nearest 4-byte aligned segment and shift to get the desired bits. This extra load of the earlier argument piece ends up merging, and the bit extract hopefully folds out. There are a number of improvements and regressions with this, but I think as-is this is a better compromise between several of the worst parts of SelectionDAG. Particularly when i16 is legal, this produces worse code for i8 and i16 element vector kernel arguments. This is partially due to the very weak load merging the DAG does. It only looks for fairly specific combines between pairs of loads which no longer appear. In particular this causes v4i16 loads to be split into 2 components when previously the two halves were merged. Worse, because of the newly introduced shifts, there is a lot more unnecessary vector packing and unpacking code emitted. At least some of this is due to reporting false for isTypeDesirableForOp for i16 as a workaround for the lack of divergence information in the DAG. The cases where this happens it doesn't actually matter, but the relevant code in SimplifyDemandedBits doens't have the context to know to ignore this. The use of the scalar cache is probably more important than the mess of mostly scalar instructions doing this packing and unpacking. Future work can fix this, possibly by making better use of the new DAG divergence information for controlling promotion decisions, or adding another version of shift + trunc + shift combines that doesn't only know about the used types. llvm-svn: 334180	2018-06-07 09:54:49 +00:00
Matt Arsenault	e9524f1fb3	AMDGPU: Custom lower v2f16 fneg/fabs with illegal f16 Fixes terrible code on targets without f16 support. The legalization creates a mess that is difficult to recover from. Also should avoid randomly breaking these tests multiple times in sequence in future commits. Some regressions in cases where it happens to be better to pull the source modifier after the conversion. llvm-svn: 334132	2018-06-06 21:28:11 +00:00
Matt Arsenault	9224c00d2b	AMDGPU: Use more custom insert/extract_vector_elt lowering Apply to i8 vectors. llvm-svn: 334044	2018-06-05 19:52:46 +00:00
Amaury Sechet	8467411dad	Set ADDE/ADDC/SUBE/SUBC to expand by default Summary: They've been deprecated in favor of UADDO/ADDCARRY or USUBO/SUBCARRY for a while. Target that uses these opcodes are changed in order to ensure their behavior doesn't change. Reviewers: efriedma, craig.topper, dblaikie, bkramer Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D47422 llvm-svn: 333748	2018-06-01 13:21:33 +00:00
Stanislav Mekhanoshin	d4b500cb08	[AMDGPU] Track occupancy in MFI Keep track of achieved occupancy in SIMachineFunctionInfo. At the moment we have a lot of duplicated or even missed code to query and maintain occupancy info. Record it in the MFI and query in a single call. Interfaces: - getOccupancy() - returns current recorded achieved occupancy. - getMinAllowedOccupancy() - returns lesser of the achieved occupancy and the lowest occupancy we are ready to tolerate. For example if a kernel is memory bound we are ready to tolerate 4 waves. - limitOccupancy() - record occupancy level if we have to lower it. - increaseOccupancy() - record occupancy if scheduler managed to increase the occupancy. MFI takes care of integrating different checks affecting occupancy, including LDS use and waves-per-eu attribute. Note that scheduler starts with not yet known register pressure, so has to record either limit or increase in occupancy after it is done. Later passes can just query a resulting value. New interface is used in the active scheduler and NFC wrt its work. Changes are also made to experimental schedulers to use it and record an occupancy after they are done. Before the change waves-per-eu was ignored by experimental schedulers and tolerance window for memory bound kernels was not used. Differential Revision: https://reviews.llvm.org/D47509 llvm-svn: 333629	2018-05-31 05:36:04 +00:00
Matt Arsenault	7b4826e6ce	AMDGPU: Use better alignment for kernarg lowering This was just emitting loads with the ABI alignment for the raw type. The true alignment is often better, especially when an illegal vector type was scalarized. The better alignment allows using a scalar load more often. llvm-svn: 333558	2018-05-30 16:17:51 +00:00
Matt Arsenault	ceafc55e5a	AMDGPU: Pass function directly instead of MachineFunction These functions just query the underlying IR function, so pass it directly. llvm-svn: 333442	2018-05-29 17:42:50 +00:00
Matt Arsenault	2fb9ccf770	AMDGPU: Add nuw to add off of kernarg ptr llvm-svn: 333441	2018-05-29 17:42:38 +00:00
Tim Renouf	fa213f797b	[AMDGPU] Fixed build warning Summary: V2: Use cast instead of extra if. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47426 Change-Id: I6ac31da0306f79706960284a7ebd7b9c6237a83a llvm-svn: 333397	2018-05-29 08:15:37 +00:00
Tom Stellard	1b95fed6f7	AMDGPU/R600: Remove code for handling AMDGPUISD::CLAMP Summary: We don't generate AMDGPUISD::CLAMP for R600 now that llvm.AMDGPU.clamp is gone. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47181 llvm-svn: 333153	2018-05-24 05:28:34 +00:00
Tom Stellard	b12f4dec08	AMDGPU: Move AMDGPUTargetLowering::isFPExtFoldable() into SITargetLowering Summary: This is always false for R600. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47180 llvm-svn: 333016	2018-05-22 19:37:55 +00:00
Matt Arsenault	1349a04ef5	AMDGPU: Make v2i16/v2f16 legal on VI This usually results in better code. Fixes using inline asm with short2, and also fixes having a different ABI for function parameters between VI and gfx9. Partially cleans up the mess used for lowering of the d16 operations. Making v4f16 legal will help clean this up more, but this requires additional work. llvm-svn: 332953	2018-05-22 06:32:10 +00:00
Tom Stellard	44b30b4537	AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction and register defintions, which are huge so we only want to include them where needed. This will also make it easier if we want to split the R600 and GCN definitions into separate tablegenerated files. I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46272 llvm-svn: 332930	2018-05-22 02:03:23 +00:00
Tony Tye	43259df44a	[AMDGPU] Change llvm.debugtrap to be a debug breakpoint that can resume execution. No longer require the queue pointer to be passed in in fixed SGPRs. Differential Revision: https://reviews.llvm.org/D46769 llvm-svn: 332485	2018-05-16 16:19:34 +00:00
Matt Arsenault	67a9815a5c	AMDGPU: Custom lower v4i16/v4f16 vector operations Avoids stack access. Also handle extract hi elt pattern from truncate + shift to avoid a couple test regressions. llvm-svn: 332453	2018-05-16 11:47:30 +00:00
Stanislav Mekhanoshin	57d341c27a	[AMDGPU] Fix handling of void types in isLegalAddressingMode It is legal for the type passed to isLegalAddressingMode to be unsized or, more specifically, VoidTy. In this case, we must check the legality of load / stores for all legal types. Directly trying to call getTypeStoreSize is incorrect, and leads to breakage in e.g. Loop Strength Reduction. This change guards against that behaviour. Differential Revision: https://reviews.llvm.org/D40405 llvm-svn: 332409	2018-05-15 22:07:51 +00:00
Matt Arsenault	dfb88dfe30	AMDGPU: Make undef legal for v2i16/v2f16 This is apparently necessary to stop undef from being turned into a build_vector of 0s. llvm-svn: 332195	2018-05-13 10:04:38 +00:00
Farhana Aleen	e24f3ff8de	[AMDGPU] Support horizontal vectorization of min/max. Author: FarhanaAleen Reviewed By: rampitec Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D46604 llvm-svn: 331920	2018-05-09 21:18:34 +00:00
Matt Arsenault	378f86998c	AMDGPU: Stop special casing constant indexes of extract_vector_elt The same result folds out of the dynamic expansion logic if the index is constant. llvm-svn: 331906	2018-05-09 18:29:26 +00:00
Matt Arsenault	869cbedc81	AMDGPU: Fix broken dynamic vector indexing for packed types The intention of this was to multiply by 16, not shift by 16. llvm-svn: 331793	2018-05-08 18:43:25 +00:00
Michael Berg	7acc81b744	Fast Math Flag mapping into SDNode Summary: Adding support for Fast flags in the SDNode to leverage fast math sub flag usage. Reviewers: spatel, arsenm, jbhateja, hfinkel, escha, qcolombet, echristo, wristow, javed.absar Reviewed By: spatel Subscribers: llvm-commits, rampitec, nhaehnle, tstellar, FarhanaAleen, nemanjai, javed.absar, jbhateja, hfinkel, wdng Differential Revision: https://reviews.llvm.org/D45710 llvm-svn: 331547	2018-05-04 18:48:20 +00:00
Farhana Aleen	07e612340f	[AMDGPU] A trivial fix for a buildbot failure caused by "commit 224a839fcbbead221f872cd32a1dd0c308d37299". Author: FarhanaAleen llvm-svn: 331383	2018-05-02 18:16:39 +00:00
Farhana Aleen	150cb6d91a	Revert "[AMDGPU] performAddCombine should run after DAG is legalized." This reverts commit 6b97d2995566b4dddd6bf0d75579ff44501d4494. llvm-svn: 331371	2018-05-02 16:48:52 +00:00
Farhana Aleen	2f4100f56e	[AMDGPU] performAddCombine should run after DAG is legalized. Summary: performAddCombine should run after DAG is legalized; Otherwise generic optimization in the DAGCombiner can optimize an addcarry+trunc into an addcarry instruction with illegal types. Author: FarhanaAleen Reviewed By: rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D46337 llvm-svn: 331368	2018-05-02 16:24:10 +00:00
Farhana Aleen	e2dfe8a853	[AMDGPU] Support horizontal vectorization. Author: FarhanaAleen Reviewed By: rampitec, arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D46213 llvm-svn: 331313	2018-05-01 21:41:12 +00:00
Adrian Prantl	5f8f34e459	Remove \brief commands from doxygen comments. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272	2018-05-01 15:54:18 +00:00
Matt Arsenault	0084adc516	AMDGPU: Add Vega12 and Vega20 Changes by Matt Arsenault Konstantin Zhuravlyov llvm-svn: 331215	2018-04-30 19:08:16 +00:00
Matt Arsenault	540512c297	DAG: Fix not legalizing vector fcanonicalizes If an fcanoncialize was done on a vector type that was legal, llvm-svn: 330981	2018-04-26 19:21:37 +00:00
Matt Arsenault	fcc5ba46b7	AMDGPU: Extend extract_vector_elt fneg combine to fabs Fixes a regression in a future commit. llvm-svn: 330980	2018-04-26 19:21:32 +00:00
Hiroshi Inoue	372ffa15cb	[NFC] fix trivial typos in comments "the the" -> "the", "we we" -> "we", etc llvm-svn: 330006	2018-04-13 11:37:06 +00:00
Marek Olsak	a9a58fa236	AMDGPU: enable 128-bit for local addr space under an option Author: Samuel Pitoiset ds_read_b128 and ds_write_b128 have been recently enabled under the amdgpu-ds128 option because the performance benefit is unclear. Though, using 128-bit loads/stores for the local address space appears to introduce regressions in tessellation shaders. Not sure what is broken, but as ds_read_b128/ds_write_b128 are not enabled by default, just introduce a global option and enable 128-bit only if requested (until it's fixed/used correctly). v2: - fix regressions in merge-stores.ll and multiple_tails.ll Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464 llvm-svn: 329764	2018-04-10 22:48:23 +00:00
Alex Shlyapnikov	79f2c720b5	Revert "AMDGPU: enable 128-bit for local addr space under an option" This reverts commit r329591. It breaks various bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/16516 http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/17374 http://lab.llvm.org:8011/builders/clang-ppc64le-linux/builds/15992 http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/11251 ... llvm-svn: 329610	2018-04-09 19:47:38 +00:00
Marek Olsak	52b033b827	AMDGPU: enable 128-bit for local addr space under an option Author: Samuel Pitoiset ds_read_b128 and ds_write_b128 have been recently enabled under the amdgpu-ds128 option because the performance benefit is unclear. Though, using 128-bit loads/stores for the local address space appears to introduce regressions in tessellation shaders. Not sure what is broken, but as ds_read_b128/ds_write_b128 are not enabled by default, just introduce a global option and enable 128-bit only if requested (until it's fixed/used correctly). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464 llvm-svn: 329591	2018-04-09 16:56:32 +00:00
Nicolai Haehnle	2f5a73820c	AMDGPU: Dimension-aware image intrinsics Summary: These new image intrinsics contain the texture type as part of their name and have each component of the address/coordinate as individual parameters. This is a preparatory step for implementing the A16 feature, where coordinates are passed as half-floats or -ints, but the Z compare value and texel offsets are still full dwords, making it difficult or impossible to distinguish between A16 on or off in the old-style intrinsics. Additionally, these intrinsics pass the 'texfailpolicy' and 'cachectrl' as i32 bit fields to reduce operand clutter and allow for future extensibility. v2: - gather4 supports 2darray images - fix a bug with 1D images on SI Change-Id: I099f309e0a394082a5901ea196c3967afb867f04 Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44939 llvm-svn: 329166	2018-04-04 10:58:54 +00:00
Farhana Aleen	e80aeac0f2	[AMDGPU] performMinMaxCombine should not optimize patterns of vectors to min3/max3. Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3. Author: FarhanaAleen Reviewed By: arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D45219 llvm-svn: 329131	2018-04-03 23:00:30 +00:00
Farhana Aleen	936947349a	Revert "MSG" This reverts commit 9a0ce889d1c39c74d69ecad5ce9c875155ae55de. This was committed by mistake. llvm-svn: 329119	2018-04-03 21:51:45 +00:00
Farhana Aleen	3ab409dc86	MSG llvm-svn: 329114	2018-04-03 21:20:39 +00:00
Nicolai Haehnle	5d0d30304c	AMDGPU: Make getTgtMemIntrinsic table-driven for resource-based intrinsics Summary: Avoids having to list all intrinsics manually. This is in preparation for the new dimension-aware image intrinsics, which I'd rather not have to list here by hand. Change-Id: If7ced04998397ef68c4cb8f7de66b5050fb767e5 Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44937 llvm-svn: 328938	2018-04-01 17:09:07 +00:00
Matt Arsenault	6c041a3cab	AMDGPU: Fix selection error on constant loads with < 4 byte alignment llvm-svn: 328818	2018-03-29 19:59:28 +00:00
Craig Topper	2fa1436206	[IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer. Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it. The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly. Differential Revision: https://reviews.llvm.org/D45017 llvm-svn: 328806	2018-03-29 17:21:10 +00:00
Matt Arsenault	0a0c871f60	AMDGPU: Fix crash when MachinePointerInfo invalid The combine on a select of a load only triggers for addrspace 0, and discards the MachinePointerInfo. The conservative default needs to be used for this. llvm-svn: 328652	2018-03-27 18:39:45 +00:00
Matt Arsenault	e9f3679031	AMDGPU: Fix FP restore from being reordered with stack ops In a function, s5 is used as the frame base SGPR. If a function is calling another function, during the call sequence it is copied to a preserved SGPR and restored. Before it was possible for the scheduler to move stack operations before the restore of s5, since there's nothing to associate a frame index access with the restore. Add an implicit use of s5 to the adjcallstack pseudo which ends the call sequence to preven this from happening. I'm not 100% satisfied with this solution, but I'm not sure what else would be better. llvm-svn: 328650	2018-03-27 18:38:51 +00:00
David Blaikie	36a0f226b1	Fix layering by moving ValueTypes.h from CodeGen to IR ValueTypes.h is implemented in IR already. llvm-svn: 328397	2018-03-23 23:58:31 +00:00
David Blaikie	13e77db2df	Fix layering of MachineValueType.h by moving it from CodeGen to Support This is used by llvm tblgen as well as by LLVM Targets, so the only common place is Support for now. (maybe we need another target for these sorts of things - but for now I'm at least making them correct & we can make them better if/when people have strong feelings) llvm-svn: 328395	2018-03-23 23:58:25 +00:00
Farhana Aleen	c6c9dc8773	[AMDGPU] Supported ds_write_b128 generation. Summary: This is a follow-on patch of https://reviews.llvm.org/D44210 Author: FarhanaAleen Reviewed By: msearles Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44319 llvm-svn: 327726	2018-03-16 18:12:00 +00:00
Farhana Aleen	a7cb31123c	[AMDGPU] Supported ds_read_b128 generation; Widened vector length for local address-space. Summary: Starting from GCN 2nd generation, ISA supports ds_read_b128 on top of ds_read_b64. This patch supports ds_read_b128 instruction pattern and generation of this instruction. In the vectorizer, this patch also widen the vector length so that vectorizer generates 128 bit loads for local address-space which gets translated to ds_read_b128. Since the performance benefit is not clear; compiler generates ds_read_b128 under -amdgpu-ds128. Author: FarhanaAleen Reviewed By: rampitec, arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44210 llvm-svn: 327153	2018-03-09 17:41:39 +00:00
Farhana Aleen	89196642f7	[AMDGPU] Increased vector length for global/constant loads. Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords. Author: FarhanaAleen Reviewed By: rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44179 llvm-svn: 326910	2018-03-07 17:09:18 +00:00
Farhana Aleen	347d12b4ce	Revert "[AMDGPU] Widened vector length for global/constant address space." This reverts commit ce988cc100dc65e7c6c727aff31ceb99231cab03. llvm-svn: 326907	2018-03-07 16:55:27 +00:00
Farhana Aleen	0d03d0588d	[AMDGPU] Widened vector length for global/constant address space. llvm-svn: 326904	2018-03-07 16:29:05 +00:00
Craig Topper	80d3bb3b4b	[TargetLowering] Rename DAGCombinerInfo::isAfterLegalizeVectorOps to DAGCombiner::isAfterLegalizeDAG since that's what it checks. NFC The code checks Level == AfterLegalizeDAG which is the fourth and last of the possible DAG combine stages that we have. There is a Level called AfterLegalVectorOps, but that's the third DAG combine and it doesn't always run. A function called isAfterLegalVectorOps should imply it returns true in either of the DAG combines that runs after the legalize vector ops stage, but that's not what this function does. llvm-svn: 326832	2018-03-06 19:44:52 +00:00
Alexander Timofeev	2e5eeceeb7	Pass Divergence Analysis data to Selection DAG to drive divergence dependent instruction selection. Differential revision: https://reviews.llvm.org/D35267 llvm-svn: 326703	2018-03-05 15:12:21 +00:00
Jan Vesely	b283ea0f0f	AMDGPU/GCN: Promote i16 ctpop i16 capable ASICs do not support i16 operands for this instruction. Add tablegen pattern to merge chained i16 additions. Differential Revision: https://reviews.llvm.org/D43985 llvm-svn: 326535	2018-03-02 02:50:22 +00:00
Changpeng Fang	da38b5fd49	AMDGPU/SI: Turn off GPR Indexing Mode immediately after the interested instruction. Summary: In the current implementation of GPR Indexing Mode when the index is of non-uniform, the s_set_gpr_idx_off instruction is incorrectly inserted after the loop. This will lead the instructions with vgpr operands (v_readfirstlane for example) to read incorrect vgpr. In this patch, we fix the issue by inserting s_set_gpr_idx_on/off immediately around the interested instruction. Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D43297 llvm-svn: 325355	2018-02-16 16:31:30 +00:00
Stanislav Mekhanoshin	c078ca92eb	[AMDGPU] Remove non-temporal flag from argument loads Kernel arguments likely read by all workitems and should not bypass cache. Fixes performance hit in sub-dword argument loads. Differential Revision: https://reviews.llvm.org/D43249 llvm-svn: 325146	2018-02-14 18:05:14 +00:00
Matt Arsenault	923712b6b5	Reapply "AMDGPU: Add 32-bit constant address space" This reverts r324494 and reapplies r324487. llvm-svn: 324747	2018-02-09 16:57:57 +00:00
Matt Arsenault	bcf7bec4b8	AMDGPU: Fix layering issue Move utility function that depends on codegen. Fixes build with r324487 reapplied. llvm-svn: 324746	2018-02-09 16:57:48 +00:00
Rafael Espindola	f4e3f3e31c	Revert "AMDGPU: Add 32-bit constant address space" This reverts commit r324487. It broke clang tests. llvm-svn: 324494	2018-02-07 18:09:35 +00:00
Marek Olsak	871c30e540	AMDGPU: Add 32-bit constant address space Note: This is a candidate for LLVM 6.0, because it was planned to be in that release but was delayed due to a long review period. Merge conflict in release_60 - resolution: Add "-p6:32:32" into the second (non-amdgiz) string. Only scalar loads support 32-bit pointers. An address in a VGPR will fail to compile. That's OK because the results of loads will only be used in places where VGPRs are forbidden. Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC. The tests cover all uses cases we need for Mesa. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D41651 llvm-svn: 324487	2018-02-07 16:01:00 +00:00
Marek Olsak	13e4741275	AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16} Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D41663 llvm-svn: 323908	2018-01-31 20:18:04 +00:00
Daniil Fukalov	6e1dc68117	[AMDGPU] fix LDS f32 intrinsics - using qualified pointer addrspace in intrinsics class to avoid .f32 mangling - changed too common atomic mangling to ds - added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic Reviewed by: b-sumner Differential Revision: https://reviews.llvm.org/D42383 llvm-svn: 323516	2018-01-26 11:09:38 +00:00
Changpeng Fang	4737e892de	AMDGPU/SI: Add d16 support for image intrinsics. Summary: This patch implements d16 support for image load, image store and image sample intrinsics. Reviewers: Matt, Brian. Differential Revision: https://reviews.llvm.org/D3991 llvm-svn: 322903	2018-01-18 22:08:53 +00:00
Daniil Fukalov	d5fca554e2	[AMDGPU] add LDS f32 intrinsics added llvm.amdgcn.atomic.{add\|min\|max}.f32 intrinsics to allow generate ds_{add\|min\|max}[_rtn]_f32 instructions needed for OpenCL float atomics in LDS Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D37985 llvm-svn: 322656	2018-01-17 14:05:05 +00:00
Changpeng Fang	44dfa1de3b	AMDGPU/SI: Add d16 support for buffer intrinsics. Differential Revision: https://reviews.llvm.org/D38906 Reviewers: Matt and Brian. llvm-svn: 322402	2018-01-12 21:12:19 +00:00
Matt Arsenault	e19bc2ee0f	AMDGPU: Use unique PSVs for buffer resources Also fixes using the wrong memory type for some intrinsics when custom lowering them. llvm-svn: 321557	2017-12-29 17:18:21 +00:00
Matt Arsenault	905f3518ba	AMDGPU: Implement getTgtMemIntrinsic for images Currently all images are lowered to have a single image PseudoSourceValue. Image stores happen to have overly strict mayLoad/mayStore/hasSideEffects flags set on them, so this happens to work. When these are fixed to be correct, the scheduler breaks this because the identical PSVs are assumed to be the same address. These need to be unique to the image resource value. llvm-svn: 321555	2017-12-29 17:18:14 +00:00
Matthias Braun	f1caa2833f	MachineFunction: Return reference from getFunction(); NFC The Function can never be nullptr so we can return a reference. llvm-svn: 320884	2017-12-15 22:22:58 +00:00
Matt Arsenault	7d7adf4f2e	TLI: Allow using PSV for intrinsic mem operands llvm-svn: 320756	2017-12-14 22:34:10 +00:00
Matt Arsenault	1117133687	DAG: Expose all MMO flags in getTgtMemIntrinsic Rather than adding more bits to express every MMO flag you could want, just directly use the MMO flags. Also fixes using a bunch of bool arguments to getMemIntrinsicNode. On AMDGPU, buffer and image intrinsics should always have MODereferencable set, but currently there is no way to do that directly during the initial intrinsic lowering. llvm-svn: 320746	2017-12-14 21:39:51 +00:00
Matt Arsenault	cad7fa857c	AMDGPU: Partially fix disassembly of MIMG instructions Stores failed to decode at all since they didn't have a DecoderNamespace set. Loads worked, but did not change the register width displayed to match the numbmer of enabled channels. The number of printed registers for vaddr is still wrong, but I don't think that's encoded in the instruction so there's not much we can do about that. Image atomics are still broken. MIMG is the same encoding for SI/VI, but the image atomic classes are split up into encoding specific versions unlike every other MIMG instruction. They have isAsmParserOnly set on them for some reason. dmask is also special for these, so we probably should not have it as an explicit operand as it is now. llvm-svn: 320614	2017-12-13 21:07:51 +00:00
Matt Arsenault	856777d8c9	AMDGPU: image_getlod and image_getresinfo do not read memory llvm-svn: 320187	2017-12-08 20:00:57 +00:00
Matt Arsenault	ecad0d5364	AMDGPU: Preserve MMO in adjustWritemask Follow up to r319705. Currently the MMO is produced after this in the custom inserter, so this doesn't change anything yet. llvm-svn: 320186	2017-12-08 20:00:45 +00:00
Matt Arsenault	68f0505263	AMDGPU: Fix creating invalid copy when adjusting dmask Move the entire optimization to one place. Before it was possible to adjust dmask without changing the register class of the output instruction, since they were done in separate places. Fix all lane sizes and move all of the optimization into the DAG folding. llvm-svn: 319705	2017-12-04 22:18:27 +00:00
Matt Arsenault	84445dd13c	AMDGPU: Use gfx9 carry-less add/sub instructions llvm-svn: 319491	2017-11-30 22:51:26 +00:00
Matt Arsenault	b655fa9ce2	DAG: Add nuw when splitting loads and stores The object can't straddle the address space wrap around, so I think it's OK to assume any offsets added to the base object pointer can't overflow. Similar logic already appears to be applied in SelectionDAGBuilder when lowering aggregate returns. llvm-svn: 319272	2017-11-29 01:25:12 +00:00
Francis Visoiu Mistrih	9d7bb0cb40	[CodeGen] Print register names in lowercase in both MIR and debug output As part of the unification of the debug format and the MIR format, always print registers as lowercase. * Only debug printing is affected. It now follows MIR. Differential Revision: https://reviews.llvm.org/D40417 llvm-svn: 319187	2017-11-28 17:15:09 +00:00
Yaxun Liu	c596226604	[AMDGPU] Fix SITargetLowering::LowerCall for pointer info of byval argument SITargetLowering::LowerCall uses dummy pointer info for byval argument, which causes flat load instead of buffer load. This patch fixes that. Differential Revision: https://reviews.llvm.org/D40040 llvm-svn: 318844	2017-11-22 16:13:35 +00:00
David Blaikie	b3bde2ea50	Fix a bunch more layering of CodeGen headers that are in Target All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490	2017-11-17 01:07:10 +00:00
Matt Arsenault	301162c4fe	AMDGPU: Replace i64 add/sub lowering Use VOP3 add/addc like usual. This has some tradeoffs. Inline immediates fold a little better, but other constants are worse off. SIShrinkInstructions could be made smarter to handle these cases. This allows us to avoid selecting scalar adds where we need to track the carry in scc and replace its users. This makes it easier to use the carryless VALU adds. llvm-svn: 318340	2017-11-15 21:51:43 +00:00
Matt Arsenault	45b98189bd	AMDGPU: Don't use MUBUF vaddr if address may overflow Effectively revert r263964. Before we would not allow this if vaddr was not known to be positive. llvm-svn: 318240	2017-11-15 00:45:43 +00:00
Matt Arsenault	c8903125cd	AMDGPU: Handle or in multi-use shl ptr combine llvm-svn: 318223	2017-11-14 23:46:42 +00:00
Matt Arsenault	e5e0c742df	AMDGPU: Preserve nuw in shl add ptr combine llvm-svn: 318017	2017-11-13 05:33:35 +00:00
Matt Arsenault	fbe9533509	AMDGPU: Fix multi-use shl/add combine This was using a custom function that didn't handle the addressing modes properly for private. Use isLegalAddressingMode to avoid duplicating this. Additionally, skip the combine if there is only one use since the standard combine will handle it. llvm-svn: 318013	2017-11-13 05:11:54 +00:00
Marek Olsak	5cec64195c	AMDGPU: Lower buffer store and atomic intrinsics manually Summary: Without this, SIMemoryLegalizer inserts s_waitcnt vmcnt(0) before every buffer store and atomic instruction. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D39060 llvm-svn: 317754	2017-11-09 01:52:48 +00:00
Matt Arsenault	4f6318fe1b	AMDGPU: Select v_mad_u64_u32 and v_mad_i64_i32 llvm-svn: 317492	2017-11-06 17:04:37 +00:00
Yaxun Liu	1ac16619d2	[AMDGPU] Fix assertion due to assuming pointer in default addr space is 32 bit The backend assumes pointer in default addr space is 32 bit, which is not true for the new addr space mapping and causes assertion for unresolved functions. This patch fixes that. Differential Revision: https://reviews.llvm.org/D39643 llvm-svn: 317476	2017-11-06 13:01:33 +00:00
Marek Olsak	ce76ea0394	AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1) Summary: Kill the thread if operand 0 == false. llvm.amdgcn.wqm.vote can be applied to the operand. Also allow kill in all shader stages. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D38544 llvm-svn: 316427	2017-10-24 10:27:13 +00:00
Mark Searles	4e3d6160db	Use the return value of UpdateNodeOperands(); in some cases, UpdateNodeOperands() modifies the node in-place and using the return value isn’t strictly necessary. However, it does not necessarily modify the node, but may return a resultant node if it already exists in the DAG. See comments in UpdateNodeOperands(). In that case, the return value must be used to avoid such scenarios as an infinite loop (node is assumed to have been updated, so added back to the worklist, and re-processed; however, node hasn’t changed so it is once again passed to UpdateNodeOperands(), assumed modified, added back to worklist; cycle infinitely repeats). Differential Revision: https://reviews.llvm.org/D38466 llvm-svn: 315957	2017-10-16 23:38:53 +00:00
Matt Arsenault	e11d8aca77	AMDGPU: Implement hasBitPreservingFPLogic llvm-svn: 315754	2017-10-13 21:10:22 +00:00
Tim Renouf	c8ffffe462	[AMDGPU] For amdpal, widen interpolation mode workaround Summary: The interpolation mode workaround ensures that at least one interpolation mode is enabled in PSInputAddr. It does not also check PSInputEna on the basis that the user might enable bits in that depending on run-time state. However, for amdpal os type, the user does not enable some bits after compilation based on run-time states; the register values being generated here are the final ones set in the hardware. Therefore, apply the workaround to PSInputAddr and PSInputEnable together. (The case where a bit is set in PSInputAddr but not in PSInputEnable is where the frontend set up an input arg for a particular interpolation mode, but nothing uses that input arg. Really we should have an earlier pass that removes such an arg.) Reviewers: arsenm, nhaehnle, dstuttard Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37758 llvm-svn: 315591	2017-10-12 16:16:41 +00:00

1 2 3 4 5 ...

506 Commits