llvm-project

Commit Graph

Author	SHA1	Message	Date
Arthur Eubanks	9abc457724	[NewPM][AMDGPU] Port amdgpu-simplifylib/amdgpu-usenative And add them to the pipeline via AMDGPUTargetMachine::registerPassBuilderCallbacks(), which mirrors AMDGPUTargetMachine::adjustPassManager(). These passes can't be unconditionally added to PassRegistry.def since they are only present when the AMDGPU backend is enabled. And there are no target-specific headers in llvm/include, so parsing these pass names must occur somewhere in the AMDGPU directory. I decided the best place was inside the TargetMachine, since the PassBuilder invokes TargetMachine::registerPassBuilderCallbacks() anyway. If we come up with a cleaner solution for target-specific passes in the future that's fine, but there aren't too many target-specific IR passes living in target-specific directories so it shouldn't be too bad to change in the future. Reviewed By: ychen, arsenm Differential Revision: https://reviews.llvm.org/D93863	2020-12-28 10:38:51 -08:00
alex-t	644da789e3	[AMDGPU] Split edge to make si_if dominate end_cf The basic block containing the "if" does not necessarily dominate the block that is the "false" target for the if. That "false" target block may have another predecessor besides the "if" block. The IR value corresponding to the Exec mask is generated by the si_if intrinsic and then used by the end_cf intrinsic. In this case the IR verifier complains that 'Def does not dominate all uses'. This change splits the edge between the "if" block and the "false" target block to make it dominated by the "if" block. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D91435	2020-12-28 17:14:02 +03:00
Dmitry Preobrazhensky	8c25bb3d0d	[AMDGPU][MC] Improved errors handling for v_interp* operands See bug 48596 (https://bugs.llvm.org/show_bug.cgi?id=48596) Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D93757	2020-12-28 16:15:48 +03:00
Dmitry Preobrazhensky	5b17263b6b	[AMDGPU][MC][NFC] Parser refactoring See bug 48515 (https://bugs.llvm.org/show_bug.cgi?id=48515) Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D93756	2020-12-28 14:59:49 +03:00
Kazu Hirata	d6ff5cf995	[Target] Use llvm::any_of (NFC)	2020-12-24 19:43:26 -08:00
Praveen Velliengiri	61177943c9	[AMDGPU] Use MUBUF instructions for global address space access Currently, the compiler crashes in instruction selection of global load/stores in gfx600 due to the lack of FLAT instructions. This patch fixes the crash by selecting MUBUF instructions for global load/stores in gfx600. Authored-by: Praveen Velliengiri <Praveen.Velliengiri@amd.com> Reviewed by: t-tye Differential revision: https://reviews.llvm.org/D92483	2020-12-24 10:13:04 +00:00
Stanislav Mekhanoshin	747f67e034	[AMDGPU] Fix adjustWritemask subreg handling If we happen to extract a non-dword subreg that breaks the logic of the function and it may shrink the dmask because it does not recognize the use of a lane(s). This bug is next to impossible to trigger with the current lowering in the BE, but it breaks in one of my future patches. Differential Revision: https://reviews.llvm.org/D93782	2020-12-23 14:43:31 -08:00
Sebastian Neubauer	221fdedc69	[AMDGPU][GlobalISel] Fold flat vgpr + constant addresses Use getPtrBaseWithConstantOffset in selectFlatOffsetImpl to fold more vgpr+constant addresses. Differential Revision: https://reviews.llvm.org/D93692	2020-12-23 10:40:30 +01:00
Matt Arsenault	581d13f8ae	GlobalISel: Return APInt from getConstantVRegVal Returning int64_t was arbitrarily limiting for wide integer types, and the functions should handle the full generality of the IR. Also changes the full form which returns the originally defined vreg. Add another wrapper for the common case of just immediately converting to int64_t (arguably this would be useful for the full return value case as well). One possible issue with this change is some of the existing uses did break without conversion to getConstantVRegSExtVal, and it's possible some without adequate test coverage are now broken.	2020-12-22 22:23:58 -05:00
Matt Arsenault	8bf9cdeaee	AMDGPU: Use Register	2020-12-22 21:55:59 -05:00
Matt Arsenault	bac54639c7	AMDGPU: Add spilled CSR SGPRs to entry block live ins	2020-12-22 21:55:59 -05:00
Matt Arsenault	29ed846d67	AMDGPU: Fix assert when checking for implicit operand legality	2020-12-22 20:56:24 -05:00
Stanislav Mekhanoshin	d15119a02d	[AMDGPU][GlobalISel] GlobalISel for flat scratch It does not seem to fold offsets but this is not specific to the flat scratch as getPtrBaseWithConstantOffset() does not return the split for these tests unlike its SDag counterpart. Differential Revision: https://reviews.llvm.org/D93670	2020-12-22 16:33:06 -08:00
Stanislav Mekhanoshin	ca4bf58e4e	[AMDGPU] Support unaligned flat scratch in TLI Adjust SITargetLowering::allowsMisalignedMemoryAccessesImpl for unaligned flat scratch support. Mostly needed for global isel. Differential Revision: https://reviews.llvm.org/D93669	2020-12-22 16:12:31 -08:00
Stanislav Mekhanoshin	ae8f4b2178	[AMDGPU] Folding of FI operand with flat scratch Differential Revision: https://reviews.llvm.org/D93501	2020-12-22 10:48:04 -08:00
Dmitry Preobrazhensky	f4f49d9d0d	[AMDGPU][MC][NFC] Fix for sanitizer error in `8ab5770` Corrected to fix sanitizer error introduced by `8ab5770`	2020-12-21 20:42:35 +03:00
Dmitry Preobrazhensky	8ab5770a17	[AMDGPU][MC][NFC] Parser refactoring See bug 48515 (https://bugs.llvm.org/show_bug.cgi?id=48515) Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D93548	2020-12-21 20:21:07 +03:00
Carl Ritson	7722494834	[AMDGPU][NFC] Remove unused Hi16Elt definition	2020-12-18 20:38:54 +09:00
dfukalov	9ed8e0caab	[NFC] Reduce include files dependency and AA header cleanup (part 2). Continuing work started in https://reviews.llvm.org/D92489: Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h". Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92852	2020-12-17 14:04:48 +03:00
Matt Arsenault	f333736757	AMDGPU: Remove SGPRSpillVGPRDefinedSet hack These VGPRs should be reserved and therefore do not need "correct" liveness. They should not have undef uses, which can still cause issues.	2020-12-16 21:33:35 -05:00
Roman Lebedev	49dac4aca0	[SimplifyCFG] MergeBlockIntoPredecessor() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Fixes DomTree preservation for a large number of tests, all of which are marked as such so that they do not regress.	2020-12-17 01:03:49 +03:00
Piotr Sobczak	c7afb698ca	[AMDGPU] Avoid calling copyFastMathFlags in wrong context Calling Instruction::copyFastMathFlags() assumes the caller is FPMathOperator. Avoid calling the function for instructions that are not instances of FPMathOperator.	2020-12-16 10:22:51 +01:00
Sebastian Neubauer	409a2f0f9e	[AMDGPU] Allow no saddr for global addtid insts I think the global_load/store_dword_addtid instructions support switching off the scalar address. Add assembler and disassembler support for this. Differential Revision: https://reviews.llvm.org/D93288	2020-12-16 10:01:40 +01:00
Stanislav Mekhanoshin	eb66bf0802	[AMDGPU] Print SCRATCH_EN field after the kernel Differential Revision: https://reviews.llvm.org/D93353	2020-12-15 22:44:30 -08:00
Matt Arsenault	97f51f0489	AMDGPU: Remove redundant CCAction for i1	2020-12-15 17:00:27 -05:00
Tony	d5ea8f7010	[AMDGPU] Clarify scratch initialization - Clarify documentation on initializing scratch. - Rename compute_pgm_rsrc2 field for enabling scratch from ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET to ENABLE_PRIVATE_SEGMENT to match hardware definition. Differential Revision: https://reviews.llvm.org/D93271	2020-12-15 20:14:20 +00:00
Sebastian Neubauer	91445979be	[AMDGPU] Unify flat offset logic Move getNumFlatOffsetBits from AMDGPUAsmParser and SIInstrInfo into AMDGPUBaseInfo. Differential Revision: https://reviews.llvm.org/D93287	2020-12-15 14:59:59 +01:00
Changpeng Fang	ce0c0013d8	AMDGPU: If a store defines (alias) a load, it clobbers the load. Summary: If a store defines (must alias) a load, it clobbers the load. Fixes: SWDEV-258915 Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D92951	2020-12-14 16:34:32 -08:00
Stanislav Mekhanoshin	cf5845d6c4	[AMDGPU] Use multi-dword flat scratch for spilling Differential Revision: https://reviews.llvm.org/D93067	2020-12-14 14:19:29 -08:00
Michael Liao	1fd1f638b6	[amdgpu] Fix a crash case when `V_CNDMASK` could be simplified. - Once an instruction is simplified, foldable candidates from it should be invalidated or skipped as the operand index is no longer valid. Differential Revision: https://reviews.llvm.org/D93174	2020-12-14 13:08:13 -05:00
Stanislav Mekhanoshin	87d7757bbe	[SLP] Control maximum vectorization factor from TTI D82227 has added a proper check to limit PHI vectorization to the maximum vector register size. That unfortunately resulted in at least a couple of regressions on SystemZ and x86. This change reverts PHI handling from D82227 and replaces it with a more general check in SLPVectorizerPass::tryToVectorizeList(). Moved to tryToVectorizeList() it allows to restart vectorization if initial chunk fails. However, this function is more general and handles not only PHI but everything which SLP handles. If vectorization factor would be limited to maximum vector register size it would limit much more vectorization than before leading to further regressions. Therefore a new TTI callback getMaximumVF() is added with the default 0 to preserve current behavior and limit nothing. Then targets can decide what is better for them. The callback gets ElementSize just like a similar getMinimumVF() function and the main opcode of the chain. The latter is to avoid regressions at least on the AMDGPU. We can have loads and stores up to 128 bit wide, and <2 x 16> bit vector math on some subtargets, where the rest shall not be vectorized. I.e. we need to differentiate based on the element size and operation itself. Differential Revision: https://reviews.llvm.org/D92059	2020-12-14 08:49:40 -08:00
Jay Foad	07e92e6b60	[AMDGPU] Make use of HasSMemRealTime predicate. NFC. We have this subtarget feature so it makes sense to use it here. This is NFC because it's always defined by default on GFX8+. Differential Revision: https://reviews.llvm.org/D93202	2020-12-14 16:34:57 +00:00
Carl Ritson	62c246eda2	[AMDGPU][NFC] Rename opsel/opsel_hi/neg_lo/neg_hi with suffix 0 These parameters set a default value of 0, so I believe they should include a 0 suffix. This allows for versions which do not set a default value in future. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D93187	2020-12-14 20:01:56 +09:00
Carl Ritson	af4570cd3a	[AMDGPU][NFC] Remove unused VOP3Mods0Clamp This is unused and the selection function does not exist. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D93188	2020-12-14 20:00:58 +09:00
Sebastian Neubauer	5733167f54	[AMDGPU] Mark amdgpu_gfx functions as module entry function - Allows lds allocations - Writes resource usage into COMPUTE_PGM_RSRC1 registers in PAL metadata Differential Revision: https://reviews.llvm.org/D92946	2020-12-14 10:43:39 +01:00
Jay Foad	4f25e53982	[AMDGPU] Make use of emitRemovedIntrinsicError. NFC. Change-Id: I482bbf528255f2eacd3878ddfe7edb9a8f63d5c2	2020-12-11 14:02:14 +00:00
Mirko Brkusanin	0c7cce54eb	[AMDGPU] Resolve issues when picking between ds_read/write and ds_read2/write2 Both ds_read_b128 and ds_read2_b64 are valid for 128bit 16-byte aligned loads but the one that will be selected is determined either by the order in tablegen or by the AddedComplexity attribute. Currently ds_read_b128 has priority. While ds_read2_b64 has lower alignment requirements, we cannot always restrict ds_read_b128 to 16-byte alignment because of the unaligned-access-mode option. This was causing ds_read_b128 to be selected for 8-byte aligned loads regardless of the chosen access mode. To resolve this we use two patterns for selecting ds_read_b128. One requires alignment of 16-byte and the other requires the unaligned-access-mode option. Same goes for ds_write2_b64 and ds_write_b128. Differential Revision: https://reviews.llvm.org/D92767	2020-12-10 12:40:49 +01:00
Stanislav Mekhanoshin	4617cc68f6	[AMDGPU] Fix expansion of 192 bit spills in PEI Differential Revision: https://reviews.llvm.org/D92979	2020-12-09 16:36:29 -08:00
Scott Linder	9260a99999	[MC][AMDGPU] Consume EndOfStatement in asm parser Avoids spurious newlines showing up in the output when emitting assembly via MC. Reviewed By: MaskRay, arsenm Differential Revision: https://reviews.llvm.org/D92690	2020-12-09 21:45:55 +00:00
Scott Linder	f5f4b8b60f	[AMDGPU][MC] Restore old error position for "too few operands" Revert part of https://reviews.llvm.org/D92084 to make it simpler to start consuming the EndOfStatement token within AMDGPU's ParseInstruction in a future patch. This also brings us back to what every other target currently does. A future change to move the position back to the end of the statement would likely need to audit all of the AMDGPUOperand SMLoc ranges, and determine the SMLoc for the last character of the last operand. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D92960	2020-12-09 21:09:47 +00:00
Austin Kerbow	4aa842a800	[AMDGPU] Add new pseudos for indirect addressing with VGPR Indexing It is possible for copies or spills to be inserted in the middle of indirect addressing sequences which use VGPR indexing. Spills to accvgprs could be affected by the indexing mode. Add new pseudo instructions that are expanded after register allocation to avoid the problematic spill or copy placement. Differential Revision: https://reviews.llvm.org/D91048	2020-12-08 12:24:12 -08:00
Stanislav Mekhanoshin	dd89249498	[AMDGPU] Annotate vgpr<->agpr spills in asm Differential Revision: https://reviews.llvm.org/D92125	2020-12-07 11:25:25 -08:00
Petar Avramovic	3a042dcd2e	[AMDGPU] Fix default value of glc for mubuf rtn atomics Mubuf rtn atomics use GLC_1 thus default value for glc operand should be -1, see https://reviews.llvm.org/D90730. This allows us to report error when rtn atomic requires glc=1 but does not have glc operand in input. Differential Revision: https://reviews.llvm.org/D92654	2020-12-07 14:00:08 +01:00
Dmitry Preobrazhensky	a0b3a9391c	[AMDGPU][MC] Improved diagnostics message for sym/expr operands See bug 48295 (https://bugs.llvm.org/show_bug.cgi?id=48295) Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D92088	2020-12-05 14:05:53 +03:00
Dmitry Preobrazhensky	e97dd11977	[AMDGPU][MC] Corrected error position for invalid MOVREL src See bug 47518 (https://bugs.llvm.org/show_bug.cgi?id=47518) Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D92084	2020-12-05 13:23:14 +03:00
Kazu Hirata	2dc4a14e4d	[AMDGPU] Use llvm::is_contained (NFC)	2020-12-04 21:42:55 -08:00
Paul C. Anagnostopoulos	415fab6f67	[TableGen] Eliminate the 'code' type Update the documentation. Rework various backends that relied on the code type. Differential Revision: https://reviews.llvm.org/D92269	2020-12-03 10:19:11 -05:00
Mircea Trofin	bab72dd5d5	[NFC][MC] TargetRegisterInfo::getSubReg is a MCRegister. Typing the API appropriately. Differential Revision: https://reviews.llvm.org/D92341	2020-12-02 15:46:38 -08:00
Jay Foad	d28624a209	[AMDGPU] Stop adding an implicit def of vcc_hi for wave32 This doesn't seem to be needed for anything. Differential Revision: https://reviews.llvm.org/D92400	2020-12-02 10:11:42 +00:00
Caroline Concatto	4b0ef2b075	[NFC][CostModel]Extend class IntrinsicCostAttributes to use ElementCount Type This patch replaces the attribute `unsigned VF` in the class IntrinsicCostAttributes by `ElementCount VF`. This is a non-functional change to help upcoming patches to compute the cost model for scalable vector inside this class. Differential Revision: https://reviews.llvm.org/D91532	2020-12-01 11:12:51 +00:00


5549 Commits