llvm-project

Commit Graph

Author	SHA1	Message	Date
Sebastian Neubauer	221fdedc69	[AMDGPU][GlobalISel] Fold flat vgpr + constant addresses Use getPtrBaseWithConstantOffset in selectFlatOffsetImpl to fold more vgpr+constant addresses. Differential Revision: https://reviews.llvm.org/D93692	2020-12-23 10:40:30 +01:00
Matt Arsenault	663f16684d	AMDGPU: Fix verifier error on killed spill of partially undef register This does unfortunately end up with extra waitcnts getting inserted that were avoided before. Ideally we would avoid the spills of these undef components in the first place.	2020-10-15 09:45:44 -04:00
Sebastian Neubauer	a343b9b032	Revert "[AMDGPU] Insert waitcnt after returning from call" This reverts commit `ca907bfb57`. According to michel.daenzer, > This completely broke the Mesa radeonsi driver on Navi 14. Xorg + > xterm come up with major corruption & psychedelic colours.	2020-09-23 17:16:39 +02:00
Sebastian Neubauer	ca907bfb57	[AMDGPU] Insert waitcnt after returning from call When memory operations are outstanding on function calls, either the caller or the callee can insert a waitcnt to ensure that all reads are finished. Calls need some time to be executed, so if the callee inserts the waitcnt, filling the instruction buffer and waiting for memory will be interleaved, hiding some latency. This comes at the cost of having a waitcnt inside functions that may not be needed as no memory operations are outstanding. For function calls, this is already implemented. The same principal applies to returns: If the caller inserts a waitcnt after the call, the callee does not have to wait and the return and memory operation can be run in parallel. This commit implements waiting in the caller after returning from a function call. Differential Revision: https://reviews.llvm.org/D87674	2020-09-23 12:17:59 +02:00
Jay Foad	c799f873cb	[AMDGPU] Don't cluster stores Clustering loads has caching benefits, but as far as I know there is no advantage to clustering stores on any AMDGPU subtargets. The disadvantage is that it tends to increase register pressure and restricts scheduling freedom. Differential Revision: https://reviews.llvm.org/D85530	2020-09-14 13:40:17 +01:00
QingShan Zhang	3359ea62ed	[Scheduling] Create the missing dependency edges for store cluster If it is load cluster, we don't need to create the dependency edges(SUb->reg) from SUb to SUa as they both depend on the base register "reg" +-------+ +----> reg \| \| +---+---+ \| ^ \| \| \| \| \| \| \| +---+---+ \| \| SUa \| Load 0(reg) \| +---+---+ \| ^ \| \| \| \| \| +---+---+ +----+ SUb \| Load 4(reg) +-------+ But if it is store cluster, we need to create it as follow shows to avoid the instruction store depend on scheduled in-between SUb and SUa. +-------+ +----> reg \| \| +---+---+ \| ^ \| \| Missing +-------+ \| \| +-------------------->+ y \| \| \| \| +---+---+ \| +---+-+-+ ^ \| \| SUa \| Store x 0(reg) \| \| +---+---+ \| \| ^ \| \| \| +------------------------+ \| \| \| \| +---+--++ +----+ SUb \| Store y 4(reg) +-------+ Reviewed By: evandro, arsenm, rampitec, foad, fhahn Differential Revision: https://reviews.llvm.org/D72031	2020-08-07 04:58:03 +00:00
Matt Arsenault	e00201539f	GlobalISel: Implement fewerElementsVector for G_EXTRACT_VECTOR_ELT Use the same basic strategy as LegalizeVectorTypes. Try to index into smaller pieces if there's a constant index, and otherwise fall back to a stack temporary.	2020-08-06 14:33:16 -04:00

7 Commits