In GFX10 VOP3 can have a literal, which opens up the possibility of two
operands using the same literal value, which is allowed and only counts
as one use of the constant bus.
AMDGPUAsmParser::validateConstantBusLimitations already knew about this
but SIInstrInfo::verifyInstruction did not.
Differential Revision: https://reviews.llvm.org/D100770
Don't shrink VOP3 instructions if there are any uses of a carry-out
operand, because the shrunken form of the instruction would write the
carry-out to vcc instead of to a virtual register.
Differential Revision: https://reviews.llvm.org/D100760
This patch removes -fixed-abi check for indirect calls
and also adds queue-ptr which is required for indirect calls to work.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D100633
Remove the MachineDCE pass after the first SIFoldOperands pass now
that SIFoldOperands deletes its own dead instructions.
Reapply after fixing dependent change D100188.
Differential Revision: https://reviews.llvm.org/D100189
Have funcattrs expand all implied attributes into the IR. This expands the infrastructure from D100400, but for definitions not declarations this time.
Somewhat subtly, this mostly isn't semantic. Because the accessors did the inference, any client which used the accessor was already getting the stronger result. Clients that directly checked presence of attributes (there are some), will see a stronger result now.
The old behavior can end up quite confusing for two reasons:
* Without this change, we have situations where function-attrs appears to fail when inferring an attribute (as seen by a human reading IR), but that consuming code will see that it should have been implied. As a human trying to sanity check test results and study IR for optimization possibilities, this is exceeding error prone and confusing. (I'll note that I wasted several hours recently because of this.)
* We can have transforms which trigger without the IR appearing (on inspection) to meet the preconditions. This change doesn't prevent this from happening (as the accessors still involve multiple checks), but it should make it less frequent.
I'd argue in favor of deleting the extra checks out of the accessors after this lands, but I want that in it's own review as a) it's purely stylistic, and b) I already know there's some disagreement.
Once this lands, I'm also going to do a cleanup change which will delete some now redundant duplicate predicates in the inference code, but again, that deserves to be a change of it's own.
Differential Revision: https://reviews.llvm.org/D100226
This patch changed the isLegalUse check to ensure that
LSRInstance::GenerateConstantOffsetsImpl generates an
offset that results in a legal addressing mode and
formula. The check is changed to look similar to the
assert check used for illegal formulas.
Differential Revision: https://reviews.llvm.org/D100383
Change-Id: Iffb9e32d59df96b8f072c00f6c339108159a009a
Returning in memory is not supported, so fall back to sret.
Also, extend i1 and i16 to i32. Otherwise, they would be passed through
memory.
Differential Revision: https://reviews.llvm.org/D100543
Now since LDS uses within non-kernel functions are being handled in the
pass - LowerModuleLDS, we *NO* need to *forcefully* inline non-kernel
functions just because they use LDS. Do forceful inlining only when the
pass - LowerModuleLDS is not enabled. It is enabled by default.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D100481
This reverts commit ab98f2c712 and 98eea392cd.
It includes a fix for the clang test which triggered the revert. I failed to notice this one because there was another AMDGPU llvm test with a similiar name and the exact same text in the error message. Odd. Since only one build bot reported the clang test, I didn't notice that one.
Rename the name of "LDS lowering" pass from `amdgpu-disable-lower-module-lds` to
`amdgpu-enable-lower-module-lds` as later is consistent and reads better.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D100441
Otherwise it reuses the same register for storing the stack slot
offset if the stack slot offset is big.
Differential Revision: https://reviews.llvm.org/D100461
This patch adds attributes corresponding to
implicits to functions/kernels if
1. it has an indirect call OR
2. it's address is taken.
Once such attributes are set, rest of the codegen would work
out-of-box for indirect calls. This patch eliminates
the potential overhead -fixed-abi imposes even though indirect functions
calls are not used.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D99347
Retry of 330619a3a6 that includes a clang test update.
Original commit message:
If we run passes before lowering llvm.expect intrinsics to metadata,
then those passes have no way to act on the hints provided by llvm.expect.
SimplifyCFG is the known offender, and we made it smarter about profile
metadata in D98898 <https://reviews.llvm.org/D98898>.
In the motivating example from https://llvm.org/PR49336 , this means we
were ignoring the recommended method for a programmer to tell the compiler
that a compare+branch is expensive. This change appears to solve that case -
the metadata survives to the backend, the compare order is as expected in IR,
and the backend does not do anything to reverse it.
We make the same change to the old pass manager to keep things synchronized.
Differential Revision: https://reviews.llvm.org/D100213
If we run passes before lowering llvm.expect intrinsics to metadata,
then those passes have no way to act on the hints provided by llvm.expect.
SimplifyCFG is the known offender, and we made it smarter about profile
metadata in D98898.
In the motivating example from https://llvm.org/PR49336 , this means we
were ignoring the recommended method for a programmer to tell the compiler
that a compare+branch is expensive. This change appears to solve that case -
the metadata survives to the backend, the compare order is as expected in IR,
and the backend does not do anything to reverse it.
We make the same change to the old pass manager to keep things synchronized.
Differential Revision: https://reviews.llvm.org/D100213
Spilling the fp or bp to scratch could overwrite VGPRs of inactive
lanes. Fix that by using only the active lanes of the scavenged VGPR.
This builds on the assumptions that
1. a function is never called with exec=0
2. lanes do not die in a function, i.e. exec!=0 in the function epilog
3. no new lanes are active when exiting the function, i.e. exec in the
epilog is a subset of exec in the prolog.
Differential Revision: https://reviews.llvm.org/D96869
Spilling SGPRs to scratch uses a temporary VGPR. LLVM currently cannot
determine if a VGPR is used in other lanes or not, so we need to save
all lanes of the VGPR. We even need to save the VGPR if it is marked as
dead.
The generated code depends on two things:
- Can we scavenge an SGPR to save EXEC?
- And can we scavenge a VGPR?
If we can scavenge an SGPR, we
- save EXEC into the SGPR
- set the needed lane mask
- save the temporary VGPR
- write the spilled SGPR into VGPR lanes
- save the VGPR again to the target stack slot
- restore the VGPR
- restore EXEC
If we were not able to scavenge an SGPR, we do the same operations, but
everytime the temporary VGPR is written to memory, we
- write VGPR to memory
- flip exec (s_not exec, exec)
- write VGPR again (previously inactive lanes)
Surprisingly often, we are able to scavenge an SGPR, even though we are
at the brink of running out of SGPRs.
Scavenging a VGPR does not have a great effect (saves three instructions
if no SGPR was scavenged), but we need to know if the VGPR we use is
live before or not, otherwise the machine verifier complains.
Differential Revision: https://reviews.llvm.org/D96336
Added cost estimation for switch instruction, updated costs of branches, fixed
phi cost.
Had to increase `-amdgpu-unroll-threshold-if` default value since conditional
branch cost (size) was corrected to higher value.
Test renamed to "control-flow.ll".
Removed redundant code in `X86TTIImpl::getCFInstrCost()` and
`PPCTTIImpl::getCFInstrCost()`.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D96805
Remove the MachineDCE pass after the first SIFoldOperands pass now
that SIFoldOperands deletes its own dead instructions.
Differential Revision: https://reviews.llvm.org/D100189
This is cheap to implement, means less work for future passes like
MachineDCE, and slightly improves the folding in some cases.
Differential Revision: https://reviews.llvm.org/D100117
Pretty straightforward use of existing infrastructure and port of the attributor inference rules for nosync.
A couple points of interest:
* I deliberately switched from "monotonic or better" to "unordered or better". This is simply me being conservative and is better in line with the rest of the optimizer. We treat monotonic conservatively pretty much everywhere.
* The operand bundle test change is suspicious. It looks like we might have missed something here, but if so, it's an issue with the existing nofree inference as well. I'm going to take a closer look at that separately.
* I needed to keep the previous inference from readnone. This surprised me, but made sense once I realized readonly inference goes to lengths to reason about local vs non-local memory and that writes to local memory are okay. This is fine for the purpose of nosync, but would e.g. prevent us from inferring nofree from readnone - which is slightly surprising.
Differential Revision: https://reviews.llvm.org/D99769
It used to work correctly even with a KILL, but there is
no reason to consider meta instructions since they do not
create real HW uses.
Differential Revision: https://reviews.llvm.org/D100135
After D99249 we use three different loop pass managers for LICM,
LoopRotate and LICM+LoopUnswitch. This happens because LazyBFI
and LazyBPI are not preserved by LoopRotate (note that D74640
is no longer needed). Avoid this by marking them as preserved.
My understanding of D86156 is that it is okay to simply preserve
them (which LoopUnswitch already does for the same reason) and
rely on callbacks to deal with deleted blocks.
Differential Revision: https://reviews.llvm.org/D99843
Look through copies to find more cases where the two values being
selected are identical. The motivation for this is just to be able to
remove the weird special case where tryFoldCndMask was called from
foldInstOperand, part way through folding a move-immediate into its
users, without regressing any lit tests.
ScratchExecCopy needs to be marked as live, we cannot use that register
while EXEC is stored in there.
Marking SGPRForFPSaveRestoreCopy and SGPRForBPSaveRestoreCopy as
available is unnecessary, they should not be live at that point anway.
Differential Revision: https://reviews.llvm.org/D100098
Test CodeGen/AMDGPU/amdgpu.private-memory.ll and
CodeGen/AMDGPU/private-memory-r600.ll have a block of CHECK directives
whose prefix is inconsistent: R600-CHECK Vs R600. This leads to a
R600-NOT directive using an undefined CHAN variable due to R600-CHECK
directives never being considered by FileCheck. Fixing the prefix leads
to the testcase failing. As per https://reviews.llvm.org/D99865#2675235
this commit removes the directives instead since it is not possible to
write a reliable check.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D99865
This fixes an oversight in D99747 which moved the IMG init code from
SIAddIMGInit to AdjustInstrPostInstrSelection, but did not set the
hasPostISelHook flag on gather4 instructions.
Differential Revision: https://reviews.llvm.org/D99953
This allows these optimisations to apply to e.g. `urem i16` directly
before `urem` is promoted to i32 on architectures where i16 operations
are not intrinsically legal (such as on Aarch64). The legalization then
later can happen more directly and generated code gets a chance to avoid
wasting time on computing results in types wider than necessary, in the end.
Seems like mostly an improvement in terms of results at least as far as x86_64 and aarch64 are concerned, with a few regressions here and there. It also helps in preventing regressions in changes like {D87976}.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D88785