Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.
But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvment of understanding the control flow.
This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.
And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm, ronlieb
Differential Revision: https://reviews.llvm.org/D114652
Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.
But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvment of understanding the control flow.
This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.
And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D114652
This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unneccessary s_mov instructions for arguments the caller would otherwise save and restore in these registers.
Reviewed By: sebastian-ne
Differential Revision: https://reviews.llvm.org/D111637
These intrinsics maps to the 24-bit v_mul_hi instructions.
This change also fixes an incorrect assumption on the associativity of
24-bit mulhi in its SDNode record in tblgen.
Differential Revision: https://reviews.llvm.org/D112394
1. Splitted out some parts of R600 target to separate modules/headers.
2. Reduced some include lists in headers.
3. Found and fixed issue with override `GCNTargetMachine::getSubtargetImpl()`
and `R600TargetMachine::getSubtargetImpl()` had different return value type
than base class.
4. Minor forward declarations cleanup.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D108596
When running the tests on PowerPC and x86, the lit test GlobalISel/trunc.ll fails at the memory sanitize step. This seems to be due to wrong invalid logic (which matches even if it shouldn't) and likely missing variable initialisation."
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D95878
This reverts commits 62af0305b7cc..677a3529d3e6 from D93708.
They cause failures in the sanitizer builds because of uninitialized
values.
A fix is in D95878, but it might take some time until this is pushed,
so reverting the changes for now.
Try to handle arbitrary scalar BFEs by packing the operands. The DAG
gives up on non-constant arguments. We're still missing any constant
folding, so we end up with pretty ugly code most of the time. Also
handle the 64-bit scalar case, which the DAG doesn't try to do.
Manually select this is as a tablegen workraound. Both SelectionDAG
and GlobalISel end up misplacing the copy to m0 when both instructions
in the output need it. Neither considers that both output instructions
depend on m0. I don't know of any other pattern we need to handle this
case, so it's less effort to just workaround this for now.
I'm mildly worried about potentially reordering exp/exp_done with
IntrWriteMem on the intrinsic.
Requires hacking out the illegal type on SI, so manually select that
case during lowering.
This enables GlobalISel to handle various intrinsics. The custom node
pattern will be ignored, and the intrinsic will work. This will also
allow SelectionDAG to directly select the intrinsics, but as they are
all custom lowered to the nodes, this ends up leaving dead code in the
table.
Eventually either GlobalISel should add the equivalent of custom nodes
equivalent, or intrinsics should be directly used. These each have
different tradeoffs.
There are a few more to handle, but these are easy to handle
ones. Some others fail for other reasons.
llvm-svn: 371432
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Summary:
The optimization to early break out of loops if all threads are dead was
never fully implemented.
But the PHI node analyzing is actually causing a number of problems, so
remove all the extra code for it.
(This does actually regress code quality in a few places because it
ends up relying more heavily on phi's of i1, which we don't do a
great job with. However, since it fixes real bugs in the wild, we
should take this change. I have some prototype changes to improve
i1 lowering in general -- not just for control flow -- which should
help recover the code quality, I just need to make those changes
fit for general consumption. -- Nicolai)
Change-Id: I6fc6c6c8961857ac6009fcfb9f7e5e48dc23fbb1
Patch-by: Christian König <christian.koenig@amd.com>
Reviewers: arsenm, rampitec, tpr
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D53359
llvm-svn: 345718
If a source of rcp instruction is a result of any conversion from
an integer convert it into rcp_iflag instruction. No FP exception
can ever happen except division by zero if a single precision rcp
argument is a representation of an integral number.
Differential Revision: https://reviews.llvm.org/D48569
llvm-svn: 335742