This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
All instructions that can raise fp exceptions also read FPCR, with the
only other instructions that interact with it being the MSR/MRS to
write/read FPCR.
Introducing an FPCR register also requires adjusting
invalidateWindowsRegisterPairing in AArch64FrameLowering.cpp to use
the encoded value of registers instead of their enum value, as the
enum value is based on the alphabetical order of register names and
now FPCR is placed between FP and LR.
This change unfortunately means a large number of mir tests need to
be adjusted due to instructions now requiring an implicit fpcr operand
to be present.
Differential Revision: https://reviews.llvm.org/D121929
Whenever a call to __chkstk was made, the frame lowering previously
omitted the aligning (as NumBytes was reset to zero before doing
alignment).
This fixes https://github.com/llvm/llvm-project/issues/56182.
The initial version of this produced invalid code for small
functions with no local stack allocations, if those functions
were marked with the "stackrealign" attribute. If building
with -mstack-alignment=16 (which otherwise mostly would be a
no-op), this attribute is added on the main function.
Differential Revision: https://reviews.llvm.org/D135687
This reverts commit 50e0aced45.
This could accidentally start producing invalid code in some
cases (in particular, if compiling with -mstack-alignment=16, which
one could expect to be a no-op for a target where the stack always
is aligned to 16 bytes anyway).
If the stack is realigned, we've emitted a frame pointer and
already terminated the SEH prologue, making this dead code since
a07787c9a5.
The immediate to this SEH opcode was entirely bogus - we don't
know how many bytes the AND operation adjusts the SP, and by
doing "NumBytes & andMaskEncoded" (where andMaskEncoded was the
immediate bitpattern for the AND instruction), the immediate to the
opcode was total gibberish.
This hasn't had any practical effect, since the original stack
pointer always was restored from the frame pointer afterwards anyway.
Differential Revision: https://reviews.llvm.org/D135815
Without this, unwinding through functions that does use PAC
would fail, if PAC actually was active.
Differential Revision: https://reviews.llvm.org/D135103
After setting up the FP, the rest of the prologue doesn't need to
be replayed for unwinding the stack frame.
This allows reverting the functional parts of
2f7fbf8376 (but fixing inconsistent
duplicate setting of HasWinCFI).
Differential Revision: https://reviews.llvm.org/D135686
When returning from a function with both SCS and PAC-RET enabled, we need to
authenticate the return address from the stack and then load from the SCS,
but this was happening in the reverse order when RETA[AB] were being used.
Fix it by disabling the use of RETA[AB] when SCS is enabled.
Fixes pr58072.
Differential Revision: https://reviews.llvm.org/D134931
Part of initial Arm64EC patchset.
Arm64EC code needs to use functions with a different name, to avoid
using the x64 versions.
Differential Revision: https://reviews.llvm.org/D125417
rGcf97e0ec42b8 makes $x18 to be treated as callee-saved in functions with
Windows calling convention on non-Windows OSes.
Here we mark $x18 as callee-saved for functions with Windows calling
convention on Darwin, as well as on other non-Windows platforms, in
order to prevent some miscompilations (like miscompilation of
win64cc-darwin-backup-x18.ll).
Since getCalleeSavedRegs doesn't return x18 in list of callee-saved
registers, assignCalleeSavedSpillSlots and determineCalleeSaves
consider different sets of registers as callee-saved. It causes an
error:
```
Assertion failed: ((!HasCalleeSavedStackSize || getCalleeSavedStackSize() == Size) && "Invalid size calculated for callee saves"), function getCalleeSavedStackSize, file
AArch64MachineFunctionInfo.h, line 292.
```
Differential Revision: https://reviews.llvm.org/D130676
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.
Also remove cl::init(false) while touching the lines.
Support the "-fzero-call-used-regs" option on AArch64. This involves much less
specialized code than the X86 version. Most of the checks can be done with
TableGen.
Reviewed By: nickdesaulniers, MaskRay
Differential Revision: https://reviews.llvm.org/D124836
Without SVE, after a dynamic stack allocation has modified the SP, it is
presumed that a frame pointer restoration will revert the SP back to
it's correct value prior to any caller stack being restored. However the
SVE frame is restored using the stack pointer directly, as it is located
after the frame pointer. This means that in the presence of a dynamic
stack allocation, any SVE callee state gets corrupted as SP has the
incorrect value when the SVE state is restored.
To address this issue, when variable sized objects and SVE CSRs are
present, treat the stack as having been realigned, hence restoring the
stack pointer from the frame pointerr prior to restoring the SVE state.
Differential Revision: https://reviews.llvm.org/D124615
The frame layout on Windows differs from that on other platforms. It
will spill the registers in descending numeric value (i.e. x30, x29,
...). Furthermore, the x29, x30 pair is particularly important as it
is used for the fast stack walking. As a result, we cannot simply
insert the Swift async frame record in between the store. To provide
the simplistic search mechanism, always spill the async frame record
prior to the spilled registers.
This was caught by the assertion failure in the frame lowering code when
building the runtime for Windows AArch64.
Fixes: #55058
Differential Revision: https://reviews.llvm.org/D124498
Reviewed By: mstorsjo
autiasp, autibsp instructions are the counterpart of paciasp/pacibsp instructions
therefore let's emit .cfi_negate_ra_state for these too.
In case of Armv8.3 instruction set the retaa/retbb will do the return and authentication
in one step here we can't emit the . cfi_negate_ra_state because that would be point after
the ret* instruction.
Reviewed By: nickdesaulniers, MaskRay
Differential Revision: https://reviews.llvm.org/D111780
When untagging the stack, the compiler may emit a sequence like:
```
.LBB0_1:
st2g sp, [sp], #32
sub x8, x8, #32
cbnz x8, .LBB0_1
stg sp, [sp], #16
```
These stack adjustments cannot be described by CFI instructions.
This patch disables merging of SP update with untagging, i.e. makes the
compiler use an additional scratch register (there should be plenty
available at this point as we are in the epilogue) and generate:
```
mov x9, sp
mov x8, #256
stg x9, [x9], #16
.LBB0_1:
sub x8, x8, #32
st2g x9, [x9], #32
cbnz x8, .LBB0_1
add sp, sp, #272
```
Merging is disabled only when we need to generate asynchronous unwind
tables.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D114548
This pass inserts the necessary CFI instructions to compensate for the
inconsistency of the call-frame information caused by linear (non-CGA
aware) nature of the unwind tables.
Unlike the `CFIInstrInserer` pass, this one almost always emits only
`.cfi_remember_state`/`.cfi_restore_state`, which results in smaller
unwind tables and also transparently handles custom unwind info
extensions like CFA offset adjustement and save locations of SVE
registers.
This pass takes advantage of the constraints taht LLVM imposes on the
placement of save/restore points (cf. `ShrinkWrap.cpp`):
* there is a single basic block, containing the function prologue
* possibly multiple epilogue blocks, where each epilogue block is
complete and self-contained, i.e. CSR restore instructions (and the
corresponding CFI instructions are not split across two or more
blocks.
* prologue and epilogue blocks are outside of any loops
Thus, during execution, at the beginning and at the end of each basic
block the function can be in one of two states:
- "has a call frame", if the function has executed the prologue, or
has not executed any epilogue
- "does not have a call frame", if the function has not executed the
prologue, or has executed an epilogue
These properties can be computed for each basic block by a single RPO
traversal.
From the point of view of the unwind tables, the "has/does not have
call frame" state at beginning of each block is determined by the
state at the end of the previous block, in layout order.
Where these states differ, we insert compensating CFI instructions,
which come in two flavours:
- CFI instructions, which reset the unwind table state to the
initial one. This is done by a target specific hook and is
expected to be trivial to implement, for example it could be:
```
.cfi_def_cfa <sp>, 0
.cfi_same_value <rN>
.cfi_same_value <rN-1>
...
```
where `<rN>` are the callee-saved registers.
- CFI instructions, which reset the unwind table state to the one
created by the function prologue. These are the sequence:
```
.cfi_restore_state
.cfi_remember_state
```
In this case we also insert a `.cfi_remember_state` after the
last CFI instruction in the function prologue.
Reviewed By: MaskRay, danielkiss, chill
Differential Revision: https://reviews.llvm.org/D114545
This pass inserts the necessary CFI instructions to compensate for the
inconsistency of the call-frame information caused by linear (non-CFG
aware) nature of the unwind tables.
Unlike the `CFIInstrInserer` pass, this one almost always emits only
`.cfi_remember_state`/`.cfi_restore_state`, which results in smaller
unwind tables and also transparently handles custom unwind info
extensions like CFA offset adjustement and save locations of SVE
registers.
This pass takes advantage of the constraints that LLVM imposes on the
placement of save/restore points (cf. `ShrinkWrap.cpp`):
* there is a single basic block, containing the function prologue
* possibly multiple epilogue blocks, where each epilogue block is
complete and self-contained, i.e. CSR restore instructions (and the
corresponding CFI instructions are not split across two or more
blocks.
* prologue and epilogue blocks are outside of any loops
Thus, during execution, at the beginning and at the end of each basic
block the function can be in one of two states:
- "has a call frame", if the function has executed the prologue, or
has not executed any epilogue
- "does not have a call frame", if the function has not executed the
prologue, or has executed an epilogue
These properties can be computed for each basic block by a single RPO
traversal.
In order to accommodate backends which do not generate unwind info in
epilogues we compute an additional property "strong no call frame on
entry" which is set for the entry point of the function and for every
block reachable from the entry along a path that does not execute the
prologue. If this property holds, it takes precedence over the "has a
call frame" property.
From the point of view of the unwind tables, the "has/does not have
call frame" state at beginning of each block is determined by the
state at the end of the previous block, in layout order.
Where these states differ, we insert compensating CFI instructions,
which come in two flavours:
- CFI instructions, which reset the unwind table state to the
initial one. This is done by a target specific hook and is
expected to be trivial to implement, for example it could be:
```
.cfi_def_cfa <sp>, 0
.cfi_same_value <rN>
.cfi_same_value <rN-1>
...
```
where `<rN>` are the callee-saved registers.
- CFI instructions, which reset the unwind table state to the one
created by the function prologue. These are the sequence:
```
.cfi_restore_state
.cfi_remember_state
```
In this case we also insert a `.cfi_remember_state` after the
last CFI instruction in the function prologue.
Reviewed By: MaskRay, danielkiss, chill
Differential Revision: https://reviews.llvm.org/D114545
Re-commit of 32e8b550e5
This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.
The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
mov x29, sp
.cfi_def_cfa w29, 16
...
```
after the `stp` is executed the (initial) rule for the CFA still says
the CFA is in the `sp`, even though it's already offset by 16 bytes
A correct unwind info could look like:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov x29, sp
.cfi_def_cfa w29, 16
...
```
Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.
Reviewed By: danielkiss, MaskRay
Differential Revision: https://reviews.llvm.org/D111411
The prologue and epilogue emission were unbalanced in light of different
strategies of async frame context emission. Adjust the epilogue emission
to match the prologue emission. This makes the elision work properly as
well as the deployment based. Due to the fact that the epilogue always
was clearing a bit (which should not be set in the first place), the
client would not notice the behavioural issue unless the deployment
version was in effect.
It caused builds to assert with:
(StackSize == 0 && "We already have the CFA offset!"),
function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624.
when targeting iOS. See comment on the code review for reproducer.
> This patch rearranges emission of CFI instructions, so the resulting
> DWARF and `.eh_frame` information is precise at every instruction.
>
> The current state is that the unwind info is emitted only after the
> function prologue. This is fine for synchronous (e.g. C++) exceptions,
> but the information is generally incorrect when the program counter is
> at an instruction in the prologue or the epilogue, for example:
>
> ```
> stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
> mov x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> after the `stp` is executed the (initial) rule for the CFA still says
> the CFA is in the `sp`, even though it's already offset by 16 bytes
>
> A correct unwind info could look like:
> ```
> stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
> .cfi_def_cfa_offset 16
> mov x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> Having this information precise up to an instruction is useful for
> sampling profilers that would like to get a stack backtrace. The end
> goal (towards this patch is just a step) is to have fully working
> `-fasynchronous-unwind-tables`.
>
> Reviewed By: danielkiss, MaskRay
>
> Differential Revision: https://reviews.llvm.org/D111411
This reverts commit 32e8b550e5.
When the stack has SVE objects, fixed-width objects are often better accessed
from the SP, instead of the FP, because part/all of the fixed-width offset
can be folded into the (non-scalable) addressing mode, where otherwise an
ADDVL would be required.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D120738
This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.
The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
mov x29, sp
.cfi_def_cfa w29, 16
...
```
after the `stp` is executed the (initial) rule for the CFA still says
the CFA is in the `sp`, even though it's already offset by 16 bytes
A correct unwind info could look like:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov x29, sp
.cfi_def_cfa w29, 16
...
```
Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.
Reviewed By: danielkiss, MaskRay
Differential Revision: https://reviews.llvm.org/D111411
This patch is in preparation for the async unwind CFI.
Move the emission of the shadow call stack prologue/epilogue
instructions to the `emitPrologue`/`emitEpilogue`. This greatly
simplifies especially epilogue generation and makes unnecessary some
quite fragile code, that tries to skip over those
Reviewed By: MaskRay, efriedma
Differential Revision: https://reviews.llvm.org/D112329
This patch is in preparation for the async unwind CFI.
Put the first `LDP` the end, so that the load-store optimizer can run
and merge the `LDP` and the `ADD` into a post-index `LDP`.
Do this always and as early as at the time of the initial creation of
the CSR restore instructions, even if that `LDP` is not guaranteed to
be mergeable with a subsequent `SP` increment.
This greatly simplifies the CFI generation for prologue, as otherwise
we have to take extra steps to ensure reordering does not cross CFI
instructions.
Reviewed By: danielkiss
Differential Revision: https://reviews.llvm.org/D112328
Fix a couple of things that were causing stack protection to not work
correctly in functions that have scalable vectors on the stack:
* Use TypeSize when determining if accesses to a variable are
considered out-of-bounds so that the behaviour is correct for
scalable vectors.
* When stack protection is enabled move the stack protector location
to the top of the SVE locals, so that any overflow in them (or the
other locals which are below that) will be detected.
Fixes: https://github.com/llvm/llvm-project/issues/51137
Differential Revision: https://reviews.llvm.org/D111631
Do more efforts to use sp if it is possible to lower a frame index.
Reviewers: reames, loicottet, ostannard, t.p.northover
Reviewed By: reames
Subscribers: arphaman, danilaml, hiraditya, kristof.beyls, llvm-commits, Matt, yrouban
Differential Revision: https://reviews.llvm.org/D111133
autiasp, autibsp instructions are the counterpart of paciasp/pacibsp instructions
therefore let's emit .cfi_negate_ra_state for these too.
In case of Armv8.3 instruction set the retaa/retbb will do the return and authentication
in one step here we can't emit the . cfi_negate_ra_state because that would be point after
the ret* instruction.
Reviewed By: nickdesaulniers, MaskRay
Differential Revision: https://reviews.llvm.org/D111780
PR45875 notes an instance where exception handling crashes on aarch64-fuchsia
where SCS is enabled by default. The underlying issue seems to be that within libunwind,
various _Unwind_* functions, the x18 register is not updated if a function is marked
with nounwind. This removes the check for nounwind and emits the CFI instruction that updates x18.
Differential Revision: https://reviews.llvm.org/D79822
Introduce a new command-line flag `-swift-async-fp={auto|always|never}`
that controls how code generation sets the Swift extended async frame
info bit. There are three possibilities:
* `auto`: which determines how to set the bit based on deployment target, either
statically or dynamically via `swift_async_extendedFramePointerFlags`.
* `always`: the default, always set the bit statically, regardless of deployment
target.
* `never`: never set the bit, regardless of deployment target.
Patch by Doug Gregor <dgregor@apple.com>
Reviewed By: doug.gregor
Differential Revision: https://reviews.llvm.org/D109392
When back-deploying Swift async code we can't always toggle the flag showing an
extended frame is present because it will confuse unwinders on systems released
before this feature. So in cases where the code might run there, we `or` in a
mask provided by the runtime (as an absolute symbol) telling us whether the
unwinders can cope.
When deploying only for newer OSs, we can still hard-code the bit-set for
greater efficiency.