268 Commits

Author SHA1 Message Date
Florian Mayer 0593ce5f0b [MC] Add 'G' to augmentation string for MTE instrumented functions
This was agreed on in
https://lists.llvm.org/pipermail/llvm-dev/2020-May/141345.html

The thread proposed two options
* add a character to the augmentation string and handle it in libunwind
* use a separate personality function.

It was determined that this is the simpler and better option.

This is part of ARM's AArch64 ABI:
https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst#id22

The next step after this is teaching libunwind to untag when this
augmentation character is set.
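
As a rough illustration of the unwinder side, the sketch below checks a CIE augmentation string for the new character. The function name and the dictionary keys are hypothetical; only the `'G'` character comes from this change, and treating a leading `'z'` as "augmentation data present" follows the usual `.eh_frame` convention rather than anything stated here.

```python
def parse_augmentation(aug: str) -> dict:
    """Derive a few flags from a CIE augmentation string (illustrative only)."""
    return {
        # By convention a leading 'z' means augmentation data follows.
        "has_augmentation_data": aug.startswith("z"),
        # 'G' marks frames of MTE-instrumented functions, per this change.
        "mte_tagged_frame": "G" in aug,
    }


flags = parse_augmentation("zRG")
print(flags)
```

An unwinder could consult a flag like `mte_tagged_frame` to decide whether to untag the frame before moving on to the caller.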

Reviewed By: MaskRay, eugenis

Differential Revision: https://reviews.llvm.org/D127007
2022-06-08 12:36:32 -07:00
Fangrui Song 557efc9a8b [llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.

Also remove cl::init(false) while touching the lines.
2022-06-03 21:59:05 -07:00
Bill Wendling d497129f9b [AArch64] Use proper instruction mnemonics for FPRs
The FPR128 regs need MOVIv2d_ns and SVE regs need DUP_ZI_D.

Differential Revision: https://reviews.llvm.org/D126083
2022-05-20 12:02:26 -07:00
Bill Wendling 6e00a34cdb [AArch64] Add support for -fzero-call-used-regs
Support the "-fzero-call-used-regs" option on AArch64. This involves much less
specialized code than the X86 version. Most of the checks can be done with
TableGen.

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D124836
2022-05-19 16:58:28 -07:00
Bradley Smith 8f623f4ab0 [AArch64][SVE] Restore SP from FP when SVE CSRs and variable sized objects are present
Without SVE, after a dynamic stack allocation has modified the SP, it is
presumed that a frame pointer restoration will revert the SP back to
its correct value prior to any caller stack being restored. However the
SVE frame is restored using the stack pointer directly, as it is located
after the frame pointer. This means that in the presence of a dynamic
stack allocation, any SVE callee state gets corrupted as SP has the
incorrect value when the SVE state is restored.

To address this issue, when variable sized objects and SVE CSRs are
present, treat the stack as having been realigned, hence restoring the
stack pointer from the frame pointer prior to restoring the SVE state.

Differential Revision: https://reviews.llvm.org/D124615
2022-05-04 12:57:03 +00:00
Saleem Abdulrasool 24ba1302b3 AArch64: modify Swift async frame record storage on Windows
The frame layout on Windows differs from that on other platforms. It
will spill the registers in descending numeric value (i.e. x30, x29,
...). Furthermore, the x29, x30 pair is particularly important as it
is used for the fast stack walking. As a result, we cannot simply
insert the Swift async frame record in between the store. To provide
the simplistic search mechanism, always spill the async frame record
prior to the spilled registers.

This was caught by the assertion failure in the frame lowering code when
building the runtime for Windows AArch64.

Fixes: #55058

Differential Revision: https://reviews.llvm.org/D124498
Reviewed By: mstorsjo
2022-04-30 09:01:33 -07:00
Daniel Kiss de07cde67b [AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions.
The autiasp and autibsp instructions are the counterparts of paciasp and pacibsp,
so emit .cfi_negate_ra_state for them too.
With the Armv8.3 instruction set, retaa/retab perform the authentication and the return
in one step. In that case we cannot emit .cfi_negate_ra_state, because it would apply to a
point after the ret* instruction.
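
The effect of pairing each sign and authenticate instruction with `.cfi_negate_ra_state` can be sketched as a simple toggle. This is a minimal model, assuming the unwinder starts with an unsigned return address and flips its view on every directive; the function name and the event list are hypothetical.

```python
def ra_signed_states(events):
    """Walk events in layout order and report, after each one, whether an
    unwinder would treat the return address as signed."""
    signed = False  # assumption: the return address is unsigned on entry
    states = []
    for event in events:
        if event == ".cfi_negate_ra_state":
            signed = not signed  # each directive flips the unwinder's view
        states.append((event, signed))
    return states


trace = ra_signed_states([
    "paciasp", ".cfi_negate_ra_state",
    "body",
    "autiasp", ".cfi_negate_ra_state",
    "ret",
])
for event, signed in trace:
    print(f"{event:24} signed={signed}")
```

Without the second directive the model would still report a signed return address at `ret`, which is the mismatch this change addresses.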

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D111780
2022-04-22 13:25:57 +02:00
Momchil Velikov 24c84bd236 [AArch64] Async unwind - Fix MTE codegen emitting frame adjustments in a loop
When untagging the stack, the compiler may emit a sequence like:
```
        .LBB0_1:
          st2g sp, [sp], #32
          sub x8, x8, #32
          cbnz x8, .LBB0_1
          stg sp, [sp], #16
```
These stack adjustments cannot be described by CFI instructions.

This patch disables merging of SP update with untagging, i.e. makes the
compiler use an additional scratch register (there should be plenty
available at this point as we are in the epilogue) and generate:
```
            mov     x9, sp
            mov     x8, #256
            stg     x9, [x9], #16
    .LBB0_1:
            sub     x8, x8, #32
            st2g    x9, [x9], #32
            cbnz    x8, .LBB0_1
            add     sp, sp, #272
```
Merging is disabled only when we need to generate asynchronous unwind
tables.
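
As a sanity check on the second sequence, the sketch below replays the pointer arithmetic in Python. It models only the post-index updates of the scratch register, not the tag stores themselves; the function name is hypothetical.

```python
def untagged_bytes(loop_bytes: int, stg_step: int = 16, st2g_step: int = 32) -> int:
    """Return how far the scratch pointer has advanced past the original sp
    once the untagging sequence finishes."""
    if loop_bytes <= 0 or loop_bytes % st2g_step:
        raise ValueError("loop_bytes must be a positive multiple of st2g_step")
    x9 = 0           # mov x9, sp       (tracked relative to sp)
    x8 = loop_bytes  # mov x8, #256
    x9 += stg_step   # stg x9, [x9], #16   (post-index)
    while True:
        x8 -= st2g_step  # sub x8, x8, #32
        x9 += st2g_step  # st2g x9, [x9], #32 (post-index)
        if x8 == 0:      # cbnz x8, .LBB0_1 falls through
            break
    return x9


total = untagged_bytes(256)
print(total)
```

The result matches the single `add sp, sp, #272` at the end: `sp` changes once, outside the loop, so one CFI instruction can describe it.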

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D114548
2022-04-15 14:00:23 +01:00
Momchil Velikov d0ea42a7c1 [AArch64] Async unwind - function epilogues
Reviewed By: MaskRay, chill

Differential Revision: https://reviews.llvm.org/D112330
2022-04-12 16:50:50 +01:00
Momchil Velikov b4ad28da19 [CodeGen] Async unwind - add a pass to fix CFI information
This pass inserts the necessary CFI instructions to compensate for the
inconsistency of the call-frame information caused by linear (non-CFG
aware) nature of the unwind tables.

Unlike the `CFIInstrInserter` pass, this one almost always emits only
`.cfi_remember_state`/`.cfi_restore_state`, which results in smaller
unwind tables and also transparently handles custom unwind info
extensions like CFA offset adjustment and save locations of SVE
registers.

This pass takes advantage of the constraints that LLVM imposes on the
placement of save/restore points (cf. `ShrinkWrap.cpp`):

  * there is a single basic block, containing the function prologue

  * possibly multiple epilogue blocks, where each epilogue block is
    complete and self-contained, i.e. CSR restore instructions (and the
    corresponding CFI instructions) are not split across two or more
    blocks.

  * prologue and epilogue blocks are outside of any loops

Thus, during execution, at the beginning and at the end of each basic
block the function can be in one of two states:

  - "has a call frame", if the function has executed the prologue, and
     has not executed any epilogue

  - "does not have a call frame", if the function has not executed the
    prologue, or has executed an epilogue

These properties can be computed for each basic block by a single RPO
traversal.

From the point of view of the unwind tables, the "has/does not have
call frame" state at beginning of each block is determined by the
state at the end of the previous block, in layout order.

Where these states differ, we insert compensating CFI instructions,
which come in two flavours:

- CFI instructions, which reset the unwind table state to the
    initial one.  This is done by a target specific hook and is
    expected to be trivial to implement, for example it could be:
```
     .cfi_def_cfa <sp>, 0
     .cfi_same_value <rN>
     .cfi_same_value <rN-1>
     ...
```
where `<rN>` are the callee-saved registers.

- CFI instructions, which reset the unwind table state to the one
    created by the function prologue. These are the sequence:
```
       .cfi_restore_state
       .cfi_remember_state
```
In this case we also insert a `.cfi_remember_state` after the
last CFI instruction in the function prologue.
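
The compensation step can be sketched as follows. This is a minimal model, not the pass itself: it assumes the per-block "has a call frame" states were already computed by the RPO traversal, and the `Block` type, the block names, and the placeholder for the target hook are all hypothetical.

```python
from typing import NamedTuple


class Block(NamedTuple):
    name: str
    frame_on_entry: bool  # "has a call frame" when the block starts executing
    frame_on_exit: bool   # "has a call frame" when the block finishes


RESET_TO_INITIAL = ["<target hook: reset to initial state>"]
RESET_TO_PROLOGUE = [".cfi_restore_state", ".cfi_remember_state"]


def compensating_cfi(layout):
    """Map block name -> CFI directives to insert at the start of that block."""
    fixes = {}
    for prev, cur in zip(layout, layout[1:]):
        # The unwind table is linear: a block inherits the table state left
        # by the block laid out immediately before it.
        if prev.frame_on_exit == cur.frame_on_entry:
            continue
        fixes[cur.name] = RESET_TO_PROLOGUE if cur.frame_on_entry else RESET_TO_INITIAL
    return fixes


layout = [
    Block("entry", False, False),
    Block("prologue", False, True),
    Block("body", True, True),
    Block("fast_ret", False, False),  # reached without running the prologue
    Block("epilogue", True, False),
]
fixes = compensating_cfi(layout)
print(fixes)
```

In this layout `fast_ret` follows a block that ends with a frame, so it gets the reset to the initial state, while `epilogue` follows a frameless block and gets the restore/remember pair.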

Reviewed By: MaskRay, danielkiss, chill

Differential Revision: https://reviews.llvm.org/D114545
2022-04-11 13:27:26 +01:00
Muhammad Omair Javaid 0320115c16 Revert "[CodeGen] Async unwind - add a pass to fix CFI information"
This reverts commit 980c3e6dd2.

This commit had failing tests with clang crashing across various
AArch64/Linux buildbots.

https://lab.llvm.org/buildbot/#/builders/179/builds/3346

Differential Revision: https://reviews.llvm.org/D114545
2022-04-05 13:12:30 +05:00
Momchil Velikov 980c3e6dd2 [CodeGen] Async unwind - add a pass to fix CFI information
This pass inserts the necessary CFI instructions to compensate for the
inconsistency of the call-frame information caused by linear (non-CFG
aware) nature of the unwind tables.

Unlike the `CFIInstrInserter` pass, this one almost always emits only
`.cfi_remember_state`/`.cfi_restore_state`, which results in smaller
unwind tables and also transparently handles custom unwind info
extensions like CFA offset adjustment and save locations of SVE
registers.

This pass takes advantage of the constraints that LLVM imposes on the
placement of save/restore points (cf. `ShrinkWrap.cpp`):

  * there is a single basic block, containing the function prologue

  * possibly multiple epilogue blocks, where each epilogue block is
    complete and self-contained, i.e. CSR restore instructions (and the
    corresponding CFI instructions) are not split across two or more
    blocks.

  * prologue and epilogue blocks are outside of any loops

Thus, during execution, at the beginning and at the end of each basic
block the function can be in one of two states:

  - "has a call frame", if the function has executed the prologue, and
     has not executed any epilogue

  - "does not have a call frame", if the function has not executed the
    prologue, or has executed an epilogue

These properties can be computed for each basic block by a single RPO
traversal.

In order to accommodate backends which do not generate unwind info in
epilogues we compute an additional property "strong no call frame on
entry" which is set for the entry point of the function and for every
block reachable from the entry along a path that does not execute the
prologue. If this property holds, it takes precedence over the "has a
call frame" property.

From the point of view of the unwind tables, the "has/does not have
call frame" state at beginning of each block is determined by the
state at the end of the previous block, in layout order.

Where these states differ, we insert compensating CFI instructions,
which come in two flavours:

- CFI instructions, which reset the unwind table state to the
    initial one.  This is done by a target specific hook and is
    expected to be trivial to implement, for example it could be:
```
     .cfi_def_cfa <sp>, 0
     .cfi_same_value <rN>
     .cfi_same_value <rN-1>
     ...
```
where `<rN>` are the callee-saved registers.

- CFI instructions, which reset the unwind table state to the one
    created by the function prologue. These are the sequence:
```
       .cfi_restore_state
       .cfi_remember_state
```
In this case we also insert a `.cfi_remember_state` after the
last CFI instruction in the function prologue.

Reviewed By: MaskRay, danielkiss, chill

Differential Revision: https://reviews.llvm.org/D114545
2022-04-04 14:38:22 +01:00
Momchil Velikov 50a97aacac [AArch64] Async unwind - function prologues
Re-commit of 32e8b550e5

This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.

The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:

```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

After the `stp` is executed, the (initial) rule for the CFA still says
the CFA is in `sp`, even though `sp` is already offset by 16 bytes.

A correct unwind info could look like:
```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards which this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.
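
The difference between the two listings can be checked with a tiny CFI interpreter. This is a sketch under stated assumptions: `sp` is tracked relative to its value on entry, so the true CFA is always 0; the event names (`stp_pre`, `mov_fp_sp`, `cfi_def_cfa_offset`, `cfi_def_cfa`) are hypothetical stand-ins for the instructions and directives shown in the listings, and `x29` stands for the register written as `w29` in the directive.

```python
def cfa_at_each_pc(program):
    """For each instruction, return the CFA an unwinder would compute when
    the PC sits right after that instruction."""
    regs = {"sp": 0, "x29": 0}  # sp relative to its value on entry
    rule = ("sp", 0)            # initial rule: CFA = sp + 0
    seen = []
    last_insn = None
    for kind, *args in program + [("end",)]:
        if not kind.startswith("cfi_") and last_insn is not None:
            # All directives following last_insn have been applied by now.
            reg, off = rule
            seen.append((last_insn, regs[reg] + off))
        if kind == "cfi_def_cfa_offset":
            rule = (rule[0], args[0])
        elif kind == "cfi_def_cfa":
            rule = (args[0], args[1])
        elif kind == "stp_pre":     # stp x29, x30, [sp, #imm]!
            regs["sp"] += args[0]
            last_insn = kind
        elif kind == "mov_fp_sp":   # mov x29, sp
            regs["x29"] = regs["sp"]
            last_insn = kind
    return seen


imprecise = cfa_at_each_pc([
    ("stp_pre", -16), ("mov_fp_sp",), ("cfi_def_cfa", "x29", 16),
])
precise = cfa_at_each_pc([
    ("stp_pre", -16), ("cfi_def_cfa_offset", 16),
    ("mov_fp_sp",), ("cfi_def_cfa", "x29", 16),
])
print(imprecise)
print(precise)
```

In the imprecise listing the computed CFA is off by 16 bytes in the window right after the `stp`; with the extra `.cfi_def_cfa_offset 16` it is correct at both points.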

Reviewed By: danielkiss, MaskRay

Differential Revision: https://reviews.llvm.org/D111411
2022-03-24 16:16:44 +00:00
Saleem Abdulrasool c31f0a0050 AArch64: correct epilogue/prologue emission for swift async
The prologue and epilogue emission were unbalanced in light of different
strategies of async frame context emission.  Adjust the epilogue emission
to match the prologue emission.  This makes the elision work properly as
well as the deployment based.  Due to the fact that the epilogue always
was clearing a bit (which should not be set in the first place), the
client would not notice the behavioural issue unless the deployment
version was in effect.
2022-03-09 18:41:10 +00:00
Hans Wennborg 85c53c7092 Revert "[AArch64] Async unwind - function prologues"
It caused builds to assert with:

  (StackSize == 0 && "We already have the CFA offset!"),
  function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624.

when targeting iOS. See comment on the code review for reproducer.

> This patch rearranges emission of CFI instructions, so the resulting
> DWARF and `.eh_frame` information is precise at every instruction.
>
> The current state is that the unwind info is emitted only after the
> function prologue. This is fine for synchronous (e.g. C++) exceptions,
> but the information is generally incorrect when the program counter is
> at an instruction in the prologue or the epilogue, for example:
>
> ```
> stp     x29, x30, [sp, #-16]!           // 16-byte Folded Spill
> mov     x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> after the `stp` is executed the (initial) rule for the CFA still says
> the CFA is in the `sp`, even though it's already offset by 16 bytes
>
> A correct unwind info could look like:
> ```
> stp     x29, x30, [sp, #-16]!           // 16-byte Folded Spill
> .cfi_def_cfa_offset 16
> mov     x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> Having this information precise up to an instruction is useful for
> sampling profilers that would like to get a stack backtrace. The end
> goal (towards which this patch is just a step) is to have fully working
> `-fasynchronous-unwind-tables`.
>
> Reviewed By: danielkiss, MaskRay
>
> Differential Revision: https://reviews.llvm.org/D111411

This reverts commit 32e8b550e5.
2022-03-04 17:36:26 +01:00
Sander de Smalen 7c65d2288b [AArch64] Improve access to fixed-width object when stack has SVE.
When the stack has SVE objects, fixed-width objects are often better accessed
from the SP, instead of the FP, because part/all of the fixed-width offset
can be folded into the (non-scalable) addressing mode, where otherwise an
ADDVL would be required.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D120738
2022-03-04 09:33:59 +00:00
Momchil Velikov 63c9aca12a Revert "[AArch64] Async unwind - function epilogues"
This reverts commit 74319d6794.

It causes test failures that look like an infinite loop in asan/hwasan
unwinding.
2022-03-02 15:01:57 +00:00
Momchil Velikov 74319d6794 [AArch64] Async unwind - function epilogues
Counterpart of https://reviews.llvm.org/D111411 this change makes the
unwind information instruction precise in function epilogues.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112330
2022-03-02 13:15:11 +00:00
Momchil Velikov 32e8b550e5 [AArch64] Async unwind - function prologues
This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.

The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:

```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

After the `stp` is executed, the (initial) rule for the CFA still says
the CFA is in `sp`, even though `sp` is already offset by 16 bytes.

A correct unwind info could look like:
```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards which this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.

Reviewed By: danielkiss, MaskRay

Differential Revision: https://reviews.llvm.org/D111411
2022-02-28 13:37:57 +00:00
Momchil Velikov 20a093e2bc [AArch64] Async unwind - Refactor generation of shadow call stack prologue/epilogue
This patch is in preparation for the async unwind CFI.

Move the emission of the shadow call stack prologue/epilogue
instructions to `emitPrologue`/`emitEpilogue`. This greatly
simplifies epilogue generation in particular and makes some quite
fragile code, which tries to skip over those instructions, unnecessary.

Reviewed By: MaskRay, efriedma

Differential Revision: https://reviews.llvm.org/D112329
2022-02-25 11:09:23 +00:00
Momchil Velikov 17e85cd410 [AArch64] Async unwind - Always place the first LDP at the end when ReverseCSRRestoreSeq is true
This patch is in preparation for the async unwind CFI.

Put the first `LDP` at the end, so that the load-store optimizer can run
and merge the `LDP` and the `ADD` into a post-index `LDP`.

Do this always and as early as at the time of the initial creation of
the CSR restore instructions, even if that `LDP` is not guaranteed to
be mergeable with a subsequent `SP` increment.

This greatly simplifies the CFI generation for prologue, as otherwise
we have to take extra steps to ensure reordering does not cross CFI
instructions.

Reviewed By: danielkiss

Differential Revision: https://reviews.llvm.org/D112328
2022-02-24 18:48:07 +00:00
Momchil Velikov 25e92920c9 [AArch64] Async unwind - helper functions to decide on CFI emission
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D112327
2022-02-24 18:16:50 +00:00
Jim Lin d6b0734837 [NFC] Use Register instead of unsigned 2022-01-19 20:17:04 +08:00
Tim Northover 581e855623 AArch64: don't claim to preserve registers used by prologue code 2022-01-10 12:27:04 +00:00
Daniel Kiss 131c06e6da Revert "[AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions."
This reverts commit f903c85055.
2022-01-06 19:17:45 +01:00
John Brawn dc9f65be45 [AArch64][SVE] Fix handling of stack protection with SVE
Fix a couple of things that were causing stack protection to not work
correctly in functions that have scalable vectors on the stack:
 * Use TypeSize when determining if accesses to a variable are
   considered out-of-bounds so that the behaviour is correct for
   scalable vectors.
 * When stack protection is enabled move the stack protector location
   to the top of the SVE locals, so that any overflow in them (or the
   other locals which are below that) will be detected.

Fixes: https://github.com/llvm/llvm-project/issues/51137

Differential Revision: https://reviews.llvm.org/D111631
2021-12-14 11:30:48 +00:00
Serguei Katkov 3557f49353 [AARCH64] Teach AArch64FrameLowering::getFrameIndexReferencePreferSP really prefer SP.
Try harder to use SP when lowering a frame index, where that is possible.

Reviewers: reames, loicottet, ostannard, t.p.northover
Reviewed By: reames
Subscribers: arphaman, danilaml, hiraditya, kristof.beyls, llvm-commits, Matt, yrouban
Differential Revision: https://reviews.llvm.org/D111133
2021-11-19 11:14:02 +07:00
Kazu Hirata 14d656b3d8 [Target] Use llvm::reverse (NFC) 2021-11-06 13:08:21 -07:00
Daniel Kiss f903c85055 [AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions.
The autiasp and autibsp instructions are the counterparts of paciasp and pacibsp,
so emit .cfi_negate_ra_state for them too.
With the Armv8.3 instruction set, retaa/retab perform the authentication and the return
in one step. In that case we cannot emit .cfi_negate_ra_state, because it would apply to a
point after the ret* instruction.

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D111780
2021-10-20 11:03:52 +02:00
Leonard Chan 4dc462b589 [AArch64] Emit CFI instruction for updating x18 when using ShadowCallStack with exception unwinding
PR45875 notes an instance where exception handling crashes on aarch64-fuchsia,
where SCS is enabled by default. The underlying issue seems to be that, within the
various _Unwind_* functions in libunwind, the x18 register is not updated if a function
is marked with nounwind. This removes the check for nounwind and emits the CFI
instruction that updates x18.

Differential Revision: https://reviews.llvm.org/D79822
2021-10-08 14:20:26 -07:00
Doug Gregor a773db7d76 Add a command-line flag to control the Swift extended async frame info.
Introduce a new command-line flag `-swift-async-fp={auto|always|never}`
that controls how code generation sets the Swift extended async frame
info bit. There are three possibilities:

* `auto`: determine how to set the bit based on deployment target, either
statically or dynamically via `swift_async_extendedFramePointerFlags`.
* `always`: the default, always set the bit statically, regardless of deployment
target.
* `never`: never set the bit, regardless of deployment target.
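
The three modes could be modeled roughly as below. This is only a sketch of the decision described above: the function name, the return strings, and the `target_has_extended_frames` parameter (standing in for whatever deployment-target check the backend performs) are hypothetical.

```python
def extended_frame_bit_strategy(mode: str, target_has_extended_frames: bool) -> str:
    """Pick how the extended-frame bit is set for a given -swift-async-fp mode."""
    if mode == "always":
        return "static"   # set the bit unconditionally
    if mode == "never":
        return "none"     # never set the bit
    if mode == "auto":
        # Assumption: newer targets can hard-code the bit, older ones defer
        # to the runtime-provided swift_async_extendedFramePointerFlags.
        return "static" if target_has_extended_frames else "dynamic"
    raise ValueError(f"unknown -swift-async-fp mode: {mode!r}")


for mode in ("auto", "always", "never"):
    print(mode, extended_frame_bit_strategy(mode, target_has_extended_frames=False))
```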

Patch by Doug Gregor <dgregor@apple.com>

Reviewed By: doug.gregor

Differential Revision: https://reviews.llvm.org/D109392
2021-09-16 06:57:45 -07:00
Tim Northover 5d070c8259 SwiftAsync: use runtime-provided flag for extended frame if back-deploying
When back-deploying Swift async code we can't always toggle the flag showing an
extended frame is present because it will confuse unwinders on systems released
before this feature. So in cases where the code might run there, we `or` in a
mask provided by the runtime (as an absolute symbol) telling us whether the
unwinders can cope.

When deploying only for newer OSs, we can still hard-code the bit-set for
greater efficiency.
2021-09-13 13:54:46 +01:00
Fangrui Song 0e03450ae4 [AArch64] Remove an unneeded !NeedsWinCFI check. NFC 2021-09-05 21:02:56 -07:00
Kyungwoo Lee 6530ea4095 [AArch64] Fix Local Deallocation for Homogeneous Prolog/Epilog
The stack adjustment for local deallocation was incorrectly ported.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D106760
2021-07-25 10:51:11 -07:00
Pablo Barrio 571c8c5263 [AArch64][v8.3A] Avoid inserting implicit landing pads (PACI*SP)
PACI*SP have the advantage that they are in HINT space, meaning
they can be run successfully in hardware without PAuth support -
they will just behave as a NOP. However, PACI*SP are also implicit
landing pads (think of an extra BTI jc). Therefore, they allow
indirect jumps of all kinds into them, potentially inserting new
gadgets. This patch replaces PACI*SP by PACI* LR, SP when
compiling explicitly for hardware with full PAuth support. PACI*
is not in the HINT space, therefore it will fault when run in
hardware without PAuth support, but it is also not a landing pad,
making programs safer in newer HW.

Differential Revision: https://reviews.llvm.org/D101920
2021-06-24 18:24:32 +01:00
Tim Northover 769ced3d57 AArch64: mark x22 livein if it's an async context that gets stored.
This fixes a crash with expensive checks enabled (the verifier was not happy).
2021-05-17 11:56:03 +01:00
Tim Northover 82a0e808bb IR/AArch64/X86: add "swifttailcc" calling convention.
Swift's new concurrency features are going to require guaranteed tail calls so
that they don't consume excessive amounts of stack space. This would normally
mean "tailcc", but there are also Swift-specific ABI desires that don't
naturally go along with "tailcc" so this adds another calling convention that's
the combination of "swiftcc" and "tailcc".

Support is added for AArch64 and X86 for now.
2021-05-17 10:48:34 +01:00
Tim Northover ea0eec69f1 IR+AArch64: add a "swiftasync" argument attribute.
This extends any frame record created in the function to include that
parameter, passed in X22.

The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001
in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect
of this is that tools walking the stack should expect to see one of three
values there:

  * 0b0000 => a normal, non-extended record with just [FP, LR]
  * 0b0001 => the extended record [X22, FP, LR]
  * 0b1111 => kernel space, and a non-extended record.

All other values are currently reserved.
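
A stack-walking tool could classify a saved FP value along these lines. This is a minimal sketch: the function names and the returned labels are hypothetical, and only the bit positions and the three tag values come from the description above.

```python
TAG_SHIFT = 60
TAG_MASK = 0xF


def classify_frame_record(saved_fp: int) -> str:
    """Classify a frame record from bits 63:60 of the stored FP."""
    tag = (saved_fp >> TAG_SHIFT) & TAG_MASK
    if tag == 0b0000:
        return "normal [FP, LR]"
    if tag == 0b0001:
        return "extended [X22, FP, LR]"
    if tag == 0b1111:
        return "kernel, non-extended"
    return "reserved"


def strip_tag(saved_fp: int) -> int:
    """Clear bits 63:60 so the value can be used as a user-space address."""
    return saved_fp & ((1 << TAG_SHIFT) - 1)


fp = 0x1000_0000_16F0_0010
print(classify_frame_record(fp), hex(strip_tag(fp)))
```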

If compiling for arm64e this context pointer is address-discriminated with the
discriminator 0xc31a and the DB (process-specific) key.

There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing
front-ends access to this slot (and forcing its creation initialized to nullptr
if necessary).
2021-05-14 11:43:58 +01:00
Tomas Matheson a9968c0a33 [NFC][CodeGen] Tidy up TargetRegisterInfo stack realignment functions
Currently needsStackRealignment returns false if canRealignStack returns false.
This means that the behavior of needsStackRealignment does not correspond to
its name and description; a function might need stack realignment, but if it
is not possible then this function returns false. Furthermore,
needsStackRealignment is not virtual and therefore some backends have made use
of canRealignStack to indicate whether a function needs stack realignment.

This patch attempts to clarify the situation by separating them and introducing
new names:

 - shouldRealignStack - true if there is any reason the stack should be
   realigned

 - canRealignStack - true if we are still able to realign the stack (e.g. we
   can still reserve/have reserved a frame pointer)

 - hasStackRealignment = shouldRealignStack && canRealignStack (not target
   customisable)

Targets can now override shouldRealignStack to indicate that stack realignment
is required.
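
The relationship between the three predicates can be written out as a small Python model. The function names mirror the ones introduced above but are plain functions here, not the LLVM API; `needs_but_cannot_realign` is a hypothetical helper that names the case the old `needsStackRealignment` could not express.

```python
def has_stack_realignment(should_realign: bool, can_realign: bool) -> bool:
    """hasStackRealignment = shouldRealignStack && canRealignStack."""
    return should_realign and can_realign


def needs_but_cannot_realign(should_realign: bool, can_realign: bool) -> bool:
    """The case that used to be indistinguishable from 'no realignment needed'."""
    return should_realign and not can_realign


for should in (False, True):
    for can in (False, True):
        print(should, can,
              has_stack_realignment(should, can),
              needs_but_cannot_realign(should, can))
```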

This change will make it easier in a future change to handle the case where we
need to realign the stack but can't do so (for example when the register
allocator creates an aligned spill after the frame pointer has been
eliminated).

Differential Revision: https://reviews.llvm.org/D98716

Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87
2021-03-30 17:31:39 +01:00
Bradley Smith ea834c8365 Revert "[AArch64][SVE] Allow accesses to SVE stack objects to use frame pointer"
This patch introduced codegen faults.  An attempt to fix this was done
in https://reviews.llvm.org/D97193, but ultimately it was decided to
approach this differently.

This reverts commit 42635856ed.

Differential Revision: https://reviews.llvm.org/D98350
2021-03-11 13:32:35 +00:00
Oliver Stannard 8d632ca436 [ARM] Add comment explaining stack frame layout
Add a comment explaining how we lay out stack frames for ARM targets,
based on the existing one for AArch64. Also expand the comment to
explain reserved call frames for both architectures.

Differential revision: https://reviews.llvm.org/D98258
2021-03-09 15:20:32 +00:00
Amara Emerson 0146d20631 [AArch64] Do not fold SP adjustments into pre-increment addr modes if it overflows the redzone.
Instead of outright disabling this completely with the noredzone attribute,
we only avoid doing the optimization if there are memory operations between
the adjustment and the load/store that the adjustment would be folded into.
This avoids the case of something like a stack cookie being corrupted if an
exception happens before the pre-increment to the SP occurs.

This also prevents the folding happening if we have a redzone, but the offset
being folded is above the redzone amount (128 bytes in this case).

rdar://73269336

Differential Revision: https://reviews.llvm.org/D95179
2021-02-24 09:55:48 -08:00
Kyungwoo Lee 4f58b1bd29 [AArch64] Homogeneous Prolog and Epilog Size Optimization
Second land attempt. MachineVerifier DefRegState expensive check errors fixed.

Prologs and epilogs handle callee-save registers and tend to be irregular with
different immediate offsets that are not often handled by the MachineOutliner.
Commit D18619/a5335647d5e8 (combining stack operations) stretched irregularity
further.

This patch tries to emit homogeneous stores and loads with the same offset for
prologs and epilogs respectively. We have observed that this canonicalizes
(homogenizes) prologs and epilogs significantly and results in a greatly
increased chance of outlining, resulting in a code size reduction.

Despite the above results, there are still size wins to be had that the
MachineOutliner does not provide due to the special handling of X30/LR. To handle
the LR case, this patch custom-outlines prologs and epilogs in place. It does
this by doing the following:

  * Injects HOM_Prolog and HOM_Epilog pseudo instructions during a Prolog and
    Epilog Injection Pass.
  * Lowers and optimizes said pseudos in an AArch64LowerHomogeneousPrologEpilog Pass.
  * Outlined helpers are created on demand. Identical helpers are merged by the linker.
  * An opt-in flag is introduced to enable this feature. Another threshold flag
    is also introduced to control the aggressiveness of outlining to suit the application's needs.

This reduced code size by an average of 4% on LLVM-TestSuite/CTMark targeting arm64/-Oz.

Differential Revision: https://reviews.llvm.org/D76570
2021-02-02 14:57:26 -08:00
Puyan Lotfi 8f7f2c4211 Revert "[AArch64] Homogeneous Prolog and Epilog Size Optimization"
This reverts commit 0426be3df6.

Reverting due to some expensive-checks failures in tests.
2021-02-02 02:33:44 -05:00
Kyungwoo Lee 0426be3df6 [AArch64] Homogeneous Prolog and Epilog Size Optimization
Prologs and epilogs handle callee-save registers and tend to be irregular with
different immediate offsets that are not often handled by the MachineOutliner.
Commit D18619/a5335647d5e8 (combining stack operations) stretched irregularity
further.

This patch tries to emit homogeneous stores and loads with the same offset for
prologs and epilogs respectively. We have observed that this canonicalizes
(homogenizes) prologs and epilogs significantly and results in a greatly
increased chance of outlining, resulting in a code size reduction.

Despite the above results, there are still size wins to be had that the
MachineOutliner does not provide due to the special handling of X30/LR. To handle
the LR case, this patch custom-outlines prologs and epilogs in place. It does
this by doing the following:

  * Injects HOM_Prolog and HOM_Epilog pseudo instructions during a Prolog and
    Epilog Injection Pass.
  * Lowers and optimizes said pseudos in an AArch64LowerHomogeneousPrologEpilog Pass.
  * Outlined helpers are created on demand. Identical helpers are merged by the linker.
  * An opt-in flag is introduced to enable this feature. Another threshold flag
    is also introduced to control the aggressiveness of outlining to suit the application's needs.

This reduced code size by an average of 4% on LLVM-TestSuite/CTMark targeting arm64/-Oz.

Differential Revision: https://reviews.llvm.org/D76570
2021-02-02 00:26:51 -05:00
Bradley Smith 42635856ed [AArch64][SVE] Allow accesses to SVE stack objects to use frame pointer
The layout of the stack frame for SVE means that using the frame pointer
rather than the stack pointer for an access to an SVE stack object
removes the need for an additional add to jump over the non-SVE objects.

Likewise the opposite is true for non-SVE stack objects.

This patch allows for the former to be done by having HasFP return true
in the presence of both SVE and non-SVE stack objects, and also fixes a
minor issue whereby the latter would not be done for certain offsets.
2021-01-28 12:39:57 +00:00
Hsiangkai Wang 914e2f5a02 [NFC] Use generic name for scalable vector stack ID.
Differential Revision: https://reviews.llvm.org/D94471
2021-01-13 10:57:43 +08:00
Mark Murray af7cce2fa4 [AArch64] Add +pauth architecture option, allowing the v8.3a pointer authentication extension.
Differential Revision: https://reviews.llvm.org/D94083
2021-01-08 13:21:11 +00:00
Jay Foad 000400ca0a Fix speling in comments. NFC. 2020-11-23 14:43:24 +00:00
Sander de Smalen d57bba7cf8 [SVE] Return StackOffset for TargetFrameLowering::getFrameIndexReference.
To accommodate frame layouts that have both fixed and scalable objects
on the stack, describing a stack location or offset using a pointer + uint64_t
is not sufficient. For this reason, we've introduced the StackOffset class,
which models both the fixed-sized and scalable-sized offsets.

The TargetFrameLowering::getFrameIndexReference is made to return a StackOffset,
so that this can be used in other interfaces, such as to eliminate frame indices
in PEI or to emit Debug locations for variables on the stack.

This patch is purely mechanical and doesn't change the behaviour of how
the result of this function is used for fixed-sized offsets. The patch adds
various checks to assert that the offset has no scalable component, as frame
offsets with a scalable component are not yet supported in various places.
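
The idea behind a two-component offset can be sketched in Python. This is an analogy, not the LLVM class: the field names, `resolve`, and `fixed_only` are hypothetical, and the scalable part is modeled as bytes per unit of a runtime `vscale` multiplier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackOffset:
    fixed: int = 0     # bytes known at compile time
    scalable: int = 0  # bytes per unit of vscale, known only at run time

    def __add__(self, other: "StackOffset") -> "StackOffset":
        return StackOffset(self.fixed + other.fixed,
                           self.scalable + other.scalable)

    def resolve(self, vscale: int) -> int:
        """Concrete byte offset once the runtime multiplier is known."""
        return self.fixed + self.scalable * vscale

    def fixed_only(self) -> int:
        """Mirror of the added checks: callers that cannot handle a
        scalable component assert that there is none."""
        assert self.scalable == 0, "scalable offsets not supported here"
        return self.fixed


off = StackOffset(fixed=16) + StackOffset(scalable=32)
print(off, off.resolve(vscale=2))
```

A single integer cannot represent `off`, because its value in bytes depends on `vscale`; keeping the two components separate defers that choice to the consumer.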

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D90018
2020-11-05 11:02:18 +00:00