Commit Graph

255 Commits

Author SHA1 Message Date
Saleem Abdulrasool c31f0a0050 AArch64: correct epilogue/prologue emission for swift async
The prologue and epilogue emission were unbalanced in light of the
different strategies for async frame context emission. Adjust the
epilogue emission to match the prologue emission, so that both the
elision case and the deployment-based case work properly. Because the
epilogue always cleared a bit (which should not have been set in the
first place), clients would not notice the behavioural issue unless the
deployment version was in effect.
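A hedged sketch of a balanced pair under the static, deployment-target
strategy (offsets and the exact sequence are illustrative, not the
commit's literal output):

```
// Prologue: mark the extended frame by setting bit 60 of FP.
orr  x29, x29, #0x1000000000000000
stp  x29, x30, [sp, #-16]!
mov  x29, sp
...
// Epilogue: mirror the prologue; clear bit 60 only if the prologue set it.
ldp  x29, x30, [sp], #16
and  x29, x29, #0xefffffffffffffff
ret
```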
2022-03-09 18:41:10 +00:00
Hans Wennborg 85c53c7092 Revert "[AArch64] Async unwind - function prologues"
It caused builds to assert with:

  (StackSize == 0 && "We already have the CFA offset!"),
  function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624.

when targeting iOS. See comment on the code review for reproducer.

> This patch rearranges emission of CFI instructions, so the resulting
> DWARF and `.eh_frame` information is precise at every instruction.
>
> The current state is that the unwind info is emitted only after the
> function prologue. This is fine for synchronous (e.g. C++) exceptions,
> but the information is generally incorrect when the program counter is
> at an instruction in the prologue or the epilogue, for example:
>
> ```
> stp     x29, x30, [sp, #-16]!           // 16-byte Folded Spill
> mov     x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> after the `stp` is executed, the (initial) rule for the CFA still says
> the CFA is in the `sp`, even though the `sp` is already offset by 16 bytes.
>
> A correct unwind info could look like:
> ```
> stp     x29, x30, [sp, #-16]!           // 16-byte Folded Spill
> .cfi_def_cfa_offset 16
> mov     x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> Having this information precise up to an instruction is useful for
> sampling profilers that would like to get a stack backtrace. The end
> goal (towards which this patch is just a step) is to have fully working
> `-fasynchronous-unwind-tables`.
>
> Reviewed By: danielkiss, MaskRay
>
> Differential Revision: https://reviews.llvm.org/D111411

This reverts commit 32e8b550e5.
2022-03-04 17:36:26 +01:00
Sander de Smalen 7c65d2288b [AArch64] Improve access to fixed-width object when stack has SVE.
When the stack has SVE objects, fixed-width objects are often better accessed
from the SP, instead of the FP, because part/all of the fixed-width offset
can be folded into the (non-scalable) addressing mode, where otherwise an
ADDVL would be required.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D120738
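A hedged sketch of the difference (offsets and registers illustrative):

```
// FP-based access must first step over the scalable (SVE) region:
addvl x8, x29, #-2          // scale #-2 by the vector length
ldur  x0, [x8, #-16]
// SP-based access folds the whole fixed-width offset into the
// (non-scalable) addressing mode, with no ADDVL needed:
ldr   x0, [sp, #32]
```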
2022-03-04 09:33:59 +00:00
Momchil Velikov 63c9aca12a Revert "[AArch64] Async unwind - function epilogues"
This reverts commit 74319d6794.

It causes test failures that look like an infinite loop in asan/hwasan
unwinding.
2022-03-02 15:01:57 +00:00
Momchil Velikov 74319d6794 [AArch64] Async unwind - function epilogues
A counterpart of https://reviews.llvm.org/D111411, this change makes the
unwind information instruction-precise in function epilogues.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112330
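A hedged sketch of what an instruction-precise epilogue could look like
(directives and offsets illustrative):

```
.cfi_def_cfa wsp, 16        // CFA rule moves back to SP before FP is clobbered
ldp  x29, x30, [sp], #16
.cfi_def_cfa_offset 0       // SP is fully restored at this point
.cfi_restore w30
.cfi_restore w29
ret
```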
2022-03-02 13:15:11 +00:00
Momchil Velikov 32e8b550e5 [AArch64] Async unwind - function prologues
This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.

The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:

```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

after the `stp` is executed, the (initial) rule for the CFA still says
the CFA is in the `sp`, even though the `sp` is already offset by 16 bytes.

A correct unwind info could look like:
```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards which this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.

Reviewed By: danielkiss, MaskRay

Differential Revision: https://reviews.llvm.org/D111411
2022-02-28 13:37:57 +00:00
Momchil Velikov 20a093e2bc [AArch64] Async unwind - Refactor generation of shadow call stack prologue/epilogue
This patch is in preparation for the async unwind CFI.

Move the emission of the shadow call stack prologue/epilogue
instructions into `emitPrologue`/`emitEpilogue`. This greatly
simplifies epilogue generation in particular, and removes some quite
fragile code that tried to skip over those instructions.
Reviewed By: MaskRay, efriedma

Differential Revision: https://reviews.llvm.org/D112329
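For reference, the shadow call stack sequences being moved are small;
their usual shape (x18 holds the shadow stack pointer):

```
// Prologue: push LR onto the shadow call stack.
str  x30, [x18], #8
...
// Epilogue: pop LR back from the shadow call stack.
ldr  x30, [x18, #-8]!
ret
```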
2022-02-25 11:09:23 +00:00
Momchil Velikov 17e85cd410 [AArch64] Async unwind - Always place the first LDP at the end when ReverseCSRRestoreSeq is true
This patch is in preparation for the async unwind CFI.

Put the first `LDP` at the end, so that the load-store optimizer can run
and merge the `LDP` and the `ADD` into a post-index `LDP`.

Do this always, as early as the initial creation of the CSR restore
instructions, even if that `LDP` is not guaranteed to be mergeable with
a subsequent `SP` increment.

This greatly simplifies the CFI generation for the prologue, as
otherwise we would have to take extra steps to ensure that reordering
does not cross CFI instructions.

Reviewed By: danielkiss

Differential Revision: https://reviews.llvm.org/D112328
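A hedged sketch of the intended merge (registers and sizes illustrative):

```
// CSR restores with the first LDP placed last:
ldp  x20, x19, [sp, #16]
ldp  x29, x30, [sp]
add  sp, sp, #32
// ...which the load-store optimizer can merge into a post-index LDP:
//   ldp x29, x30, [sp], #32
```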
2022-02-24 18:48:07 +00:00
Momchil Velikov 25e92920c9 [AArch64] Async unwind - helper functions to decide on CFI emission
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D112327
2022-02-24 18:16:50 +00:00
Jim Lin d6b0734837 [NFC] Use Register instead of unsigned 2022-01-19 20:17:04 +08:00
Tim Northover 581e855623 AArch64: don't claim to preserve registers used by prologue code 2022-01-10 12:27:04 +00:00
Daniel Kiss 131c06e6da Revert "[AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions."
This reverts commit f903c85055.
2022-01-06 19:17:45 +01:00
John Brawn dc9f65be45 [AArch64][SVE] Fix handling of stack protection with SVE
Fix a couple of things that were causing stack protection to not work
correctly in functions that have scalable vectors on the stack:
 * Use TypeSize when determining if accesses to a variable are
   considered out-of-bounds so that the behaviour is correct for
   scalable vectors.
 * When stack protection is enabled, move the stack protector location
   to the top of the SVE locals, so that any overflow in them (or in the
   other locals below) will be detected.

Fixes: https://github.com/llvm/llvm-project/issues/51137

Differential Revision: https://reviews.llvm.org/D111631
2021-12-14 11:30:48 +00:00
Serguei Katkov 3557f49353 [AARCH64] Teach AArch64FrameLowering::getFrameIndexReferencePreferSP to really prefer SP.
Try harder to use sp when it is possible to lower a frame index.

Reviewers: reames, loicottet, ostannard, t.p.northover
Reviewed By: reames
Subscribers: arphaman, danilaml, hiraditya, kristof.beyls, llvm-commits, Matt, yrouban
Differential Revision: https://reviews.llvm.org/D111133
2021-11-19 11:14:02 +07:00
Kazu Hirata 14d656b3d8 [Target] Use llvm::reverse (NFC) 2021-11-06 13:08:21 -07:00
Daniel Kiss f903c85055 [AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions.
The autiasp and autibsp instructions are the counterparts of the
paciasp/pacibsp instructions, so let's emit .cfi_negate_ra_state for
these too. With the Armv8.3 instruction set, retaa/retab perform the
return and the authentication in one step; there we can't emit
.cfi_negate_ra_state, because it would point after the ret* instruction.

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D111780
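A sketch of the resulting pattern in the non-v8.3 case (function body
elided):

```
paciasp
.cfi_negate_ra_state      // return address is now signed
...
autiasp
.cfi_negate_ra_state      // return address is authenticated again
ret
```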
2021-10-20 11:03:52 +02:00
Leonard Chan 4dc462b589 [AArch64] Emit CFI instruction for updating x18 when using ShadowCallStack with exception unwinding
PR45875 notes an instance where exception handling crashes on
aarch64-fuchsia, where SCS is enabled by default. The underlying issue
seems to be that within libunwind's various _Unwind_* functions, the x18
register is not updated if a function is marked with nounwind. This
removes the check for nounwind and emits the CFI instruction that
updates x18.

Differential Revision: https://reviews.llvm.org/D79822
2021-10-08 14:20:26 -07:00
Doug Gregor a773db7d76 Add a command-line flag to control the Swift extended async frame info.
Introduce a new command-line flag `-swift-async-fp={auto|always|never}`
that controls how code generation sets the Swift extended async frame
info bit. There are three possibilities:

* `auto`: determine how to set the bit based on the deployment target,
either statically or dynamically via `swift_async_extendedFramePointerFlags`.
* `always`: the default; always set the bit statically, regardless of the
deployment target.
* `never`: never set the bit, regardless of the deployment target.

Patch by Doug Gregor <dgregor@apple.com>

Reviewed By: doug.gregor

Differential Revision: https://reviews.llvm.org/D109392
2021-09-16 06:57:45 -07:00
Tim Northover 5d070c8259 SwiftAsync: use runtime-provided flag for extended frame if back-deploying
When back-deploying Swift async code we can't always toggle the flag showing an
extended frame is present because it will confuse unwinders on systems released
before this feature. So in cases where the code might run there, we `or` in a
mask provided by the runtime (as an absolute symbol) telling us whether the
unwinders can cope.

When deploying only for newer OSs, we can still hard-code the bit-set for
greater efficiency.
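A hedged sketch of the two strategies; the GOT-style relocation syntax
is an assumption (it varies by object format), and only the symbol name
comes from the messages above:

```
// New-enough deployment target: hard-code the bit.
orr  x29, x29, #0x1000000000000000
// Back-deploying: OR in the runtime-provided mask instead.
adrp x16, swift_async_extendedFramePointerFlags@GOTPAGE     // assumed syntax
ldr  x16, [x16, swift_async_extendedFramePointerFlags@GOTPAGEOFF]
orr  x29, x29, x16
```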
2021-09-13 13:54:46 +01:00
Fangrui Song 0e03450ae4 [AArch64] Remove an unneeded !NeedsWinCFI check. NFC 2021-09-05 21:02:56 -07:00
Kyungwoo Lee 6530ea4095 [AArch64] Fix Local Deallocation for Homogeneous Prolog/Epilog
The stack adjustment for local deallocation was incorrectly ported.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D106760
2021-07-25 10:51:11 -07:00
Pablo Barrio 571c8c5263 [AArch64][v8.3A] Avoid inserting implicit landing pads (PACI*SP)
PACI*SP have the advantage that they are in HINT space, meaning
they can be run successfully in hardware without PAuth support -
they will just behave as a NOP. However, PACI*SP are also implicit
landing pads (think of an extra BTI jc). Therefore, they allow
indirect jumps of all kinds into them, potentially inserting new
gadgets. This patch replaces PACI*SP by PACI* LR, SP when
compiling explicitly for hardware with full PAuth support. PACI*
is not in the HINT space, therefore it will fault when run in
hardware without PAuth support, but it is also not a landing pad,
making programs safer in newer HW.

Differential Revision: https://reviews.llvm.org/D101920
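The two forms side by side (a sketch; `x30` is LR):

```
// HINT-space: behaves as a NOP without PAuth, but is an implicit landing pad.
paciasp
// Non-HINT encoding: faults without PAuth, and is not a landing pad.
pacia x30, sp
```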
2021-06-24 18:24:32 +01:00
Tim Northover 769ced3d57 AArch64: mark x22 livein if it's an async context that gets stored.
This fixes a crash with expensive checks enabled (the verifier was not happy).
2021-05-17 11:56:03 +01:00
Tim Northover 82a0e808bb IR/AArch64/X86: add "swifttailcc" calling convention.
Swift's new concurrency features are going to require guaranteed tail calls so
that they don't consume excessive amounts of stack space. This would normally
mean "tailcc", but there are also Swift-specific ABI desires that don't
naturally go along with "tailcc", so this adds another calling convention that's
the combination of "swiftcc" and "tailcc".

Support is added for AArch64 and X86 for now.
2021-05-17 10:48:34 +01:00
Tim Northover ea0eec69f1 IR+AArch64: add a "swiftasync" argument attribute.
This extends any frame record created in the function to include that
parameter, passed in X22.

The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001
in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect
of this is that tools walking the stack should expect to see one of three
values there:

  * 0b0000 => a normal, non-extended record with just [FP, LR]
  * 0b0001 => the extended record [X22, FP, LR]
  * 0b1111 => kernel space, and a non-extended record.

All other values are currently reserved.

If compiling for arm64e this context pointer is address-discriminated with the
discriminator 0xc31a and the DB (process-specific) key.

There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing
front-ends access to this slot (and forcing its creation initialized to nullptr
if necessary).
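A hedged sketch of a prologue establishing the layout above (assumes SP
was already adjusted; offsets illustrative):

```
// Frame record in memory, lowest address first: [X22, FP, LR]
str  x22, [sp, #8]          // async context at FP - 8
stp  x29, x30, [sp, #16]    // saved FP at FP + 0, LR at FP + 8
add  x29, sp, #16           // FP points at the saved-FP slot
```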
2021-05-14 11:43:58 +01:00
Tomas Matheson a9968c0a33 [NFC][CodeGen] Tidy up TargetRegisterInfo stack realignment functions
Currently needsStackRealignment returns false if canRealignStack returns false.
This means that the behavior of needsStackRealignment does not correspond to
its name and description: a function might need stack realignment, but if
realignment is not possible then this function returns false. Furthermore,
needsStackRealignment is not virtual, and therefore some backends have made use
of canRealignStack to indicate whether a function needs stack realignment.

This patch attempts to clarify the situation by separating them and introducing
new names:

 - shouldRealignStack - true if there is any reason the stack should be
   realigned

 - canRealignStack - true if we are still able to realign the stack (e.g. we
   can still reserve/have reserved a frame pointer)

 - hasStackRealignment = shouldRealignStack && canRealignStack (not target
   customisable)

Targets can now override shouldRealignStack to indicate that stack realignment
is required.

This change will make it easier in a future change to handle the case where we
need to realign the stack but can't do so (for example when the register
allocator creates an aligned spill after the frame pointer has been
eliminated).

Differential Revision: https://reviews.llvm.org/D98716

2021-03-30 17:31:39 +01:00
Bradley Smith ea834c8365 Revert "[AArch64][SVE] Allow accesses to SVE stack objects to use frame pointer"
This patch introduced codegen faults.  An attempt to fix this was done
in https://reviews.llvm.org/D97193, but ultimately it was decided to
approach this differently.

This reverts commit 42635856ed.

Differential Revision: https://reviews.llvm.org/D98350
2021-03-11 13:32:35 +00:00
Oliver Stannard 8d632ca436 [ARM] Add comment explaining stack frame layout
Add a comment explaining how we lay out stack frames for ARM targets,
based on the existing one for AArch64. Also expand the comment to
explain reserved call frames for both architectures.

Differential revision: https://reviews.llvm.org/D98258
2021-03-09 15:20:32 +00:00
Amara Emerson 0146d20631 [AArch64] Do not fold SP adjustments into pre-increment addr modes if it overflows the redzone.
Instead of disabling this outright with the noredzone attribute,
we only avoid doing the optimization if there are memory operations between
the adjustment and the load/store that the adjustment would be folded into.
This avoids the case of something like a stack cookie being corrupted if an
exception happens before the pre-increment to the SP occurs.

This also prevents the folding from happening if we have a redzone but
the offset being folded is above the redzone amount (128 bytes in this
case).

rdar://73269336

Differential Revision: https://reviews.llvm.org/D95179
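The two shapes in question, as a sketch:

```
// Folded: the SP decrement is merged into a pre-increment store.
stp  x29, x30, [sp, #-16]!
// Unfolded: kept separate when memory ops intervene or the folded
// offset would reach above the 128-byte red zone.
sub  sp, sp, #16
stp  x29, x30, [sp]
```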
2021-02-24 09:55:48 -08:00
Kyungwoo Lee 4f58b1bd29 [AArch64] Homogeneous Prolog and Epilog Size Optimization
Second land attempt; the MachineVerifier DefRegState expensive-checks
errors have been fixed.

Prologs and epilogs handle callee-save registers and tend to be
irregular, with different immediate offsets that are not often handled
by the MachineOutliner. Commit D18619/a5335647d5e8 (combining stack
operations) stretched this irregularity further.

This patch tries to emit homogeneous stores and loads with the same offset for
prologs and epilogs respectively. We have observed that this canonicalizes
(homogenizes) prologs and epilogs significantly and results in a greatly
increased chance of outlining, resulting in a code size reduction.

Despite the above results, there are still size wins to be had that the
MachineOutliner does not provide, due to the special handling of X30/LR.
To handle the LR case, this patch custom-outlines prologs and epilogs in
place. It does this by doing the following:

  * Injects HOM_Prolog and HOM_Epilog pseudo instructions during a Prolog and
    Epilog Injection Pass.
  * Lowers and optimizes said pseudos in an AArch64LowerHomogeneousPrologEpilog pass.
  * Outlined helpers are created on demand. Identical helpers are merged by the linker.
  * An opt-in flag is introduced to enable this feature. Another threshold flag
    is also introduced to control the aggressiveness of outlining for an application's needs.

This reduces code size by an average of 4% on LLVM-TestSuite/CTMark targeting arm64/-Oz.

Differential Revision: https://reviews.llvm.org/D76570
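A hedged sketch of the homogenization idea (register choices
illustrative; the real lowering goes through the HOM_Prolog/HOM_Epilog
pseudos described above):

```
// Each CSR pair is pushed with the same pre-decrement form, so the byte
// sequence is identical across functions and outlines well:
stp  x29, x30, [sp, #-16]!
stp  x20, x19, [sp, #-16]!
stp  x22, x21, [sp, #-16]!
```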
2021-02-02 14:57:26 -08:00
Puyan Lotfi 8f7f2c4211 Revert "[AArch64] Homogeneous Prolog and Epilog Size Optimization"
This reverts commit 0426be3df6.

Reverting due to some expensive-checks failures in tests.
2021-02-02 02:33:44 -05:00
Kyungwoo Lee 0426be3df6 [AArch64] Homogeneous Prolog and Epilog Size Optimization
Prologs and epilogs handle callee-save registers and tend to be
irregular, with different immediate offsets that are not often handled
by the MachineOutliner. Commit D18619/a5335647d5e8 (combining stack
operations) stretched this irregularity further.

This patch tries to emit homogeneous stores and loads with the same offset for
prologs and epilogs respectively. We have observed that this canonicalizes
(homogenizes) prologs and epilogs significantly and results in a greatly
increased chance of outlining, resulting in a code size reduction.

Despite the above results, there are still size wins to be had that the
MachineOutliner does not provide, due to the special handling of X30/LR.
To handle the LR case, this patch custom-outlines prologs and epilogs in
place. It does this by doing the following:

  * Injects HOM_Prolog and HOM_Epilog pseudo instructions during a Prolog and
    Epilog Injection Pass.
  * Lowers and optimizes said pseudos in an AArch64LowerHomogeneousPrologEpilog pass.
  * Outlined helpers are created on demand. Identical helpers are merged by the linker.
  * An opt-in flag is introduced to enable this feature. Another threshold flag
    is also introduced to control the aggressiveness of outlining for an application's needs.

This reduces code size by an average of 4% on LLVM-TestSuite/CTMark targeting arm64/-Oz.

Differential Revision: https://reviews.llvm.org/D76570
2021-02-02 00:26:51 -05:00
Bradley Smith 42635856ed [AArch64][SVE] Allow accesses to SVE stack objects to use frame pointer
The layout of the stack frame for SVE means that using the frame pointer
rather than the stack pointer for an access to an SVE stack object
removes the need for an additional add to jump over the non-SVE objects.

Likewise the opposite is true for non-SVE stack objects.

This patch allows the former to be done by having HasFP return true
in the presence of both SVE and non-SVE stack objects, and also fixes a
minor issue whereby the latter would not be done for certain offsets.
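A hedged sketch (offsets and registers illustrative):

```
// Via FP: the scalable offset folds straight into the SVE load.
ldr  z0, [x29, #-2, mul vl]
// Via SP: an extra ADD is needed first, to step over the non-SVE objects.
add  x8, sp, #48
ldr  z0, [x8, #-2, mul vl]
```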
2021-01-28 12:39:57 +00:00
Hsiangkai Wang 914e2f5a02 [NFC] Use generic name for scalable vector stack ID.
Differential Revision: https://reviews.llvm.org/D94471
2021-01-13 10:57:43 +08:00
Mark Murray af7cce2fa4 [AArch64] Add +pauth architecture option, allowing the v8.3a pointer authentication extension.
Differential Revision: https://reviews.llvm.org/D94083
2021-01-08 13:21:11 +00:00
Jay Foad 000400ca0a Fix spelling in comments. NFC. 2020-11-23 14:43:24 +00:00
Sander de Smalen d57bba7cf8 [SVE] Return StackOffset for TargetFrameLowering::getFrameIndexReference.
To accommodate frame layouts that have both fixed and scalable objects
on the stack, describing a stack location or offset using a pointer + uint64_t
is not sufficient. For this reason, we've introduced the StackOffset class,
which models both fixed- and scalable-sized offsets.

TargetFrameLowering::getFrameIndexReference is made to return a StackOffset,
so that this can be used in other interfaces, such as to eliminate frame indices
in PEI or to emit Debug locations for variables on the stack.

This patch is purely mechanical and doesn't change the behaviour of how
the result of this function is used for fixed-sized offsets. The patch adds
various checks to assert that the offset has no scalable component, as frame
offsets with a scalable component are not yet supported in various places.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D90018
2020-11-05 11:02:18 +00:00
Sander de Smalen 73b6cb67dc [NFCI] Replace AArch64StackOffset by StackOffset.
This patch replaces the AArch64StackOffset class by the generic one
defined in TypeSize.h.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D88983
2020-11-04 08:49:00 +00:00
Evgenii Stepanov 2e794a46b5 [AArch64] Stack frame reordering.
Implement stack frame reordering in the AArch64 backend.

Unlike the X86 implementation, AArch64 does not seem to benefit from
"access density" based frame reordering, mainly because it has a much
smaller variety of addressing modes and because all instructions are
4 bytes: each frame object is either in range of an instruction (and
then the access is "free") or not (and then it carries a code-size cost
of 4 bytes).

This change improves Memory Tagging codegen by
* Placing an object that has been chosen as the base tagged pointer of
the function at SP + 0. This saves one instruction to set up the pointer
(IRG does not have an offset immediate), and more because that object
can now be referenced without materializing its tagged address in a
scratch register.
* Placing objects that go out of scope simultaneously together. This
exposes opportunities for instruction merging in tryMergeAdjacentSTG.

Differential Revision: https://reviews.llvm.org/D72366
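A sketch of the SP + 0 benefit (assumes MTE; registers illustrative):

```
// Object at SP + 0: IRG produces its tagged address directly, since
// IRG takes no offset immediate.
irg  x0, sp
stg  x0, [x0]        // tag the first granule through x0 itself
```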
2020-10-15 12:50:16 -07:00
Evgenii Stepanov 2f63e57fa5 [MTE] Pin the tagged base pointer to one of the stack slots.
Summary:
Pin the tagged base pointer to one of the stack slots, and (if
necessary) rewrite tag offsets so that an object that occupies that
slot has both address and tag offsets of 0. This allows ADDG
instructions for that object to be eliminated and their uses replaced
with the tagged base pointer itself.

This optimization must be done in machine instructions and not in the IR
instrumentation pass, because referring to a stack slot through an IRG
pointer would confuse the stack coloring pass.

The optimization makes a (pretty naive) attempt to find the slot that
would benefit the most by counting the uses of stack slots in the
function.

Reviewers: ostannard, pcc

Subscribers: merge_guards_bot, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72365
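A hedged sketch (offsets illustrative):

```
irg  x0, sp            // the tagged base pointer
// The pinned slot has address and tag offsets of 0, so it is addressed
// through x0 directly; other slots still derive pointers with ADDG:
addg x1, x0, #16, #1   // address offset 16, tag offset 1
```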
2020-10-15 12:50:16 -07:00
Martin Storsjö 7d07405761 [AArch64] Prefer prologues with sp adjustments merged into stp/ldp for WinCFI, if optimizing for size
This makes the prologue match the Windows canonical layout for cases
without a frame pointer.

This can potentially be slower (a longer dependency chain on the
sp register, and potentially one more arithmetic operation on some
cores), but it gives notable size improvements.

The previous two commits shrink a 166 KB xdata section by 49 KB,
and if the change from this commit is enabled, the xdata section
shrinks by another 25 KB.

In total, since the start of the recent arm64 unwind info cleanups
and optimizations (since before commit 37ef743cbf), the xdata+pdata
sections of the same test DLL has shrunk from 407 KB in total
originally, to 163 KB now.

Differential Revision: https://reviews.llvm.org/D88701
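A hedged sketch of the canonical shape with its WinCFI directives
(offsets illustrative):

```
stp  x19, x20, [sp, #-32]!     // SP adjust folded into the store
.seh_save_regp_x x19, 32
str  x21, [sp, #16]
.seh_save_reg x21, 16
.seh_endprologue
```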
2020-10-03 21:37:22 +03:00
Martin Storsjö 890af2f003 [AArch64] Allow pairing lr with other GPRs for WinCFI
This saves one instruction per prologue/epilogue for any function with
an odd number of callee-saved GPRs, but more importantly, allows such
functions to match the packed unwind format.

Differential Revision: https://reviews.llvm.org/D88699
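A hedged sketch (the pairing and offsets are illustrative):

```
sub  sp, sp, #16
.seh_stackalloc 16
// lr paired with the odd leftover GPR, which packed unwind can describe:
stp  x19, x30, [sp]
.seh_save_lrpair x19, 0
.seh_endprologue
```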
2020-10-03 21:37:22 +03:00
Martin Storsjö 3780a4e568 [AArch64] Match the windows canonical callee saved register order
On Windows, the callee-saved registers in a canonical prologue are
ordered starting from a lower register number at a lower stack
address (with a possible gap for aligning the stack at the top);
this is the opposite of the order that LLVM normally produces.

To achieve this, reverse the order of the registers in the
assignCalleeSavedSpillSlots callback, to get the stack objects
laid out by PrologEpilogInserter in the right order, and adjust
computeCalleeSaveRegisterPairs to lay them out from the bottom up.

This more often allows generated prologs to match the format that
allows the unwind info to be written as packed info.

Differential Revision: https://reviews.llvm.org/D88677
2020-10-03 21:37:22 +03:00
Martin Storsjö afb4e0f289 [AArch64] Omit SEH directives for the epilogue if none are needed
For these cases, we already omit the prologue directives, if
(!AFI->hasStackFrame() && !windowsRequiresStackProbe && !NumBytes).

When writing the epilogue (after the prolog has been written), if
the function doesn't have the WinCFI flag set (i.e. if no prologue
was generated), assume that no epilogue will be needed either,
and don't emit any epilog start pseudo instruction. After completing
the epilogue, make sure that it actually matched the prologue.

Previously, when an epilogue start/end was generated but no prologue,
the unwind info for such functions was actually huge: 12 bytes of xdata
(4 bytes header, 4 bytes for one non-folded epilogue header, 4 bytes
for padded opcodes) and 8 bytes of pdata. Because the epilog consisted of
one opcode (end) but the prolog was empty (no .seh_endprologue), the
epilogue couldn't be folded into the prologue, and thus couldn't be
considered for packed form either.

On a 6.5 MB DLL with 110 KB pdata and 166 KB xdata, this gets rid of
38 KB pdata and 62 KB xdata.

Differential Revision: https://reviews.llvm.org/D88641
2020-10-02 09:12:56 +03:00
Martin Storsjö 51e74e21aa [AArch64] Remove a duplicate call to setHasWinCFI. NFCI.
The function already has a cleanup scope that makes the same call
whenever the function is exited. When reading the code, seeing that this
return path has an explicit call while other return paths lack it is
confusing.

In the hypothetical case of a function having a prologue that
set the HasWinCFI flag in the MF, but the epilogue containing no
WinCFI instructions, the HasWinCFI flag in the MF would end up reset back
to false.

Differential Revision: https://reviews.llvm.org/D88636
2020-10-01 19:03:27 +03:00
Momchil Velikov a88c722e68 [AArch64] PAC/BTI code generation for LLVM generated functions
PAC/BTI-related codegen in the AArch64 backend is controlled by a set
of LLVM IR function attributes, added to the function by Clang, based
on command-line options and GCC-style function attributes. However,
functions generated in the LLVM middle end (for example,
asan.module.ctor or __llvm_gcov_write_out) do not get any attributes,
and the backend incorrectly does not do any PAC/BTI code generation.

This patch records the default state of PAC/BTI codegen in a set of
LLVM IR module-level attributes, based on command-line options:

* "sign-return-address", with non-zero value means generate code to
  sign return addresses (PAC-RET), zero value means disable PAC-RET.

* "sign-return-address-all", with non-zero value means enable PAC-RET
  for all functions, zero value means enable PAC-RET only for
  functions, which spill LR.

* "sign-return-address-with-bkey", with non-zero value means use B-key
  for signing, zero value mean use A-key.

This set of attributes is always added for AArch64 targets (as
opposed, for example, to interpreting a missing attribute as having
the value 0), in order to be able to check for conflicts when combining
module attributes during LTO.

Module-level attributes are overridden by function-level attributes.
All the decision-making about whether or not to generate PAC and/or
BTI code is factored out into AArch64FunctionInfo; there shouldn't be
any places left, other than AArch64FunctionInfo, that directly
examine PAC/BTI attributes, except AArch64AsmPrinter.cpp, which
is/will be handled by a separate patch.

Differential Revision: https://reviews.llvm.org/D85649
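As a hedged illustration of the end result, a compiler-generated
function with PAC-RET enabled via the module-level attributes would now
start and end roughly like:

```
paciasp         // sign LR on entry (also serves as a BTI landing pad)
...
autiasp         // authenticate LR before returning
ret
```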
2020-09-25 11:47:14 +01:00
Eli Friedman b92d084910 [AArch64][SVE] Fix frame offset calculation when d8 is saved.
If d8 is saved, the fp is not actually adjacent to the SVE
spills/allocations.  Fix the offset calculation to account for this.

Differential Revision: https://reviews.llvm.org/D88117
2020-09-23 11:33:53 -07:00
Owen Anderson 5987da8764 Revert "Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain""
This reverts commit bc9a29b9ee.

The reasoning that this patch was wrong was itself incorrect
(see discussion on llvm-commits). This patch does seem to be exposing
a latent SVE code generation bug on non-public tests, which should
not block a correctness fix for public, non-SVE use cases.
2020-09-01 19:29:03 +00:00
Paul Walker bc9a29b9ee Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain"
This reverts commit e9d9a61208.

This patch was previously reverted by 04879086b4
with the reapplication being done after breaking the assert used to
ensure SP is always 16-byte aligned, which is a requirement of the AAPCS.

For extra context, the latest patch caused runtime failures when
building with "-march=armv8-a+sve -mllvm -aarch64-sve-vector-bits-min=256".
2020-09-01 16:09:37 +01:00
Owen Anderson e9d9a61208 Reapply D70800: Fix AArch64 AAPCS frame record chain
Original Commit Message:
After commit r368987 (rG643adb55769e) landed, the frame record (the FP and LR registers)
may be placed in the middle of a stack frame if a function has both callee-saved
general-purpose registers and floating-point registers. This breaks stack unwinders
that simply walk through the frame records (relying on the guarantee from the AAPCS64
"The Frame Pointer" section). This commit fixes the problem by adding the frame record offset.

Patch By: logan
Differential Revision: D70800
2020-08-27 17:29:41 +00:00