This reverts commit bc9a29b9ee.
The reasoning that this patch was wrong was itself incorrect
(see discussion on llvm-commits). This patch does seem to be exposing
a latent SVE code generation bug on non-public tests, which should
not block a correctness fix for public, non-SVE use cases.
This reverts commit e9d9a61208.
This patch was previously reverted by 04879086b4
with the reapplication being done after breaking the assert used to
ensure SP is always 16-byte aligned, which is a requirement of the AAPCS.
For extra context the latest patch caused runtime failures when
building with "-march=armv8-a+sve -mllvm -aarch64-sve-vector-bits-min=256".
Original Commit Message:
After the commit r368987 (rG643adb55769e) was landed, the frame record (FP and LR register)
may be placed in the middle of a stack frame if a function has both callee-saved
general-purpose registers and floating point registers. This will break the stack unwinders
that simply walk through the frame records (based on the guarantee from AAPCS64
"The Frame Pointer" section). This commit fixes the problem by adding the frame record offset.
Patch By: logan
Differential Revision: D70800
When floating point callee-saved registers were used, the frame pointer would
incorrectly point to the bottom of the CSR space (containing saved floating-point
registers), rather than to the frame record.
While all frame offsets were calculated consistently, resulting in working code,
this prevented stack walkers from being able to traverse the frame list.
This fixes an issue where the restore point of callee-saves in the
function epilogues was incorrectly calculated when the basic block
consisted of only a RET instruction. This caused dealloc instructions
to be inserted in between the block of callee-save restore instructions,
rather than before it.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D86099
Introduced by fd6584a220
Following similar use of casts in AsmParser.cpp, for instance - ideally
this type would use unsigned chars as they're more representative of raw
data and don't get confused around implementation defined choices of
char's signedness, but this is what it is & the signed/unsigned
conversions are (so far as I understand) safe/bit preserving in this
usage and what's intended, given the API design here.
This patch adds a CFI entry for each SVE callee saved register
that needs unwind info at an offset from the CFA. The offset is
a DWARF expression because the offset is partly scalable.
The CFI entries only cover a subset of the SVE callee-saves and
only encode the lower 64 bits, thus implementing the lowest
common denominator ABI. Existing unwinders may support VG but
only restore the lower 64 bits.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84044
The CFA is calculated as (SP/FP + offset), but when there are
SVE objects on the stack the SP offset is partly scalable and
should instead be expressed as the DWARF expression:
SP + offset + scalable_offset * VG
where VG is the Vector Granule register, containing the
number of 64-bit 'granules' in a scalable vector.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84043
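The CFA formula above can be evaluated for concrete values as a quick sanity check. This is only a sketch: computeCFA and granulesForVectorBits are hypothetical helper names, not LLVM APIs, and the real patch emits a DWARF expression rather than computing a number.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): number of 64-bit granules in a
// scalable vector of the given length, i.e. the value held in VG.
constexpr std::int64_t granulesForVectorBits(std::int64_t bits) {
  return bits / 64;
}

// Hypothetical helper (not an LLVM API): evaluates
//   CFA = SP + offset + scalable_offset * VG
// for concrete runtime values of SP and VG.
constexpr std::int64_t computeCFA(std::int64_t sp, std::int64_t offset,
                                  std::int64_t scalableOffset,
                                  std::int64_t vg) {
  return sp + offset + scalableOffset * vg;
}
```

With a scalable offset of 8 bytes per granule, the scalable part contributes 16 bytes at a 128-bit vector length (VG = 2) and 32 bytes at 256 bits (VG = 4).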
While deallocating the stackframe, the offset used to reload the
callee-saved registers was not pointing to the SVE callee-saves,
but rather to the whole SVE area.
+--------------+
| GPR callee   |
|     saves    |
+--------------+ <- FP
| SVE callee   |
|     saves    |
+--------------+ <- Should restore SVE callee saves from here
|  SVE Spills  |
|  and Locals  |
+--------------+ <- instead of from here.
|              |
:              :
|              |
+--------------+ <- SP
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D84539
Instead of aligning the last callee-saved-register slot to the stack
alignment (16 bytes), just align the SVE callee-saved block. This also
simplifies the code that allocates space for the callee-saves.
This change is needed to make sure the offset at which the callee-saved
register is spilled corresponds to the offset used for e.g. unwind call
frame instructions.
Reviewers: efriedma, paulwalker-arm, david-arm, rengolin
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84042
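As a rough illustration of aligning the whole block rather than the last slot, the sketch below packs slots back to back and rounds only the total. The function names and the slot-size model are assumptions made for this example (sizes are bytes per 128 bits of vector length: 16 for a Z register, 2 for a predicate register); any per-slot alignment the real code applies is ignored here.

```cpp
#include <cstdint>
#include <vector>

// Round value up to the next multiple of align (align must be a power of two).
constexpr std::uint64_t alignTo(std::uint64_t value, std::uint64_t align) {
  return (value + align - 1) & ~(align - 1);
}

// Hypothetical model (not an LLVM API): slots are packed back to back and
// only the size of the whole callee-save block is rounded up to 16 bytes.
inline std::uint64_t
calleeSaveBlockSize(const std::vector<std::uint64_t> &slotSizes) {
  std::uint64_t total = 0;
  for (std::uint64_t size : slotSizes)
    total += size;
  return alignTo(total, 16);
}
```

For example, two Z registers and one predicate register occupy 34 bytes, which rounds up to a 48-byte block.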
Fixed stack objects are preallocated and defined to be allocated before
any of the regular stack objects. These are normally used to model stack
arguments.
The AAPCS does not support passing SVE registers on the stack by value
(only by reference). The current layout also doesn't place them before
all stack objects, but rather before all SVE objects. Removing this
simplifies the code that emits the allocation/deallocation
around callee-saved registers (D84042).
This patch also removes all uses of fixedStack from
framelayout-sve.mir, where this was used purely for testing purposes.
Reviewers: paulwalker-arm, efriedma, rengolin
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D84538
The code already supports addressing a fixed-size stack object from
the frame-pointer, by first subtracting sizeof(SVE area) from FP.
Reviewers: efriedma, cameron.mcinally, david-arm, rengolin
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D83125
To make sure that no barrier gets placed on the architectural execution
path, each
BLR x<N>
instruction gets transformed to a
BL __llvm_slsblr_thunk_x<N>
instruction, with __llvm_slsblr_thunk_x<N> a thunk that contains
__llvm_slsblr_thunk_x<N>:
BR x<N>
<speculation barrier>
Therefore, the BLR instruction gets split into 2; one BL and one BR.
This transformation results in not inserting a speculation barrier on
the architectural execution path.
The mitigation is off by default and can be enabled by the
harden-sls-blr subtarget feature.
As a linker is allowed to clobber X16 and X17 on function calls, the
above code transformation would not be correct in case a linker does so
when N=16 or N=17. Therefore, when the mitigation is enabled, generation
of BLR x16 or BLR x17 is avoided.
As BLRA* indirect calls are not produced by LLVM currently, this does
not aim to implement support for those.
Differential Revision: https://reviews.llvm.org/D81402
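The BLR rewrite described above can be modelled on plain strings. The helper names below are hypothetical and exist only for this sketch; the thunk naming scheme and the x16/x17 restriction are taken from the description above.

```cpp
#include <string>

// Hypothetical helper: BLR x16 and BLR x17 are never generated when the
// mitigation is enabled, because a linker may clobber those registers.
inline bool isAllowedBLRTarget(unsigned n) { return n != 16 && n != 17; }

// Name of the thunk that replaces "BLR x<N>".
inline std::string thunkName(unsigned n) {
  return "__llvm_slsblr_thunk_x" + std::to_string(n);
}

// The call that replaces "BLR x<N>" on the architectural execution path.
inline std::string hardenedCall(unsigned n) { return "BL " + thunkName(n); }

// Textual body of the thunk: the BR followed by the speculation barrier.
inline std::string thunkBody(unsigned n) {
  return thunkName(n) + ":\n" + "  BR x" + std::to_string(n) + "\n" +
         "  <speculation barrier>\n";
}
```

The barrier sits after the BR inside the thunk, so it is only ever reached speculatively and never executed architecturally.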
Treat it as callee-saved, and always back it up. When windows code calls
entry points in unix code, marked with the windows calling convention,
that unix code can call other functions that aren't compiled with
-ffixed-x18 which may clobber x18 freely. By backing it up and restoring
it on return, we preserve the register across the function call,
fulfilling this part of the windows calling convention on another OS.
This isn't enough for making sure that x18 is preserved when non-windows
code does a callback to windows code, but is a clear improvement over
the current status quo. Additionally, wine is nowadays building many
modules as PE DLLs, which avoids the callback issue altogether for those
DLLs.
Differential Revision: https://reviews.llvm.org/D61892
The UnwindHelp object is used during exception handling by runtime
code. It must be findable from a fixed offset from FP.
This change allocates the UnwindHelp object as a fixed object (as is
done for x86_64) to ensure that both the generated code and runtime
agree on the location of the object.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45346
Differential Revision: https://reviews.llvm.org/D77016
The generated code for a funclet can have an add to sp in the epilogue
for which there is no corresponding sub in the prologue.
This patch removes the early return from emitPrologue that was
preventing the sub to sp, and instead conditionalizes the appropriate
parts of the rest of the function.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45345
Differential Revision: https://reviews.llvm.org/D77015
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jholewinski, arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76348
Summary:
Detect a run of memory tagging instructions for adjacent stack frame slots,
and replace them with a shorter instruction sequence
* replace STG + STG with ST2G
* replace STGloop + STGloop with STGloop
This code needs to run when stack slot offsets are already known, but before
FrameIndex operands in STG instructions are eliminated; that's the
reason for the new hook in PrologueEpilogue.
This change modifies STGloop and STZGloop pseudos to take the size as an
immediate integer operand, and adds _untied variants of those pseudos
that are allowed to take the base address as a FI operand. This is needed to
simplify recognizing an STGloop instruction as operating on a stack slot
post-regalloc.
This improves memtag code size by ~0.25%, and it looks like an additional ~0.1%
is possible by rearranging the stack frame such that consecutive STG
instructions reference adjacent slots (patch pending).
Reviewers: pcc, ostannard
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70286
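The STG + STG to ST2G replacement can be sketched on a list of slot offsets. This is a simplified model under stated assumptions, not the pass itself: TagOp and mergeAdjacentSTG are hypothetical names, each input offset stands for one STG covering a 16-byte tag granule, the offsets are sorted in ascending order, and the STGloop case is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one emitted tagging instruction.
struct TagOp {
  std::string opcode;  // "STG" (16 bytes) or "ST2G" (32 bytes)
  std::int64_t offset; // byte offset of the first tagged granule
};

// Greedily pairs STGs whose slots are exactly 16 bytes apart into one ST2G.
// Assumes offsets are sorted ascending and each names a 16-byte slot.
inline std::vector<TagOp>
mergeAdjacentSTG(const std::vector<std::int64_t> &offsets) {
  std::vector<TagOp> out;
  std::size_t i = 0;
  while (i < offsets.size()) {
    if (i + 1 < offsets.size() && offsets[i + 1] == offsets[i] + 16) {
      out.push_back({"ST2G", offsets[i]});
      i += 2;
    } else {
      out.push_back({"STG", offsets[i]});
      i += 1;
    }
  }
  return out;
}
```

For offsets 0, 16, 32 and 64, the first two slots merge into one ST2G while the remaining two stay as separate STG instructions because they are not adjacent.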
Summary:
On Windows, when a function does not have an unwind table (for example, EH
filtering funclets), we don't correctly pair FP and LR to form the frame record
in all circumstances.
Fix this by invalidating a pair when the second register is FP when compiling
for Windows, even when CFI is not needed.
Fixes PR44271 introduced by D65653.
Reviewers: efriedma, sdesmalen, rovka, rengolin, t.p.northover, thegameg, greened
Reviewed By: rengolin
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71754
Summary:
Detect a run of memory tagging instructions for adjacent stack frame slots,
and replace them with a shorter instruction sequence
* replace STG + STG with ST2G
* replace STGloop + STGloop with STGloop
This code needs to run when stack slot offsets are already known, but before
FrameIndex operands in STG instructions are eliminated; that's the
reason for the new hook in PrologueEpilogue.
This change modifies STGloop and STZGloop pseudos to take the size as an
immediate integer operand, and base address as a FI operand when
possible. This is needed to simplify recognizing an STGloop instruction
as operating on a stack slot post-regalloc.
This improves memtag code size by ~0.25%, and it looks like an additional ~0.1%
is possible by rearranging the stack frame such that consecutive STG
instructions reference adjacent slots (patch pending).
Reviewers: pcc, ostannard
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70286
The commit r369122 may keep LR and FP register (aka. frame record) in
the middle of a frame, thus we must add the offsets to ensure the FP
register always points to innermost frame record on the stack.
According to AAPCS64[1], conforming code shall construct a linked list
of stack frames that can be traversed with frame records. This commit
is also essential to frame-pointer-based stack unwinders (e.g. the stack
unwinder in linux-perf-tools.)
[1] https://github.com/ARM-software/software-standards/blob/master/abi/aapcs64/aapcs64.rst#the-frame-pointer
Test: llvm-lit ${LLVM_SRC}/test/CodeGen/AArch64/framelayout-frame-record.ll
Test: llvm-lit ${LLVM_SRC}/test/CodeGen/AArch64
Differential Revision: https://reviews.llvm.org/D70800
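To show why FP must point at the frame record, here is a minimal model of a frame-pointer-based walker. It is a sketch only: FrameRecord, walkFrameRecords and exampleDepth are names invented for this example, the records are ordinary objects rather than real stack memory, and the depth limit is an arbitrary safety bound.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Model of a frame record: the caller's frame record address followed by
// the saved link register.
struct FrameRecord {
  const FrameRecord *callerFP;  // nullptr marks the outermost frame
  std::uintptr_t returnAddress; // saved LR
};

// Collects return addresses by following the chain from the innermost record.
inline std::vector<std::uintptr_t>
walkFrameRecords(const FrameRecord *fp, std::size_t maxDepth = 64) {
  std::vector<std::uintptr_t> pcs;
  while (fp != nullptr && pcs.size() < maxDepth) {
    pcs.push_back(fp->returnAddress);
    fp = fp->callerFP;
  }
  return pcs;
}

// Small self-check: builds a three-deep chain and returns the walked depth.
inline std::size_t exampleDepth() {
  FrameRecord outer{nullptr, 0x1000};
  FrameRecord middle{&outer, 0x2000};
  FrameRecord inner{&middle, 0x3000};
  return walkFrameRecords(&inner).size();
}
```

A walker like this dereferences whatever FP points at, so if FP pointed at saved floating-point registers instead of the record, both loaded fields would be garbage.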
The change allows clang -mno-omit-leaf-frame-pointer to disable frame
pointer elimination. This behavior matches X86 and Mips, and also GCC
AArch64.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D71168
Summary:
This patch fixes a few issues when large arrays are allocated on the
stack. Currently, clang has inconsistent behaviour: for debug builds
there is an assertion failure when the array size on stack is around 2GB
but there is no assertion when the stack is around 8GB. For release
builds there is no assertion; the compilation succeeds but generates
incorrect code. The incorrect code generated is due to using
int/unsigned int instead of their 64-bit counterparts. This patch:
1) Removes the assertion in frame legality check.
2) Converts int/unsigned int in some places to the 64-bit variants. This
helps in generating correct code and removes the inconsistent behaviour.
3) Adds a test which runs without optimisations.
Reviewers: sdesmalen, efriedma, fhahn, aemerson
Reviewed By: efriedma
Subscribers: eli.friedman, fpetrogalli, kristof.beyls, hiraditya,
llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70496
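The width problem can be illustrated in isolation. The helpers below are hypothetical and only demonstrate that sizes around 2GB and 8GB do not survive in 32-bit integers; they make no claim about which variables in the frame lowering code were affected.

```cpp
#include <cstdint>
#include <limits>

// True if v can be represented in a signed 32-bit integer.
constexpr bool fitsInInt32(std::int64_t v) {
  return v >= std::numeric_limits<std::int32_t>::min() &&
         v <= std::numeric_limits<std::int32_t>::max();
}

// Keeps only the low 32 bits of v (conversion to unsigned is modular).
constexpr std::uint32_t lowBits32(std::int64_t v) {
  return static_cast<std::uint32_t>(v);
}
```

A 2GB size (2^31) no longer fits in a signed 32-bit int, and an 8GB size (2^33) has all-zero low 32 bits, so a truncated copy would look like an empty object.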
This caused asserts (and perhaps also miscompiles) while building for Windows
on AArch64. See the discussion on D68530 for details and reproducer.
Reverting until this can be investigated and fixed.
> For arm64, D18619 introduced the ability to combine bumping the stack pointer
> upfront in case it needs to be bumped for both the callee-save area as well as
> the local stack area.
>
> That diff already remarks that "This change can cause an increase in
> instructions", but argues that even when that happens, it should still be a
> performance benefit because the number of micro-ops is reduced.
>
> We have observed that this code-size increase can be significant in practice.
> This diff disables combining stack bumping for methods that are marked as
> optimize-for-size.
>
> Example of a prologue with the behavior before this diff (combining stack bumping when possible):
> sub sp, sp, #0x40
> stp d9, d8, [sp, #0x10]
> stp x20, x19, [sp, #0x20]
> stp x29, x30, [sp, #0x30]
> add x29, sp, #0x30
> [... compute x8 somehow ...]
> stp x0, x8, [sp]
>
> And after this diff, if the method is marked as optimize-for-size:
> stp d9, d8, [sp, #-0x30]!
> stp x20, x19, [sp, #0x10]
> stp x29, x30, [sp, #0x20]
> add x29, sp, #0x20
> [... compute x8 somehow ...]
> stp x0, x8, [sp, #-0x10]!
>
> Note that without combining the stack bump there are two auto-decrements,
> nicely folded into the stp instructions, whereas otherwise there is a single
> sub sp, ... instruction, but not folded.
>
> Patch by Nikolai Tillmann!
>
> Differential Revision: https://reviews.llvm.org/D68530
Summary:
The insertion of most CFI instructions during AArch64 frame lowering can
be disabled (e.g. using the function attribute `nounwind`).
This patch enables conditional insertion for one more CFI instruction.
Reviewers: t.p.northover, ostannard
Reviewed By: ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70129
This patch adds a target interface to set the StackID for a given type,
which allows scalable vectors (e.g. `<vscale x 16 x i8>`) to be assigned a
'sve-vec' StackID, so it is allocated in the SVE area of the stack frame.
Reviewers: ostannard, efriedma, rengolin, cameron.mcinally
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D70080
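The idea of a per-type StackID can be sketched with stand-in types. Everything below is a simplified assumption for illustration: StackID, TypeInfo and stackIDForType are hypothetical names and do not reproduce the actual target interface added by the patch.

```cpp
// Hypothetical stand-ins for the stack region identifiers.
enum class StackID { Default, SVEVector };

// Hypothetical summary of a type: only whether it is a scalable vector,
// such as <vscale x 16 x i8>.
struct TypeInfo {
  bool isScalableVector;
};

// Scalable vectors go to the SVE area of the frame; everything else stays
// in the default region.
constexpr StackID stackIDForType(TypeInfo ty) {
  return ty.isScalableVector ? StackID::SVEVector : StackID::Default;
}
```

Keeping the decision behind one function lets the frame layout code treat the SVE area as a separate region without inspecting types itself.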
This adds a flag to LLVM and clang to always generate a .debug_frame
section, even if other debug information is not being generated. In
situations where .eh_frame would normally be emitted, both .debug_frame
and .eh_frame will be used.
Differential Revision: https://reviews.llvm.org/D67216
llvm/test/DebugInfo/MIR/X86/live-debug-values-reg-copy.mir failed with
EXPENSIVE_CHECKS enabled, causing the patch to be reverted in
rG2c496bb5309c972d59b11f05aee4782ddc087e71.
This patch relands the patch with a proper fix to the
live-debug-values-reg-copy.mir tests, by ensuring the MIR encodes the
callee-saves correctly so that the CalleeSaved info is taken from MIR
directly, rather than letting it be recalculated by the PEI pass. I've
done this by running `llc -stop-before=prologepilog` on the LLVM
IR as captured in the test files, adding the extra MOV instructions
that were manually added in the original test file, then running `llc
-run-pass=prologepilog` and finally re-adding the comments for the MOV
instructions.