Commit Graph

255 Commits

Author SHA1 Message Date
Saleem Abdulrasool c31f0a0050 AArch64: correct epilogue/prologue emission for swift async
The prologue and epilogue emission were unbalanced in light of the
different strategies for async frame context emission. Adjust the
epilogue emission to match the prologue emission, so that both the
elision case and the deployment-based case work properly. Because the
epilogue always cleared a bit (which should not have been set in the
first place), clients would not notice the behavioural issue unless the
deployment version was in effect.
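A hedged sketch of a balanced pair under the static, deployment-target
strategy (offsets and the exact sequence are illustrative, not the
commit's literal output):

```
// Prologue: mark the extended frame by setting bit 60 of FP.
orr  x29, x29, #0x1000000000000000
stp  x29, x30, [sp, #-16]!
mov  x29, sp
...
// Epilogue: mirror the prologue; clear bit 60 only if the prologue set it.
ldp  x29, x30, [sp], #16
and  x29, x29, #0xefffffffffffffff
ret
```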
2022-03-09 18:41:10 +00:00
Hans Wennborg 85c53c7092 Revert "[AArch64] Async unwind - function prologues"
It caused builds to assert with:

  (StackSize == 0 && "We already have the CFA offset!"),
  function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624.

when targeting iOS. See comment on the code review for reproducer.

> This patch rearranges emission of CFI instructions, so the resulting
> DWARF and `.eh_frame` information is precise at every instruction.
>
> The current state is that the unwind info is emitted only after the
> function prologue. This is fine for synchronous (e.g. C++) exceptions,
> but the information is generally incorrect when the program counter is
> at an instruction in the prologue or the epilogue, for example:
>
> ```
> stp     x29, x30, [sp, #-16]!           // 16-byte Folded Spill
> mov     x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> after the `stp` is executed, the (initial) rule for the CFA still says
> the CFA is in the `sp`, even though the `sp` is already offset by 16 bytes.
>
> A correct unwind info could look like:
> ```
> stp     x29, x30, [sp, #-16]!           // 16-byte Folded Spill
> .cfi_def_cfa_offset 16
> mov     x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> Having this information precise up to an instruction is useful for
> sampling profilers that would like to get a stack backtrace. The end
> goal (towards which this patch is just a step) is to have fully working
> `-fasynchronous-unwind-tables`.
>
> Reviewed By: danielkiss, MaskRay
>
> Differential Revision: https://reviews.llvm.org/D111411

This reverts commit 32e8b550e5.
2022-03-04 17:36:26 +01:00
Sander de Smalen 7c65d2288b [AArch64] Improve access to fixed-width object when stack has SVE.
When the stack has SVE objects, fixed-width objects are often better accessed
from the SP, instead of the FP, because part/all of the fixed-width offset
can be folded into the (non-scalable) addressing mode, where otherwise an
ADDVL would be required.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D120738
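A hedged sketch of the difference (offsets and registers illustrative):

```
// FP-based access must first step over the scalable (SVE) region:
addvl x8, x29, #-2          // scale #-2 by the vector length
ldur  x0, [x8, #-16]
// SP-based access folds the whole fixed-width offset into the
// (non-scalable) addressing mode, with no ADDVL needed:
ldr   x0, [sp, #32]
```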
2022-03-04 09:33:59 +00:00
Momchil Velikov 63c9aca12a Revert "[AArch64] Async unwind - function epilogues"
This reverts commit 74319d6794.

It causes test failures that look like an infinite loop in asan/hwasan
unwinding.
2022-03-02 15:01:57 +00:00
Momchil Velikov 74319d6794 [AArch64] Async unwind - function epilogues
A counterpart of https://reviews.llvm.org/D111411, this change makes the
unwind information instruction-precise in function epilogues.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112330
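A hedged sketch of what an instruction-precise epilogue could look like
(directives and offsets illustrative):

```
.cfi_def_cfa wsp, 16        // CFA rule moves back to SP before FP is clobbered
ldp  x29, x30, [sp], #16
.cfi_def_cfa_offset 0       // SP is fully restored at this point
.cfi_restore w30
.cfi_restore w29
ret
```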
2022-03-02 13:15:11 +00:00
Momchil Velikov 32e8b550e5 [AArch64] Async unwind - function prologues
This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.

The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:

```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

after the `stp` is executed, the (initial) rule for the CFA still says
the CFA is in the `sp`, even though the `sp` is already offset by 16 bytes.

A correct unwind info could look like:
```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards which this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.

Reviewed By: danielkiss, MaskRay

Differential Revision: https://reviews.llvm.org/D111411
2022-02-28 13:37:57 +00:00
Momchil Velikov 20a093e2bc [AArch64] Async unwind - Refactor generation of shadow call stack prologue/epilogue
This patch is in preparation for the async unwind CFI.

Move the emission of the shadow call stack prologue/epilogue
instructions into `emitPrologue`/`emitEpilogue`. This greatly
simplifies epilogue generation in particular, and removes some quite
fragile code that tried to skip over those instructions.
Reviewed By: MaskRay, efriedma

Differential Revision: https://reviews.llvm.org/D112329
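For reference, the shadow call stack sequences being moved are small;
their usual shape (x18 holds the shadow stack pointer):

```
// Prologue: push LR onto the shadow call stack.
str  x30, [x18], #8
...
// Epilogue: pop LR back from the shadow call stack.
ldr  x30, [x18, #-8]!
ret
```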
2022-02-25 11:09:23 +00:00
Momchil Velikov 17e85cd410 [AArch64] Async unwind - Always place the first LDP at the end when ReverseCSRRestoreSeq is true
This patch is in preparation for the async unwind CFI.

Put the first `LDP` at the end, so that the load-store optimizer can run
and merge the `LDP` and the `ADD` into a post-index `LDP`.

Do this always, as early as the initial creation of the CSR restore
instructions, even if that `LDP` is not guaranteed to be mergeable with
a subsequent `SP` increment.

This greatly simplifies the CFI generation for the prologue, as
otherwise we would have to take extra steps to ensure that reordering
does not cross CFI instructions.

Reviewed By: danielkiss

Differential Revision: https://reviews.llvm.org/D112328
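A hedged sketch of the intended merge (registers and sizes illustrative):

```
// CSR restores with the first LDP placed last:
ldp  x20, x19, [sp, #16]
ldp  x29, x30, [sp]
add  sp, sp, #32
// ...which the load-store optimizer can merge into a post-index LDP:
//   ldp x29, x30, [sp], #32
```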
2022-02-24 18:48:07 +00:00
Momchil Velikov 25e92920c9 [AArch64] Async unwind - helper functions to decide on CFI emission
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D112327
2022-02-24 18:16:50 +00:00
Jim Lin d6b0734837 [NFC] Use Register instead of unsigned 2022-01-19 20:17:04 +08:00
Tim Northover 581e855623 AArch64: don't claim to preserve registers used by prologue code 2022-01-10 12:27:04 +00:00
Daniel Kiss 131c06e6da Revert "[AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions."
This reverts commit f903c85055.
2022-01-06 19:17:45 +01:00
John Brawn dc9f65be45 [AArch64][SVE] Fix handling of stack protection with SVE
Fix a couple of things that were causing stack protection to not work
correctly in functions that have scalable vectors on the stack:
 * Use TypeSize when determining if accesses to a variable are
   considered out-of-bounds so that the behaviour is correct for
   scalable vectors.
 * When stack protection is enabled, move the stack protector location
   to the top of the SVE locals, so that any overflow in them (or in the
   other locals below) will be detected.

Fixes: https://github.com/llvm/llvm-project/issues/51137

Differential Revision: https://reviews.llvm.org/D111631
2021-12-14 11:30:48 +00:00
Serguei Katkov 3557f49353 [AARCH64] Teach AArch64FrameLowering::getFrameIndexReferencePreferSP to really prefer SP.
Try harder to use sp when it is possible to lower a frame index.

Reviewers: reames, loicottet, ostannard, t.p.northover
Reviewed By: reames
Subscribers: arphaman, danilaml, hiraditya, kristof.beyls, llvm-commits, Matt, yrouban
Differential Revision: https://reviews.llvm.org/D111133
2021-11-19 11:14:02 +07:00
Kazu Hirata 14d656b3d8 [Target] Use llvm::reverse (NFC) 2021-11-06 13:08:21 -07:00
Daniel Kiss f903c85055 [AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions.
The autiasp and autibsp instructions are the counterparts of the
paciasp/pacibsp instructions, so let's emit .cfi_negate_ra_state for
these too. With the Armv8.3 instruction set, retaa/retab perform the
return and the authentication in one step; there we can't emit
.cfi_negate_ra_state, because it would point after the ret* instruction.

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D111780
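A sketch of the resulting pattern in the non-v8.3 case (function body
elided):

```
paciasp
.cfi_negate_ra_state      // return address is now signed
...
autiasp
.cfi_negate_ra_state      // return address is authenticated again
ret
```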
2021-10-20 11:03:52 +02:00
Leonard Chan 4dc462b589 [AArch64] Emit CFI instruction for updating x18 when using ShadowCallStack with exception unwinding
PR45875 notes an instance where exception handling crashes on
aarch64-fuchsia, where SCS is enabled by default. The underlying issue
seems to be that within libunwind's various _Unwind_* functions, the x18
register is not updated if a function is marked with nounwind. This
removes the check for nounwind and emits the CFI instruction that
updates x18.

Differential Revision: https://reviews.llvm.org/D79822
2021-10-08 14:20:26 -07:00
Doug Gregor a773db7d76 Add a command-line flag to control the Swift extended async frame info.
Introduce a new command-line flag `-swift-async-fp={auto|always|never}`
that controls how code generation sets the Swift extended async frame
info bit. There are three possibilities:

* `auto`: determine how to set the bit based on the deployment target,
either statically or dynamically via `swift_async_extendedFramePointerFlags`.
* `always`: the default; always set the bit statically, regardless of the
deployment target.
* `never`: never set the bit, regardless of the deployment target.

Patch by Doug Gregor <dgregor@apple.com>

Reviewed By: doug.gregor

Differential Revision: https://reviews.llvm.org/D109392
2021-09-16 06:57:45 -07:00
Tim Northover 5d070c8259 SwiftAsync: use runtime-provided flag for extended frame if back-deploying
When back-deploying Swift async code we can't always toggle the flag showing an
extended frame is present because it will confuse unwinders on systems released
before this feature. So in cases where the code might run there, we `or` in a
mask provided by the runtime (as an absolute symbol) telling us whether the
unwinders can cope.

When deploying only for newer OSs, we can still hard-code the bit-set for
greater efficiency.
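A hedged sketch of the two strategies; the GOT-style relocation syntax
is an assumption (it varies by object format), and only the symbol name
comes from the messages above:

```
// New-enough deployment target: hard-code the bit.
orr  x29, x29, #0x1000000000000000
// Back-deploying: OR in the runtime-provided mask instead.
adrp x16, swift_async_extendedFramePointerFlags@GOTPAGE     // assumed syntax
ldr  x16, [x16, swift_async_extendedFramePointerFlags@GOTPAGEOFF]
orr  x29, x29, x16
```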
2021-09-13 13:54:46 +01:00
Fangrui Song 0e03450ae4 [AArch64] Remove an unneeded !NeedsWinCFI check. NFC 2021-09-05 21:02:56 -07:00
Kyungwoo Lee 6530ea4095 [AArch64] Fix Local Deallocation for Homogeneous Prolog/Epilog
The stack adjustment for local deallocation was incorrectly ported.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D106760
2021-07-25 10:51:11 -07:00
Pablo Barrio 571c8c5263 [AArch64][v8.3A] Avoid inserting implicit landing pads (PACI*SP)
PACI*SP have the advantage that they are in HINT space, meaning
they can be run successfully in hardware without PAuth support -
they will just behave as a NOP. However, PACI*SP are also implicit
landing pads (think of an extra BTI jc). Therefore, they allow
indirect jumps of all kinds into them, potentially inserting new
gadgets. This patch replaces PACI*SP by PACI* LR, SP when
compiling explicitly for hardware with full PAuth support. PACI*
is not in the HINT space, therefore it will fault when run in
hardware without PAuth support, but it is also not a landing pad,
making programs safer in newer HW.

Differential Revision: https://reviews.llvm.org/D101920
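The two forms side by side (a sketch; `x30` is LR):

```
// HINT-space: behaves as a NOP without PAuth, but is an implicit landing pad.
paciasp
// Non-HINT encoding: faults without PAuth, and is not a landing pad.
pacia x30, sp
```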
2021-06-24 18:24:32 +01:00
Tim Northover 769ced3d57 AArch64: mark x22 livein if it's an async context that gets stored.
This fixes a crash with expensive checks enabled (the verifier was not happy).
2021-05-17 11:56:03 +01:00
Tim Northover 82a0e808bb IR/AArch64/X86: add "swifttailcc" calling convention.
Swift's new concurrency features are going to require guaranteed tail calls so
that they don't consume excessive amounts of stack space. This would normally
mean "tailcc", but there are also Swift-specific ABI desires that don't
naturally go along with "tailcc", so this adds another calling convention that's
the combination of "swiftcc" and "tailcc".

Support is added for AArch64 and X86 for now.
2021-05-17 10:48:34 +01:00
Tim Northover ea0eec69f1 IR+AArch64: add a "swiftasync" argument attribute.
This extends any frame record created in the function to include that
parameter, passed in X22.

The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001
in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect
of this is that tools walking the stack should expect to see one of three
values there:

  * 0b0000 => a normal, non-extended record with just [FP, LR]
  * 0b0001 => the extended record [X22, FP, LR]
  * 0b1111 => kernel space, and a non-extended record.

All other values are currently reserved.

If compiling for arm64e this context pointer is address-discriminated with the
discriminator 0xc31a and the DB (process-specific) key.

There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing
front-ends access to this slot (and forcing its creation initialized to nullptr
if necessary).
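A hedged sketch of a prologue establishing the layout above (assumes SP
was already adjusted; offsets illustrative):

```
// Frame record in memory, lowest address first: [X22, FP, LR]
str  x22, [sp, #8]          // async context at FP - 8
stp  x29, x30, [sp, #16]    // saved FP at FP + 0, LR at FP + 8
add  x29, sp, #16           // FP points at the saved-FP slot
```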
2021-05-14 11:43:58 +01:00
Tomas Matheson a9968c0a33 [NFC][CodeGen] Tidy up TargetRegisterInfo stack realignment functions
Currently needsStackRealignment returns false if canRealignStack returns false.
This means that the behavior of needsStackRealignment does not correspond to
its name and description: a function might need stack realignment, but if
realignment is not possible then this function returns false. Furthermore,
needsStackRealignment is not virtual, and therefore some backends have made use
of canRealignStack to indicate whether a function needs stack realignment.

This patch attempts to clarify the situation by separating them and introducing
new names:

 - shouldRealignStack - true if there is any reason the stack should be
   realigned

 - canRealignStack - true if we are still able to realign the stack (e.g. we
   can still reserve/have reserved a frame pointer)

 - hasStackRealignment = shouldRealignStack && canRealignStack (not target
   customisable)

Targets can now override shouldRealignStack to indicate that stack realignment
is required.

This change will make it easier in a future change to handle the case where we
need to realign the stack but can't do so (for example when the register
allocator creates an aligned spill after the frame pointer has been
eliminated).

Differential Revision: https://reviews.llvm.org/D98716

2021-03-30 17:31:39 +01:00
Bradley Smith ea834c8365 Revert "[AArch64][SVE] Allow accesses to SVE stack objects to use frame pointer"
This patch introduced codegen faults.  An attempt to fix this was done
in https://reviews.llvm.org/D97193, but ultimately it was decided to
approach this differently.

This reverts commit 42635856ed.

Differential Revision: https://reviews.llvm.org/D98350
2021-03-11 13:32:35 +00:00
Oliver Stannard 8d632ca436 [ARM] Add comment explaining stack frame layout
Add a comment explaining how we lay out stack frames for ARM targets,
based on the existing one for AArch64. Also expand the comment to
explain reserved call frames for both architectures.

Differential revision: https://reviews.llvm.org/D98258
2021-03-09 15:20:32 +00:00
Amara Emerson 0146d20631 [AArch64] Do not fold SP adjustments into pre-increment addr modes if it overflows the redzone.
Instead of disabling this outright with the noredzone attribute,
we only avoid doing the optimization if there are memory operations between
the adjustment and the load/store that the adjustment would be folded into.
This avoids the case of something like a stack cookie being corrupted if an
exception happens before the pre-increment to the SP occurs.

This also prevents the folding from happening if we have a redzone but
the offset being folded is above the redzone amount (128 bytes in this
case).

rdar://73269336

Differential Revision: https://reviews.llvm.org/D95179
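The two shapes in question, as a sketch:

```
// Folded: the SP decrement is merged into a pre-increment store.
stp  x29, x30, [sp, #-16]!
// Unfolded: kept separate when memory ops intervene or the folded
// offset would reach above the 128-byte red zone.
sub  sp, sp, #16
stp  x29, x30, [sp]
```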
2021-02-24 09:55:48 -08:00
Kyungwoo Lee 4f58b1bd29 [AArch64] Homogeneous Prolog and Epilog Size Optimization
Second land attempt; the MachineVerifier DefRegState expensive-checks
errors have been fixed.

Prologs and epilogs handle callee-save registers and tend to be
irregular, with different immediate offsets that are not often handled
by the MachineOutliner. Commit D18619/a5335647d5e8 (combining stack
operations) stretched this irregularity further.

This patch tries to emit homogeneous stores and loads with the same offset for
prologs and epilogs respectively. We have observed that this canonicalizes
(homogenizes) prologs and epilogs significantly and results in a greatly
increased chance of outlining, resulting in a code size reduction.

Despite the above results, there are still size wins to be had that the
MachineOutliner does not provide, due to the special handling of X30/LR.
To handle the LR case, this patch custom-outlines prologs and epilogs in
place. It does this by doing the following:

  * Injects HOM_Prolog and HOM_Epilog pseudo instructions during a Prolog and
    Epilog Injection Pass.
  * Lowers and optimizes said pseudos in an AArch64LowerHomogeneousPrologEpilog pass.
  * Outlined helpers are created on demand. Identical helpers are merged by the linker.
  * An opt-in flag is introduced to enable this feature. Another threshold flag
    is also introduced to control the aggressiveness of outlining for an application's needs.

This reduces code size by an average of 4% on LLVM-TestSuite/CTMark targeting arm64/-Oz.

Differential Revision: https://reviews.llvm.org/D76570
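A hedged sketch of the homogenization idea (register choices
illustrative; the real lowering goes through the HOM_Prolog/HOM_Epilog
pseudos described above):

```
// Each CSR pair is pushed with the same pre-decrement form, so the byte
// sequence is identical across functions and outlines well:
stp  x29, x30, [sp, #-16]!
stp  x20, x19, [sp, #-16]!
stp  x22, x21, [sp, #-16]!
```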
2021-02-02 14:57:26 -08:00
Puyan Lotfi 8f7f2c4211 Revert "[AArch64] Homogeneous Prolog and Epilog Size Optimization"
This reverts commit 0426be3df6.

Reverting due to some expensive-checks failures in tests.
2021-02-02 02:33:44 -05:00
Kyungwoo Lee 0426be3df6 [AArch64] Homogeneous Prolog and Epilog Size Optimization
Prologs and epilogs handle callee-save registers and tend to be
irregular, with different immediate offsets that are not often handled
by the MachineOutliner. Commit D18619/a5335647d5e8 (combining stack
operations) stretched this irregularity further.

This patch tries to emit homogeneous stores and loads with the same offset for
prologs and epilogs respectively. We have observed that this canonicalizes
(homogenizes) prologs and epilogs significantly and results in a greatly
increased chance of outlining, resulting in a code size reduction.

Despite the above results, there are still size wins to be had that the
MachineOutliner does not provide, due to the special handling of X30/LR.
To handle the LR case, this patch custom-outlines prologs and epilogs in
place. It does this by doing the following:

  * Injects HOM_Prolog and HOM_Epilog pseudo instructions during a Prolog and
    Epilog Injection Pass.
  * Lowers and optimizes said pseudos in an AArch64LowerHomogeneousPrologEpilog pass.
  * Outlined helpers are created on demand. Identical helpers are merged by the linker.
  * An opt-in flag is introduced to enable this feature. Another threshold flag
    is also introduced to control the aggressiveness of outlining for an application's needs.

This reduces code size by an average of 4% on LLVM-TestSuite/CTMark targeting arm64/-Oz.

Differential Revision: https://reviews.llvm.org/D76570
2021-02-02 00:26:51 -05:00
Bradley Smith 42635856ed [AArch64][SVE] Allow accesses to SVE stack objects to use frame pointer
The layout of the stack frame for SVE means that using the frame pointer
rather than the stack pointer for an access to an SVE stack object
removes the need for an additional add to jump over the non-SVE objects.

Likewise the opposite is true for non-SVE stack objects.

This patch allows the former to be done by having HasFP return true
in the presence of both SVE and non-SVE stack objects, and also fixes a
minor issue whereby the latter would not be done for certain offsets.
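A hedged sketch (offsets and registers illustrative):

```
// Via FP: the scalable offset folds straight into the SVE load.
ldr  z0, [x29, #-2, mul vl]
// Via SP: an extra ADD is needed first, to step over the non-SVE objects.
add  x8, sp, #48
ldr  z0, [x8, #-2, mul vl]
```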
2021-01-28 12:39:57 +00:00
Hsiangkai Wang 914e2f5a02 [NFC] Use generic name for scalable vector stack ID.
Differential Revision: https://reviews.llvm.org/D94471
2021-01-13 10:57:43 +08:00
Mark Murray af7cce2fa4 [AArch64] Add +pauth architecture option, allowing the v8.3a pointer authentication extension.
Differential Revision: https://reviews.llvm.org/D94083
2021-01-08 13:21:11 +00:00
Jay Foad 000400ca0a Fix spelling in comments. NFC. 2020-11-23 14:43:24 +00:00
Sander de Smalen d57bba7cf8 [SVE] Return StackOffset for TargetFrameLowering::getFrameIndexReference.
To accommodate frame layouts that have both fixed and scalable objects
on the stack, describing a stack location or offset using a pointer + uint64_t
is not sufficient. For this reason, we've introduced the StackOffset class,
which models both fixed- and scalable-sized offsets.

TargetFrameLowering::getFrameIndexReference is made to return a StackOffset,
so that this can be used in other interfaces, such as to eliminate frame indices
in PEI or to emit Debug locations for variables on the stack.

This patch is purely mechanical and doesn't change the behaviour of how
the result of this function is used for fixed-sized offsets. The patch adds
various checks to assert that the offset has no scalable component, as frame
offsets with a scalable component are not yet supported in various places.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D90018
2020-11-05 11:02:18 +00:00
Sander de Smalen 73b6cb67dc [NFCI] Replace AArch64StackOffset by StackOffset.
This patch replaces the AArch64StackOffset class by the generic one
defined in TypeSize.h.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D88983
2020-11-04 08:49:00 +00:00
Evgenii Stepanov 2e794a46b5 [AArch64] Stack frame reordering.
Implement stack frame reordering in the AArch64 backend.

Unlike the X86 implementation, AArch64 does not seem to benefit from
"access density" based frame reordering, mainly because it has a much
smaller variety of addressing modes and because all instructions are
4 bytes: each frame object is either in range of an instruction (and
then the access is "free") or not (and then it carries a code-size cost
of 4 bytes).

This change improves Memory Tagging codegen by
* Placing an object that has been chosen as the base tagged pointer of
the function at SP + 0. This saves one instruction to set up the pointer
(IRG does not have an offset immediate), and more because that object
can now be referenced without materializing its tagged address in a
scratch register.
* Placing objects that go out of scope simultaneously together. This
exposes opportunities for instruction merging in tryMergeAdjacentSTG.

Differential Revision: https://reviews.llvm.org/D72366
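A sketch of the SP + 0 benefit (assumes MTE; registers illustrative):

```
// Object at SP + 0: IRG produces its tagged address directly, since
// IRG takes no offset immediate.
irg  x0, sp
stg  x0, [x0]        // tag the first granule through x0 itself
```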
2020-10-15 12:50:16 -07:00
Evgenii Stepanov 2f63e57fa5 [MTE] Pin the tagged base pointer to one of the stack slots.
Summary:
Pin the tagged base pointer to one of the stack slots, and (if
necessary) rewrite tag offsets so that an object that occupies that
slot has both address and tag offsets of 0. This allows ADDG
instructions for that object to be eliminated and their uses replaced
with the tagged base pointer itself.

This optimization must be done in machine instructions and not in the IR
instrumentation pass, because referring to a stack slot through an IRG
pointer would confuse the stack coloring pass.

The optimization makes a (pretty naive) attempt to find the slot that
would benefit the most by counting the uses of stack slots in the
function.

Reviewers: ostannard, pcc

Subscribers: merge_guards_bot, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72365
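A hedged sketch (offsets illustrative):

```
irg  x0, sp            // the tagged base pointer
// The pinned slot has address and tag offsets of 0, so it is addressed
// through x0 directly; other slots still derive pointers with ADDG:
addg x1, x0, #16, #1   // address offset 16, tag offset 1
```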
2020-10-15 12:50:16 -07:00
Martin Storsjö 7d07405761 [AArch64] Prefer prologues with sp adjustments merged into stp/ldp for WinCFI, if optimizing for size
This makes the prologue match the Windows canonical layout for cases
without a frame pointer.

This can potentially be slower (a longer dependency chain on the
sp register, and potentially one more arithmetic operation on some
cores), but it gives notable size improvements.

The previous two commits shrink a 166 KB xdata section by 49 KB,
and if the change from this commit is enabled, the xdata section
shrinks by another 25 KB.

In total, since the start of the recent arm64 unwind info cleanups
and optimizations (since before commit 37ef743cbf), the xdata+pdata
sections of the same test DLL has shrunk from 407 KB in total
originally, to 163 KB now.

Differential Revision: https://reviews.llvm.org/D88701
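A hedged sketch of the canonical shape with its WinCFI directives
(offsets illustrative):

```
stp  x19, x20, [sp, #-32]!     // SP adjust folded into the store
.seh_save_regp_x x19, 32
str  x21, [sp, #16]
.seh_save_reg x21, 16
.seh_endprologue
```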
2020-10-03 21:37:22 +03:00
Martin Storsjö 890af2f003 [AArch64] Allow pairing lr with other GPRs for WinCFI
This saves one instruction per prologue/epilogue for any function with
an odd number of callee-saved GPRs, but more importantly, allows such
functions to match the packed unwind format.

Differential Revision: https://reviews.llvm.org/D88699
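A hedged sketch (the pairing and offsets are illustrative):

```
sub  sp, sp, #16
.seh_stackalloc 16
// lr paired with the odd leftover GPR, which packed unwind can describe:
stp  x19, x30, [sp]
.seh_save_lrpair x19, 0
.seh_endprologue
```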
2020-10-03 21:37:22 +03:00
Martin Storsjö 3780a4e568 [AArch64] Match the windows canonical callee saved register order
On Windows, the callee-saved registers in a canonical prologue are
ordered starting from a lower register number at a lower stack
address (with a possible gap for aligning the stack at the top);
this is the opposite of the order that LLVM normally produces.

To achieve this, reverse the order of the registers in the
assignCalleeSavedSpillSlots callback, to get the stack objects
laid out by PrologEpilogInserter in the right order, and adjust
computeCalleeSaveRegisterPairs to lay them out from the bottom up.

This more often allows generated prologs to match the format that
allows the unwind info to be written as packed info.

Differential Revision: https://reviews.llvm.org/D88677
2020-10-03 21:37:22 +03:00
Martin Storsjö afb4e0f289 [AArch64] Omit SEH directives for the epilogue if none are needed
For these cases, we already omit the prologue directives, if
(!AFI->hasStackFrame() && !windowsRequiresStackProbe && !NumBytes).

When writing the epilogue (after the prolog has been written), if
the function doesn't have the WinCFI flag set (i.e. if no prologue
was generated), assume that no epilogue will be needed either,
and don't emit any epilog start pseudo instruction. After completing
the epilogue, make sure that it actually matched the prologue.

Previously, when an epilogue start/end was generated but no prologue,
the unwind info for such functions was actually huge: 12 bytes of xdata
(4 bytes header, 4 bytes for one non-folded epilogue header, 4 bytes
for padded opcodes) and 8 bytes of pdata. Because the epilog consisted of
one opcode (end) but the prolog was empty (no .seh_endprologue), the
epilogue couldn't be folded into the prologue, and thus couldn't be
considered for packed form either.

On a 6.5 MB DLL with 110 KB pdata and 166 KB xdata, this gets rid of
38 KB pdata and 62 KB xdata.

Differential Revision: https://reviews.llvm.org/D88641
2020-10-02 09:12:56 +03:00
Martin Storsjö 51e74e21aa [AArch64] Remove a duplicate call to setHasWinCFI. NFCI.
The function already has a cleanup scope that makes the same call
whenever the function is exited. When reading the code, seeing that this
return path has an explicit call while other return paths lack it is
confusing.

In the hypothetical case of a function having a prologue that
set the HasWinCFI flag in the MF, but the epilogue containing no
WinCFI instructions, the HasWinCFI flag in the MF would end up reset back
to false.

Differential Revision: https://reviews.llvm.org/D88636
2020-10-01 19:03:27 +03:00
Momchil Velikov a88c722e68 [AArch64] PAC/BTI code generation for LLVM generated functions
PAC/BTI-related codegen in the AArch64 backend is controlled by a set
of LLVM IR function attributes, added to the function by Clang, based
on command-line options and GCC-style function attributes. However,
functions generated in the LLVM middle end (for example,
asan.module.ctor or __llvm_gcov_write_out) do not get any attributes,
and the backend incorrectly does not do any PAC/BTI code generation.

This patch records the default state of PAC/BTI codegen in a set of
LLVM IR module-level attributes, based on command-line options:

* "sign-return-address", with non-zero value means generate code to
  sign return addresses (PAC-RET), zero value means disable PAC-RET.

* "sign-return-address-all", with non-zero value means enable PAC-RET
  for all functions, zero value means enable PAC-RET only for
  functions, which spill LR.

* "sign-return-address-with-bkey", with non-zero value means use B-key
  for signing, zero value mean use A-key.

This set of attributes is always added for AArch64 targets (as
opposed, for example, to interpreting a missing attribute as having
the value 0), in order to be able to check for conflicts when combining
module attributes during LTO.

Module-level attributes are overridden by function-level attributes.
All the decision-making about whether or not to generate PAC and/or
BTI code is factored out into AArch64FunctionInfo; there shouldn't be
any places left, other than AArch64FunctionInfo, that directly
examine PAC/BTI attributes, except AArch64AsmPrinter.cpp, which
is/will be handled by a separate patch.

Differential Revision: https://reviews.llvm.org/D85649
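As a hedged illustration of the end result, a compiler-generated
function with PAC-RET enabled via the module-level attributes would now
start and end roughly like:

```
paciasp         // sign LR on entry (also serves as a BTI landing pad)
...
autiasp         // authenticate LR before returning
ret
```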
2020-09-25 11:47:14 +01:00
Eli Friedman b92d084910 [AArch64][SVE] Fix frame offset calculation when d8 is saved.
If d8 is saved, the fp is not actually adjacent to the SVE
spills/allocations.  Fix the offset calculation to account for this.

Differential Revision: https://reviews.llvm.org/D88117
2020-09-23 11:33:53 -07:00
Owen Anderson 5987da8764 Revert "Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain""
This reverts commit bc9a29b9ee.

The reasoning that this patch was wrong was itself incorrect
(see discussion on llvm-commits). This patch does seem to be exposing
a latent SVE code generation bug on non-public tests, which should
not block a correctness fix for public, non-SVE use cases.
2020-09-01 19:29:03 +00:00
Paul Walker bc9a29b9ee Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain"
This reverts commit e9d9a61208.

This patch was previously reverted by 04879086b4
with the reapplication being done after breaking the assert used to
ensure SP is always 16-byte aligned, which is a requirement of the AAPCS.

For extra context, the latest patch caused runtime failures when
building with "-march=armv8-a+sve -mllvm -aarch64-sve-vector-bits-min=256".
2020-09-01 16:09:37 +01:00
Owen Anderson e9d9a61208 Reapply D70800: Fix AArch64 AAPCS frame record chain
Original Commit Message:
After commit r368987 (rG643adb55769e) landed, the frame record (the FP and LR registers)
may be placed in the middle of a stack frame if a function has both callee-saved
general-purpose registers and floating-point registers. This breaks stack unwinders
that simply walk through the frame records (relying on the guarantee from the AAPCS64
"The Frame Pointer" section). This commit fixes the problem by adding the frame record offset.

Patch By: logan
Differential Revision: D70800
2020-08-27 17:29:41 +00:00