llvm-project

Commit Graph

Author	SHA1	Message	Date
Guozhi Wei	ddc9e8861c	[MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency Add a new pattern A - (B + C) ==> (A - B) - C to give machine combiner a chance to evaluate which instruction sequence has lower latency. Differential Revision: https://reviews.llvm.org/D124564	2022-06-28 21:42:51 +00:00
Serguei Katkov	163c77b2e0	[AARCH64 folding] Do not fold any copy with NZCV There is no instruction to fold NZCV, so, just do not do it. Without the fix the added test case crashes with an assert "Mismatched register size in non subreg COPY" Reviewed By: danilaml Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D127294	2022-06-21 10:38:49 +07:00
Kazu Hirata	129b531c9c	[llvm] Use value_or instead of getValueOr (NFC)	2022-06-18 23:07:11 -07:00
zhongyunde	c42a225545	[MachineScheduler] Order more stores by ascending address According D125377, we order STP Q's by ascending address. While on some targets, paired 128 bit loads and stores are slow, so the STP will split into STRQ and STUR, so I hope these stores will also be ordered. Also add subtarget feature ascend-store-address to control the aggressive order. Reviewed By: dmgreen, fhahn Differential Revision: https://reviews.llvm.org/D126700	2022-06-13 17:33:50 +08:00
Eli Friedman	0ff51d5dde	Fix interaction of CFI instructions with MachineOutliner. 1. When checking if a candidate contains a CFI instruction, actually iterate over all of the instructions, instead of stopping halfway through. 2. Make sure copied CFI directives refer to the correct instruction. Fixes https://github.com/llvm/llvm-project/issues/55842 Differential Revision: https://reviews.llvm.org/D126930	2022-06-10 13:37:49 -07:00
Kazu Hirata	3b9707dbc0	[llvm] Convert for_each to range-based for loops (NFC)	2022-06-05 12:07:14 -07:00
Sander de Smalen	9c38fc111b	[AArch64] Remove references to Streaming SVE from target features. Following discussion on D120261 and D121208 it seems better to remove the concept of Streaming SVE from the subtarget/assembler predicates and instead reason about 'SVE' and 'SME' as its higher level features, rather than trying to model this runtime mode through explicit feature flags. This patch is largely NFC. Reviewed By: paulwalker-arm, david-arm Differential Revision: https://reviews.llvm.org/D125977	2022-05-31 16:25:01 +02:00
David Green	5cb14dc5a3	[AArch64] Look through copy in MachineCombiner FMUL patterns. This is a small addition to D99662, which added machine combiner patterns for FMUL(DUP(..)). Due to the way these are generated from ISel, they may also be FMUL(COPY(DUP(..))), which this patch now ignores the no-op COPY in. Differential Revision: https://reviews.llvm.org/D126632	2022-05-31 09:28:00 +01:00
Daniel Kiss	de07cde67b	[AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions. autiasp, autibsp instructions are the counterpart of paciasp/pacibsp instructions therefore let's emit .cfi_negate_ra_state for these too. In case of Armv8.3 instruction set the retaa/retbb will do the return and authentication in one step here we can't emit the . cfi_negate_ra_state because that would be point after the ret* instruction. Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D111780	2022-04-22 13:25:57 +02:00
Momchil Velikov	d0ea42a7c1	[AArch64] Async unwind - function epilogues Reviewed By: MaskRay, chill Differential Revision: https://reviews.llvm.org/D112330	2022-04-12 16:50:50 +01:00
Momchil Velikov	50a97aacac	[AArch64] Async unwind - function prologues Re-commit of `32e8b550e5` This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction. The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill mov x29, sp .cfi_def_cfa w29, 16 ... ``` after the `stp` is executed the (initial) rule for the CFA still says the CFA is in the `sp`, even though it's already offset by 16 bytes A correct unwind info could look like: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill .cfi_def_cfa_offset 16 mov x29, sp .cfi_def_cfa w29, 16 ... ``` Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (towards this patch is just a step) is to have fully working `-fasynchronous-unwind-tables`. Reviewed By: danielkiss, MaskRay Differential Revision: https://reviews.llvm.org/D111411	2022-03-24 16:16:44 +00:00
Shengchen Kan	37b378386e	[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments	2022-03-16 20:25:42 +08:00
Hans Wennborg	85c53c7092	Revert "[AArch64] Async unwind - function prologues" It caused builds to assert with: (StackSize == 0 && "We already have the CFA offset!"), function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624. when targeting iOS. See comment on the code review for reproducer. > This patch rearranges emission of CFI instructions, so the resulting > DWARF and `.eh_frame` information is precise at every instruction. > > The current state is that the unwind info is emitted only after the > function prologue. This is fine for synchronous (e.g. C++) exceptions, > but the information is generally incorrect when the program counter is > at an instruction in the prologue or the epilogue, for example: > > ``` > stp x29, x30, [sp, #-16]! // 16-byte Folded Spill > mov x29, sp > .cfi_def_cfa w29, 16 > ... > ``` > > after the `stp` is executed the (initial) rule for the CFA still says > the CFA is in the `sp`, even though it's already offset by 16 bytes > > A correct unwind info could look like: > ``` > stp x29, x30, [sp, #-16]! // 16-byte Folded Spill > .cfi_def_cfa_offset 16 > mov x29, sp > .cfi_def_cfa w29, 16 > ... > ``` > > Having this information precise up to an instruction is useful for > sampling profilers that would like to get a stack backtrace. The end > goal (towards this patch is just a step) is to have fully working > `-fasynchronous-unwind-tables`. > > Reviewed By: danielkiss, MaskRay > > Differential Revision: https://reviews.llvm.org/D111411 This reverts commit `32e8b550e5`.	2022-03-04 17:36:26 +01:00
Cullen Rhodes	e4fa8291a2	[AArch64] Allow copying of SVE registers in Streaming SVE Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D118562	2022-03-03 09:51:14 +00:00
Momchil Velikov	63c9aca12a	Revert "[AArch64] Async unwind - function epilogues" This reverts commit `74319d6794`. It causes test failures that look like infinite loop in asan/hwasan unwinding.	2022-03-02 15:01:57 +00:00
Momchil Velikov	74319d6794	[AArch64] Async unwind - function epilogues Counterpart of https://reviews.llvm.org/D111411 this change makes the unwind information instruction precise in function epilogues. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D112330	2022-03-02 13:15:11 +00:00
Momchil Velikov	32e8b550e5	[AArch64] Async unwind - function prologues This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction. The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill mov x29, sp .cfi_def_cfa w29, 16 ... ``` after the `stp` is executed the (initial) rule for the CFA still says the CFA is in the `sp`, even though it's already offset by 16 bytes A correct unwind info could look like: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill .cfi_def_cfa_offset 16 mov x29, sp .cfi_def_cfa w29, 16 ... ``` Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (towards this patch is just a step) is to have fully working `-fasynchronous-unwind-tables`. Reviewed By: danielkiss, MaskRay Differential Revision: https://reviews.llvm.org/D111411	2022-02-28 13:37:57 +00:00
Momchil Velikov	25e92920c9	[AArch64] Async unwind - helper functions to decide on CFI emission Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D112327	2022-02-24 18:16:50 +00:00
Momchil Velikov	fd7e59f0e7	[AArch64] Async unwind - do not schedule frame setup/destroy The PostRA scheduler can reorder non-CFI instructions in a way that makes the unwind info not instruction precise. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D112326	2022-02-24 17:24:04 +00:00
Jessica Paquette	68c718c8f4	Revert "[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges"" This reverts commit `d97f997eb7`. This commit was not NFC. (See: https://reviews.llvm.org/rGd97f997eb79d91b2872ac13619f49cb3a7120781)	2022-02-23 10:35:52 -08:00
Jessica Paquette	d97f997eb7	[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges" We found a case in the Swift benchmarks where the MachineOutliner introduces about a 20% compile time overhead in comparison to building without the MachineOutliner. The origin of this slowdown is that the benchmark has long blocks which incur lots of LRU checks for lots of candidates. Imagine a case like this: ``` bb: i1 i2 i3 ... i123456 ``` Now imagine that all of the outlining candidates appear early in the block, and that something like, say, NZCV is defined at the end of the block. The outliner has to check liveness for certain registers across all candidates, because outlining from areas where those registers are used is unsafe at call boundaries. This is fairly wasteful because in the previously-described case, the outlining candidates will never appear in an area where those registers are live. To avoid this, precalculate areas where we will consider outlining from. Anything outside of these areas is mapped to illegal and not included in the outlining search space. This allows us to reduce the size of the outliner's suffix tree as well, giving us a potential memory win. By precalculating areas, we can also optimize other checks too, like whether or not LR is live across an outlining candidate. Doing all of this is about a 16% compile time improvement on the case. This is likely useful for other targets (e.g. ARM + RISCV) as well, but for now, this only implements the AArch64 path. The original "is the MBB safe" method still works as before.	2022-02-21 15:29:16 -08:00
Micah Weston	c69af70f02	[AArch64] Adds SUBS and ADDS instructions to the MIPeepholeOpt. Implements ADDS/SUBS 24-bit immediate optimization using the MIPeepholeOpt pass. This follows the pattern: Optimize ([adds\|subs] r, imm) -> ([ADDS\|SUBS] ([ADD\|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned integers. Optimize ([adds\|subs] r, imm) -> ([SUBS\|ADDS] ([SUB\|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. The SplitAndOpcFunc type had to change the return type to an Opcode pair so that the first add/sub is the regular instruction and the second is the flag setting instruction. This required updating the code in the AND case. Testing: I ran a two stage bootstrap with this code. Using the second stage compiler, I verified that the negation of an ADDS to SUBS or vice versa is a valid optimization. Example V == -0x111111. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D118663	2022-02-19 15:35:53 +00:00
Kerry McLaughlin	fc1b21228e	[AArch64][SVE] Add structured load/store opcodes to getMemOpInfo Currently, loading from or storing to a stack location with a structured load or store crashes in isAArch64FrameOffsetLegal as the opcodes are not handled by getMemOpInfo. This patch adds the opcodes for structured load/store instructions with an immediate index to getMemOpInfo & getLoadStoreImmIdx, setting appropriate values for the scale, width & min/max offsets. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D119338	2022-02-17 17:09:17 +00:00
Pavel Kosov	f3809b20f2	[AArch64][SchedModels] Handle virtual registers in FP/NEON predicates Current implementation of Check[HSDQ]Form predicates doesn’t handle virtual registers and therefore isn’t useful for pre-RA scheduling. Patch fixes this implementing two function predicates: CheckQForm for checking that instruction writes 128-bit NEON register and CheckFpOrNEON which checks that instruction writes FP register (any width). The latter supersedes Check[HSD]Form predicates which are not used individually. OS Laboratory. Huawei Russian Research Institute. Saint-Petersburg Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D114642	2022-02-17 13:41:05 +03:00
Jessica Paquette	6d58f4ab07	[MachineOutliner] NFC: Hide LRU-related stuff behind helper functions It's not particularly user-friendly to have to call `initLRU` everywhere. Also, it wasn't particularly great that the LRU for registers used in a sequence was also initialized by `initLRU`. This patch hides this stuff behind some helper functions: * `isAvailableAcrossAndOutOfSeq` * `isAnyUnavailableAcrossOrOutOfSeq` * `isAvailableInsideSeq` This allows the user to avoid calling `initLRU` explicitly. Also, it allows us to separate initializing the used-in-sequence LRU from the main LRU. Since both ARM and AArch64 check LR liveness in `insertOutlinedCall`, this refactor requires that we de-const the Candidate there. Some other quality-of-code improvements: * LRUs in outliner::Candidate now have more descriptive names * Use `Register` instead of `unsigned` in some places * Improve readability in some places by using ranges rather than `std::for_each` This is a preparatory commit for a larger compile time related change for the AArch64 outliner.	2022-02-16 11:39:07 -08:00
Nathan Chancellor	22eb1dae3f	Revert "[AArch64] Adds SUBS and ADDS instructions to the MIPeepholeOpt." This reverts commit `af45d0fd94`. This causes assertions failures when compiling the Linux kernel. See https://reviews.llvm.org/D118663 for a reduced reproducer.	2022-02-13 10:40:23 -07:00
Micah Weston	af45d0fd94	[AArch64] Adds SUBS and ADDS instructions to the MIPeepholeOpt. Implements ADDS/SUBS 24-bit immediate optimization using the MIPeepholeOpt pass. This follows the pattern: Optimize ([adds\|subs] r, imm) -> ([ADDS\|SUBS] ([ADD\|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned integers. Optimize ([adds\|subs] r, imm) -> ([SUBS\|ADDS] ([SUB\|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. The SplitAndOpcFunc type had to change the return type to an Opcode pair so that the first add/sub is the regular instruction and the second is the flag setting instruction. This required updating the code in the AND case. Testing: I ran a two stage bootstrap with this code. Using the second stage compiler, I verified that the negation of an ADDS to SUBS or vice versa is a valid optimization. Example V == -0x111111. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D118663	2022-02-12 03:13:14 +00:00
John Brawn	94843ea7d7	[AArch64] Make machine combiner patterns preserve MIFlags This is mainly done so that we don't lose the nofpexcept flag once we start emitting it. Differential Revision: https://reviews.llvm.org/D118621	2022-02-03 11:58:59 +00:00
tyb0807	dd88f40c80	[AArch64] Make getInstSizeInBytes() use instruction size from InstrInfo.td Currently, AArch64InstrInfo::getInstSizeInBytes() uses hard-coded instruction size for some pseudo-instructions, while this information should ideally be found in AArch64InstrInfo.td file (which can be accessed via MCInstrDesc). Hence, the .td file should be updated and no hard-coded instruction sizes should be used by getInstSizeInBytes() anymore. Differential Revision: https://reviews.llvm.org/D117970	2022-02-01 10:39:14 +00:00
tyb0807	78fd413cf7	[AArch64][GlobalISel] CodeGen for Armv8.8/9.3 MOPS This implements codegen for Armv8.8/9.3 Memory Operations extension (MOPS). Any memcpy/memset/memmov intrinsics will always be emitted as a series of three consecutive instructions P, M and E which perform the operation. The SelectionDAG implementation is split into a separate patch. AArch64LegalizerInfo will now consider the following generic opcodes if +mops is available, instead of legalising by expanding them to libcalls: G_BZERO, G_MEMCPY_INLINE, G_MEMCPY, G_MEMMOVE, G_MEMSET The s8 value of memset is legalised to s64 to match the pseudos. AArch64O0PreLegalizerCombinerInfo will still be able to combine G_MEMCPY_INLINE even if +mops is present, as it is unclear whether it is better to generate fixed length copies or MOPS instructions for the inline code of small or zero-sized memory operations, so we choose to be conservative for now. AArch64InstructionSelector will select the above as new pseudo instructions: AArch64::MOPSMemory{Copy/Move/Set/SetTagging} These are each expanded to a series of three instructions (e.g. SETP/SETM/SETE) which must be emitted together during code emission to avoid scheduler reordering. This is part 3/4 of a series of patches split from https://reviews.llvm.org/D117405 to facilitate reviewing. Patch by Tomas Matheson and Son Tuan Vu Differential Revision: https://reviews.llvm.org/D117763	2022-01-31 20:54:41 +00:00
Vitaly Buka	bd9ae596d8	Initialize ExtAddrMode::Scale Detected by msan with: -Xclang -enable-noundef-analysis -mllvm -msan-eager-checks=1 Differential Revision: https://reviews.llvm.org/D116830	2022-01-07 15:21:22 -08:00
Daniel Kiss	131c06e6da	Revert "[AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions." This reverts commit `f903c85055`.	2022-01-06 19:17:45 +01:00
Fangrui Song	a6a07a514b	[MachineOutliner] Don't outline functions starting with PATCHABLE_FUNCTION_ENTER/FENTRL_CALL MachineOutliner may outline a "patchable-function-entry" function whose body has a TargetOpcode::PATCHABLE_FUNCTION_ENTER MachineInstr. This is incorrect because the special code sequence must stay unchanged to be used at run-time. Avoid outlining PATCHABLE_FUNCTION_ENTER. While here, avoid outlining FENTRY_CALL too (which doesn't reproduce currently) to allow phase ordering flexibility. Fixes #52635 Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D115614	2021-12-13 13:24:29 -08:00
Andrew Savonichev	e201232ece	[NFC][AArch64] Handle processLogicalImmediate error If processLogicalImmediate fails, we should return from the function without changing InsInstrs or DelInstrs. This happens for CodeGen/AArch64/urem-seteq-nonzero.ll LIT test as described in https://reviews.llvm.org/D99662#2662296. Callers of genAlternativeCodeSequence skip patterns where InsInstrs stays empty, so this does not cause any issues now. Differential Revision: https://reviews.llvm.org/D100047	2021-11-10 16:57:24 +03:00
Alexander Shaposhnikov	b705e13341	[CodeGen][Outliner] Clean up dead code Clean up dead code in X86InstrInfo.cpp and AArch64InstrInfo.cpp Test plan: make check-all Differential revision: https://reviews.llvm.org/D111151	2021-11-09 21:39:38 +00:00
Andrew Savonichev	b702276ad0	[AArch64] Add Machine InstCombiner patterns for FMUL indexed variant This patch adds DUP+FMUL => FMUL_indexed pattern to InstCombiner. FMUL_indexed is normally selected during instruction selection, but it does not work in cases when VDUP and VMUL are in different basic blocks. Differential Revision: https://reviews.llvm.org/D99662	2021-11-09 15:30:19 +03:00
Bradley Smith	86972f1114	[AArch64][SVE] Use TargetFrameIndex in more SVE load/store addressing modes Add support for generating TargetFrameIndex in complex patterns for indexed addressing modes in SVE. Additionally, add missing load/stores to getMemOpInfo and getLoadStoreImmIdx. Differential Revision: https://reviews.llvm.org/D112617	2021-10-29 14:44:16 +00:00
Danila Malyutin	2d9ee590b6	[AArch64] Handle ST1iN instructions in isAArch64FrameOffsetLegal Before the code would crash with "unhandled opcode in isAArch64FrameOffsetLegal" when there was a spill from extractelement. Fixes pr52249 Differential Revision: https://reviews.llvm.org/D112311	2021-10-25 17:05:12 +03:00
Daniel Kiss	f903c85055	[AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions. autiasp, autibsp instructions are the counterpart of paciasp/pacibsp instructions therefore let's emit .cfi_negate_ra_state for these too. In case of Armv8.3 instruction set the retaa/retbb will do the return and authentication in one step here we can't emit the . cfi_negate_ra_state because that would be point after the ret* instruction. Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D111780	2021-10-20 11:03:52 +02:00
Jessica Paquette	22a64d4a14	[MachineOutliner][AArch64] Ensure LR is live-in when inserting reg-save calls Similar to other code which handles creating the function frame. If LR isn't live-in to the block that we're inserting the call into, we'll get a MachineVerifier error.	2021-09-08 17:44:27 -07:00
Nikita Popov	0529e2e018	[InstrInfo] Use 64-bit immediates for analyzeCompare() (NFCI) The backend generally uses 64-bit immediates (e.g. what MachineOperand::getImm() returns), so use that for analyzeCompare() and optimizeCompareInst() as well. This avoids truncation for targets that support immediates larger 32-bit. In particular, we can avoid the bugprone value normalization hack in the AArch64 target. This is a followup to D108076. Differential Revision: https://reviews.llvm.org/D108875	2021-08-30 19:46:04 +02:00
Nikita Popov	81b106584f	[AArch64] Fix comparison peephole opt with non-0/1 immediate (PR51476) This is a non-intrusive fix for https://bugs.llvm.org/show_bug.cgi?id=51476 intended for backport to the 13.x release branch. It expands on the current hack by distinguishing between CmpValue of 0, 1 and 2, where 0 and 1 have the obvious meaning and 2 means "anything else". The new optimization from D98564 should only be performed for CmpValue of 0 or 1. For main, I think we should switch the analyzeCompare() and optimizeCompare() APIs to use int64_t instead of int, which is in line with MachineOperand's notion of an immediate, and avoids this problem altogether. Differential Revision: https://reviews.llvm.org/D108076	2021-08-15 12:35:52 +02:00
David Green	bd07c2e266	[AArch64] Prefer fmov over orr v.16b when copying f32/f64 This changes the lowering of f32 and f64 COPY from a 128bit vector ORR to a fmov of the appropriate type. At least on some CPU's with 64bit NEON data paths this is expected to be faster, and shouldn't be slower on any CPU that treats fmov as a register rename. Differential Revision: https://reviews.llvm.org/D106365	2021-08-03 17:25:40 +01:00
Jon Roelofs	6611fbc62a	[AArch64] Dump a little more info about unimplemented reg-to-reg copies. NFC	2021-07-12 15:37:11 -07:00
Peter Waller	c5dfee44b9	[CodeGen][AArch64][SVE] Use ld1r[bhsd] for vector splat from memory This avoids the use of the vector unit for copying from scalar to vector. There is an extra ptrue instruction, but a predicate register with the ptrue pattern populated is likely to be free in the context of real code. Tests were generated from a template to cover the axes mentioned at the top of the test file. Co-authored-by: Francesco Petrogalli <francesco.petrogalli@arm.com> Differential Revision: https://reviews.llvm.org/D103170	2021-07-06 12:03:54 +00:00
Pablo Barrio	571c8c5263	[AArch64][v8.3A] Avoid inserting implicit landing pads (PACISP) PACISP have the advantage that they are in HINT space, meaning they can be run successfully in hardware without PAuth support - they will just behave as a NOP. However, PACISP are also implicit landing pads (think of an extra BTI jc). Therefore, they allow indirect jumps of all kinds into them, potentially inserting new gadgets. This patch replaces PACISP by PACI* LR, SP when compiling explicitly for hardware with full PAuth support. PACI* is not in the HINT space, therefore it will fault when run in hardware without PAuth support, but it is also not a landing pad, making programs safer in newer HW. Differential Revision: https://reviews.llvm.org/D101920	2021-06-24 18:24:32 +01:00
Nick Desaulniers	033138ea45	[IR] make stack-protector-guard-* flags into module attrs D88631 added initial support for: - -mstack-protector-guard= - -mstack-protector-guard-reg= - -mstack-protector-guard-offset= flags, and D100919 extended these to AArch64. Unfortunately, these flags aren't retained for LTO. Make them module attributes rather than TargetOptions. Link: https://github.com/ClangBuiltLinux/linux/issues/1378 Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D102742	2021-05-21 15:53:30 -07:00
Nick Desaulniers	0f41778919	[AArch64] Support customizing stack protector guard Follow up to D88631 but for aarch64; the Linux kernel uses the command line flags: 1. -mstack-protector-guard=sysreg 2. -mstack-protector-guard-reg=sp_el0 3. -mstack-protector-guard-offset=0 to use the system register sp_el0 for the stack canary, enabling the kernel to have a unique stack canary per task (like a thread, but not limited to userspace as the kernel can preempt itself). Address pr/47341 for aarch64. Fixes: https://github.com/ClangBuiltLinux/linux/issues/289 Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed By: xiangzhangllvm, DavidSpickett, dmgreen Differential Revision: https://reviews.llvm.org/D100919	2021-05-17 11:49:22 -07:00
Tim Northover	82a0e808bb	IR/AArch64/X86: add "swifttailcc" calling convention. Swift's new concurrency features are going to require guaranteed tail calls so that they don't consume excessive amounts of stack space. This would normally mean "tailcc", but there are also Swift-specific ABI desires that don't naturally go along with "tailcc" so this adds another calling convention that's the combination of "swiftcc" and "tailcc". Support is added for AArch64 and X86 for now.	2021-05-17 10:48:34 +01:00
Tim Northover	ea0eec69f1	IR+AArch64: add a "swiftasync" argument attribute. This extends any frame record created in the function to include that parameter, passed in X22. The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001 in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect of this is that tools walking the stack should expect to see one of three values there: * 0b0000 => a normal, non-extended record with just [FP, LR] * 0b0001 => the extended record [X22, FP, LR] * 0b1111 => kernel space, and a non-extended record. All other values are currently reserved. If compiling for arm64e this context pointer is address-discriminated with the discriminator 0xc31a and the DB (process-specific) key. There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing front-ends access to this slot (and forcing its creation initialized to nullptr if necessary).	2021-05-14 11:43:58 +01:00

1 2 3 4 5 ...

466 Commits