Commit Graph

426 Commits

Nikita Popov 0529e2e018 [InstrInfo] Use 64-bit immediates for analyzeCompare() (NFCI)
The backend generally uses 64-bit immediates (e.g. what
MachineOperand::getImm() returns), so use that for analyzeCompare()
and optimizeCompareInstr() as well. This avoids truncation for
targets that support immediates larger than 32 bits. In particular, we
can avoid the bug-prone value-normalization hack in the AArch64
target.

This is a followup to D108076.

Differential Revision: https://reviews.llvm.org/D108875
2021-08-30 19:46:04 +02:00
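
A minimal C++ sketch of the truncation problem the change above avoids. The stand-in getImm() mimics MachineOperand::getImm() returning int64_t; the real hook signatures are not reproduced here.

    #include <cassert>
    #include <cstdint>

    // Stand-in for MachineOperand::getImm(), which returns a 64-bit value.
    static int64_t getImm() { return 0x100000001LL; }

    int main() {
      int64_t Imm = getImm();
      int Narrow = static_cast<int>(Imm); // old-style 'int CmpValue': truncates to 1
      int64_t Wide = Imm;                 // 'int64_t CmpValue': keeps the full immediate
      assert(Wide == Imm);
      assert(Narrow != Imm);              // the truncated value no longer matches
      return 0;
    }
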
Nikita Popov 81b106584f [AArch64] Fix comparison peephole opt with non-0/1 immediate (PR51476)
This is a non-intrusive fix for
https://bugs.llvm.org/show_bug.cgi?id=51476 intended for backport
to the 13.x release branch. It expands on the current hack by
distinguishing between CmpValue of 0, 1 and 2, where 0 and 1 have
the obvious meaning and 2 means "anything else". The new optimization
from D98564 should only be performed for CmpValue of 0 or 1.

For main, I think we should switch the analyzeCompare() and
optimizeCompare() APIs to use int64_t instead of int, which is in
line with MachineOperand's notion of an immediate, and avoids this
problem altogether.

Differential Revision: https://reviews.llvm.org/D108076
2021-08-15 12:35:52 +02:00
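
A rough sketch of the "0, 1, or anything else" normalization described above; the helper name is hypothetical and this is not the actual AArch64 code.

    #include <cassert>
    #include <cstdint>

    // Map an immediate to 0, 1, or 2 ("anything else"), so the peephole
    // from D98564 can be restricted to the 0/1 cases.
    static int normalizeCmpValue(int64_t Imm) {
      if (Imm == 0 || Imm == 1)
        return static_cast<int>(Imm);
      return 2;
    }

    int main() {
      assert(normalizeCmpValue(0) == 0);
      assert(normalizeCmpValue(1) == 1);
      assert(normalizeCmpValue(42) == 2);
      assert(normalizeCmpValue(-1) == 2);
      return 0;
    }
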
David Green bd07c2e266 [AArch64] Prefer fmov over orr v.16b when copying f32/f64
This changes the lowering of f32 and f64 COPY from a 128-bit vector ORR to
an fmov of the appropriate type. At least on some CPUs with 64-bit NEON
data paths this is expected to be faster, and it shouldn't be slower on any
CPU that treats fmov as a register rename.

Differential Revision: https://reviews.llvm.org/D106365
2021-08-03 17:25:40 +01:00
Jon Roelofs 6611fbc62a [AArch64] Dump a little more info about unimplemented reg-to-reg copies. NFC 2021-07-12 15:37:11 -07:00
Peter Waller c5dfee44b9 [CodeGen][AArch64][SVE] Use ld1r[bhsd] for vector splat from memory
This avoids the use of the vector unit for copying from scalar to
vector. There is an extra ptrue instruction, but a predicate register
with the ptrue pattern populated is likely to be free in the context of
real code.

Tests were generated from a template to cover the axes mentioned at the
top of the test file.

Co-authored-by: Francesco Petrogalli <francesco.petrogalli@arm.com>

Differential Revision: https://reviews.llvm.org/D103170
2021-07-06 12:03:54 +00:00
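
A small ACLE-based illustration of the splat-from-memory pattern the commit above targets. It assumes a compiler with SVE support and the arm_sve.h intrinsics; whether ld1rw is actually selected depends on the compiler version.

    #include <arm_sve.h>
    #include <cstdint>

    // Broadcast a loaded scalar into an SVE vector. With the change above,
    // a splat like this may be selected as: ptrue pN.s ; ld1rw {zD.s}, pN/z, [x0]
    svint32_t splat_from_memory(const int32_t *p) {
      return svdup_n_s32(*p);
    }
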
Pablo Barrio 571c8c5263 [AArch64][v8.3A] Avoid inserting implicit landing pads (PACI*SP)
PACI*SP have the advantage that they are in HINT space, meaning
they can be run successfully in hardware without PAuth support -
they will just behave as a NOP. However, PACI*SP are also implicit
landing pads (think of an extra BTI jc). Therefore, they allow
indirect jumps of all kinds into them, potentially introducing new
gadgets. This patch replaces PACI*SP with PACI* LR, SP when
compiling explicitly for hardware with full PAuth support. PACI*
is not in the HINT space, so it will fault when run on hardware
without PAuth support, but it is also not a landing pad, making
programs safer on newer hardware.

Differential Revision: https://reviews.llvm.org/D101920
2021-06-24 18:24:32 +01:00
Nick Desaulniers 033138ea45 [IR] make stack-protector-guard-* flags into module attrs
D88631 added initial support for:

- -mstack-protector-guard=
- -mstack-protector-guard-reg=
- -mstack-protector-guard-offset=

flags, and D100919 extended these to AArch64. Unfortunately, these flags
aren't retained for LTO. Make them module attributes rather than
TargetOptions.

Link: https://github.com/ClangBuiltLinux/linux/issues/1378

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D102742
2021-05-21 15:53:30 -07:00
Nick Desaulniers 0f41778919 [AArch64] Support customizing stack protector guard
Follow up to D88631 but for aarch64; the Linux kernel uses the command
line flags:

1. -mstack-protector-guard=sysreg
2. -mstack-protector-guard-reg=sp_el0
3. -mstack-protector-guard-offset=0

to use the system register sp_el0 for the stack canary, enabling the
kernel to have a unique stack canary per task (like a thread, but not
limited to userspace as the kernel can preempt itself).

Address pr/47341 for aarch64.

Fixes: https://github.com/ClangBuiltLinux/linux/issues/289
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Reviewed By: xiangzhangllvm, DavidSpickett, dmgreen

Differential Revision: https://reviews.llvm.org/D100919
2021-05-17 11:49:22 -07:00
Tim Northover 82a0e808bb IR/AArch64/X86: add "swifttailcc" calling convention.
Swift's new concurrency features are going to require guaranteed tail calls so
that they don't consume excessive amounts of stack space. This would normally
mean "tailcc", but there are also Swift-specific ABI desires that don't
naturally go along with "tailcc" so this adds another calling convention that's
the combination of "swiftcc" and "tailcc".

Support is added for AArch64 and X86 for now.
2021-05-17 10:48:34 +01:00
Tim Northover ea0eec69f1 IR+AArch64: add a "swiftasync" argument attribute.
This extends any frame record created in the function to include that
parameter, passed in X22.

The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001
in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect
of this is that tools walking the stack should expect to see one of three
values there:

  * 0b0000 => a normal, non-extended record with just [FP, LR]
  * 0b0001 => the extended record [X22, FP, LR]
  * 0b1111 => kernel space, and a non-extended record.

All other values are currently reserved.

If compiling for arm64e this context pointer is address-discriminated with the
discriminator 0xc31a and the DB (process-specific) key.

There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing
front-ends access to this slot (and forcing its creation initialized to nullptr
if necessary).
2021-05-14 11:43:58 +01:00
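
A hedged sketch of how a stack-walking tool might interpret the saved FP values described above; the names and structure are illustrative, not Swift's or LLVM's actual unwinder code.

    #include <cstdint>

    enum class FrameKind { Normal, ExtendedAsync, Kernel, Reserved };

    // Classify a saved FP value by bits 63:60, per the scheme above.
    static FrameKind classifyFrame(uint64_t SavedFP) {
      switch (SavedFP >> 60) {
      case 0b0000: return FrameKind::Normal;        // [FP, LR]
      case 0b0001: return FrameKind::ExtendedAsync; // [X22, FP, LR]
      case 0b1111: return FrameKind::Kernel;        // kernel space, non-extended
      default:     return FrameKind::Reserved;      // currently reserved
      }
    }

    int main() {
      return classifyFrame(0x1ULL << 60) == FrameKind::ExtendedAsync ? 0 : 1;
    }
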
Peter Waller 3fa6510f6e [CodeGen][AArch64][SVE] Fold [rdffr, ptest] => rdffrs; bugfix for optimizePTestInstr
When a ptest is used to set flags from the output of rdffr, the ptest
can be eliminated, using a flags-setting rdffrs instead.

Additionally, check that nothing consumes flags between rdffr and ptest;
this case appears to have been missed previously.

* There is no unpredicated RDFFRS instruction.
* If substituting RDFFR_PP, require that the mask argument of the
  PTEST matches that of the RDFFR_PP.
* Move some precondition code up inside optimizePTestInstr, so that it
  covers the new code paths for RDFFR which return earlier.
  * Only consider RDFFR, PTEST in same basic block.
  * Check for other flag setting instructions between the two, abort if
    found.
  * Drop an old TODO comment about removing dead PTEST instructions.

RDFFR_P to follow in later patch.

Differential Revision: https://reviews.llvm.org/D101357
2021-05-12 15:06:22 +01:00
Stelios Ioannou 936c777e2b [AArch64] Adds a pre-indexed paired Load/Store optimization for LDR-STR.
This patch merges STR<S,D,Q,W,X>pre-STR<S,D,Q,W,X>ui and
LDR<S,D,Q,W,X>pre-LDR<S,D,Q,W,X>ui instruction pairs into a single
STP<S,D,Q,W,X>pre and LDP<S,D,Q,W,X>pre instruction, respectively.
For each pair, there is a MIR test that verifies this optimization.

Differential Revision: https://reviews.llvm.org/D99272

Change-Id: Ie97a20c8c716c08492fe229c22e14e3c98ef08b7
2021-04-30 17:29:58 +01:00
Pavel Iliin 2ec16103c6 [AArch64] Peephole rule to remove redundant cmp after cset.
Comparisons to zero or one after cset instructions can be safely
removed in examples like:

cset w9, eq          cset w9, eq
cmp  w9, #1   --->   <removed>
b.ne    .L1          b.ne    .L1

cset w9, eq          cset w9, eq
cmp  w9, #0   --->   <removed>
b.ne    .L1          b.eq    .L1

A peephole optimization has been added to detect suitable cases and get rid
of those comparisons.

Differential Revision: https://reviews.llvm.org/D98564
2021-04-19 19:58:38 +01:00
Andrew Savonichev f037b07b5c Revert "[AArch64] Add Machine InstCombiner patterns for FMUL indexed variant"
This reverts commit cca9b5985c.

Buildbot reported an error for CodeGen/AArch64/machine-combiner-fmul-dup.mir:

*** Bad machine code: Virtual register killed in block, but needed live out. ***
- function:    indexed_2s
- basic block: %bb.0 entry (0x640fee8)
Virtual register %7 is used after the block.

*** Bad machine code: Virtual register defs don't dominate all uses. ***
- function:    indexed_2s
- v. register: %7
LLVM ERROR: Found 2 machine code errors.
2021-04-12 16:28:49 +03:00
Andrew Savonichev cca9b5985c [AArch64] Add Machine InstCombiner patterns for FMUL indexed variant
This patch adds a DUP+FMUL => FMUL_indexed pattern to the Machine InstCombiner.
FMUL_indexed is normally selected during instruction selection, but it
does not work in cases when VDUP and VMUL are in different basic
blocks.

Differential Revision: https://reviews.llvm.org/D99662
2021-04-12 16:08:39 +03:00
Fangrui Song 5d44c92bf8 Change void getNoop(MCInst &NopInst) to MCInst getNop()
Prefer (self-documenting) return values to output parameters (which are
liable to be used).
While here, rename Noop to Nop which is more widely used and improves
consistency with hasEmitNops/setEmitNops/emitNop/etc.
2021-03-15 12:05:34 -07:00
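
An illustrative contrast of the two API shapes described above, using simplified stand-ins rather than the real MCInst or the actual hook.

    // Simplified stand-in; the real MCInst lives in llvm/MC/MCInst.h.
    struct MCInst { unsigned Opcode = 0; };

    // Old shape: output parameter the caller must remember to populate and use.
    static void getNoop(MCInst &NopInst) { NopInst.Opcode = 1; /* placeholder NOP opcode */ }

    // New shape: self-documenting return value.
    static MCInst getNop() {
      MCInst Nop;
      Nop.Opcode = 1; // placeholder NOP opcode
      return Nop;
    }

    int main() {
      MCInst A;
      getNoop(A);          // old style
      MCInst B = getNop(); // new style
      return A.Opcode == B.Opcode ? 0 : 1;
    }
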
Amara Emerson 5d6d9b63a3 [GlobalISel] Propagate extends through G_PHIs into the incoming value blocks.
This combine tries to do inter-block hoisting of extends of G_PHIs, into the
originating blocks of the phi's incoming value. The idea is to expose further
optimization opportunities that are normally obscured by the PHI.

Some basic heuristics and a target hook for AArch64 are added to allow tuning.
E.g. if the extend is used by a G_PTR_ADD, it doesn't perform this combine
since it may be folded into the addressing mode during selection.

There are very minor code size improvements on AArch64 -Os, but the real benefit
is that it unlocks optimizations like AArch64 conditional compares on some
benchmarks.

Differential Revision: https://reviews.llvm.org/D95703
2021-02-12 11:52:52 -08:00
Nicholas Guy cd880442ae [CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize default threshold
Different targets might handle branch performance differently, so this patch allows
targets to specify the TailDuplicateSize threshold. That threshold defines how small a branch
can be and still be duplicated to generate straight-line code instead.
This patch also specifies override values for the AArch64 subtarget.

Differential Revision: https://reviews.llvm.org/D95631
2021-02-08 13:28:00 +00:00
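
A generic sketch of the kind of override this hook enables; the class and method names are hypothetical and the actual TargetInstrInfo signature is not reproduced here.

    // A target-specific threshold for how small a branch body may be and
    // still be tail-duplicated into straight-line code.
    struct GenericTailDupPolicy {
      virtual ~GenericTailDupPolicy() = default;
      virtual unsigned tailDuplicateSizeThreshold(bool OptForSize) const {
        return OptForSize ? 2 : 3; // illustrative defaults only
      }
    };

    struct AArch64LikeTailDupPolicy : GenericTailDupPolicy {
      unsigned tailDuplicateSizeThreshold(bool OptForSize) const override {
        return OptForSize ? 2 : 4; // hypothetical subtarget-tuned values
      }
    };

    int main() {
      AArch64LikeTailDupPolicy P;
      return static_cast<int>(P.tailDuplicateSizeThreshold(/*OptForSize=*/false));
    }
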
Jordan Rupprecht 752fafda3d [NFC] Fix -Wsometimes-uninitialized
After 49142991a6, clang detects that MUL may be uninitialized. Set it to nullptr to suppress this check.

Adding an assert to check that it is ultimately set fails two test cases. Since this is not a new issue, leave the assertion commented out until a code owner can fix the bug. The two failing test cases are noted in the assertion comment.
2021-01-13 20:32:38 -08:00
Kazu Hirata 4c1617dac8 [llvm] Use std::any_of (NFC) 2021-01-13 19:14:44 -08:00
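
A small example of the kind of rewrite such an NFC cleanup performs; this is a generic illustration, not one of the actual LLVM call sites.

    #include <algorithm>
    #include <vector>

    // Replace a hand-written "does any element satisfy P?" loop with
    // std::any_of, which states the intent directly.
    static bool hasNegative(const std::vector<int> &Values) {
      return std::any_of(Values.begin(), Values.end(),
                         [](int V) { return V < 0; });
    }

    int main() {
      std::vector<int> V{3, 1, -4};
      return hasNegative(V) ? 0 : 1;
    }
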
Hsiangkai Wang 914e2f5a02 [NFC] Use generic name for scalable vector stack ID.
Differential Revision: https://reviews.llvm.org/D94471
2021-01-13 10:57:43 +08:00
Mark Murray af7cce2fa4 [AArch64] Add +pauth architecture option, allowing the v8.3a pointer authentication extension.
Differential Revision: https://reviews.llvm.org/D94083
2021-01-08 13:21:11 +00:00
Bradley Smith c73ae747cb [AArch64][SVE] Add optimization to remove redundant ptest instructions
Co-Authored-by: Graham Hunter <graham.hunter@arm.com>
Co-Authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D93292
2021-01-05 15:28:36 +00:00
Kazu Hirata eb198f4c3c [llvm] Use llvm::any_of (NFC) 2021-01-04 11:42:47 -08:00
Kazu Hirata 966f1431de [Target] Use llvm::erase_if (NFC) 2020-12-20 17:43:22 -08:00
Chen Zheng 4830d458dd [MachineCombiner][NFC] Add MustReduceRegisterPressure goal
Add a new goal, MustReduceRegisterPressure, for the machine combiner pass.

PowerPC will use this new goal to do some register-pressure-related optimizations.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D92068
2020-12-14 00:02:42 -05:00
Sander de Smalen 73b6cb67dc [NFCI] Replace AArch64StackOffset by StackOffset.
This patch replaces the AArch64StackOffset class by the generic one
defined in TypeSize.h.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D88983
2020-11-04 08:49:00 +00:00
Anna Thomas 35cb45c533 [ImplicitNullChecks] Support complex addressing mode
The pass is updated to handle loads through complex addressing modes,
specifically, when we have a scaled register and a scale.
It requires two API updates in TII which have been implemented for X86.

See added IR and MIR testcases.

Tests-Run: make check
Reviewed-By: reames, danstrushin
Differential Revision: https://reviews.llvm.org/D87148
2020-10-07 20:55:38 -04:00
Momchil Velikov a88c722e68 [AArch64] PAC/BTI code generation for LLVM generated functions
PAC/BTI-related codegen in the AArch64 backend is controlled by a set
of LLVM IR function attributes, added to the function by Clang, based
on command-line options and GCC-style function attributes. However,
functions, generated in the LLVM middle end (for example,
asan.module.ctor or __llvm_gcov_write_out) do not get any attributes
and the backend incorrectly does not do any PAC/BTI code generation.

This patch records the default state of PAC/BTI codegen in a set of
LLVM IR module-level attributes, based on command-line options:

* "sign-return-address", with non-zero value means generate code to
  sign return addresses (PAC-RET), zero value means disable PAC-RET.

* "sign-return-address-all", with non-zero value means enable PAC-RET
  for all functions, zero value means enable PAC-RET only for
  functions, which spill LR.

* "sign-return-address-with-bkey", with non-zero value means use B-key
  for signing, zero value mean use A-key.

This set of attributes is always added for AArch64 targets (as
opposed, for example, to interpreting a missing attribute as having a
value of 0) in order to be able to check for conflicts when combining
module attributes during LTO.

Module-level attributes are overridden by function-level attributes.
All the decision making about whether or not to generate PAC and/or
BTI code is factored out into AArch64FunctionInfo; there shouldn't be
any places left that directly examine PAC/BTI attributes, other than
AArch64FunctionInfo, except AArch64AsmPrinter.cpp, which
is/will be handled by a separate patch.

Differential Revision: https://reviews.llvm.org/D85649
2020-09-25 11:47:14 +01:00
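
A hedged sketch of the decision logic the module-level attributes above feed into. The container and helpers are illustrative; the real logic lives in AArch64FunctionInfo and reads IR module attributes.

    #include <cstdint>
    #include <map>
    #include <string>

    struct ModuleFlags {
      std::map<std::string, uint64_t> Flags; // module-level attribute -> value

      bool enabled(const std::string &Name) const {
        auto It = Flags.find(Name);
        return It != Flags.end() && It->second != 0;
      }

      // Non-zero "sign-return-address" enables PAC-RET at all.
      bool signReturnAddress() const { return enabled("sign-return-address"); }
      // "sign-return-address-all": sign in every function, not just those spilling LR.
      bool signReturnAddressAll() const { return enabled("sign-return-address-all"); }
      // "sign-return-address-with-bkey": use the B-key instead of the A-key.
      bool signWithBKey() const { return enabled("sign-return-address-with-bkey"); }
    };

    int main() {
      ModuleFlags MF;
      MF.Flags["sign-return-address"] = 1;
      return MF.signReturnAddress() && !MF.signWithBKey() ? 0 : 1;
    }
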
Philip Reames e1a3271ebb [AArch64] Teach analyzeBranch to remove branch equivalent to fallthrough
The motivation here is that MachineBlockPlacement relies on analyzeBranch to remove branches to fallthrough blocks when the branch is not fully analyzable. With the introduction of the FAULTING_OP pseudo for implicit null checking (see D87861), this case becomes important. Note that it's hard to otherwise exercise this path, as BranchFolding handles any fully analyzable branch sequence without using this interface.

p.s. For anyone who saw my comment in the original review, what I thought was an issue in BranchFolding originally turned out to simply be a bug in my patch. (Now fixed.)

Differential Revision: https://reviews.llvm.org/D88035
2020-09-22 14:38:27 -07:00
Philip Reames b04c181ed7 [AArch64] Enable implicit null check transformation
This change enables the generic implicit null transformation for the AArch64 target. As background for those unfamiliar with our implicit null check support:

    An implicit null check is the use of a signal handler to catch a null-pointer dereference and redirect it to a handler. Specifically, it replaces an explicit conditional branch with such a redirect. This is only done for very cold branches under frontend control w/appropriate metadata.
    FAULTING_OP is used to wrap the faulting instruction. It is modelled as being a conditional branch to reflect the fact that it can transfer control in the CFG.
    FAULTING_OP does not need to be an analyzable branch to achieve its purpose. (Or at least, that's the x86 model. I find this slightly questionable.)
    When lowering to MC, we convert the FAULTING_OP back into the actual instruction, record the labels, and lower the original instruction.

As can be seen in the test changes, currently the AArch64 backend does not eliminate the unconditional branch to the fallthrough block. I've tried two approaches, neither of which worked. I plan to return to this in a separate change set once I've wrapped my head around the interactions a bit better. (X86 handles this via AllowModify on analyzeBranch, but adding the obvious code causes BranchFolding to crash. I haven't yet figured out if it's a latent bug in BranchFolding, or something I'm doing wrong.)

Differential Revision: https://reviews.llvm.org/D87851
2020-09-17 16:00:19 -07:00
Philip Reames e6bc7037d3 [AArch64] Statepoint support for AArch64.
Differential Revision: https://reviews.llvm.org/D66012
Patch By: loicottet (with major rebase by me)
2020-09-14 16:43:08 -07:00
Owen Anderson 5987da8764 Revert "Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain""
This reverts commit bc9a29b9ee.

The reasoning that this patch was wrong was itself incorrect
(see discussion on llvm-commits). This patch does seem to be exposing
a latent SVE code generation bug on non-public tests, which should
not block a correctness fix for public, non-SVE use cases.
2020-09-01 19:29:03 +00:00
Paul Walker bc9a29b9ee Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain"
This reverts commit e9d9a61208.

This patch was previously reverted by 04879086b4;
the reapplication broke the assert used to
ensure SP is always 16-byte aligned, which is a requirement of the AAPCS.

For extra context, the latest patch caused runtime failures when
building with "-march=armv8-a+sve -mllvm -aarch64-sve-vector-bits-min=256".
2020-09-01 16:09:37 +01:00
Owen Anderson e9d9a61208 Reapply D70800: Fix AArch64 AAPCS frame record chain
Original Commit Message:
After the commit r368987 (rG643adb55769e) was landed, the frame record (FP and LR register)
may be placed in the middle of a stack frame if a function has both callee-saved
general-purpose registers and floating point registers. This will break the stack unwinders
that simply walk through the frame records (based on the guarantee from AAPCS64
"The Frame Pointer" section). This commit fixes the problem by adding the frame record offset.

Patch By: logan
Differential Revision: D70800
2020-08-27 17:29:41 +00:00
Puyan Lotfi 7bc03f5553 [MachineOutliner][AArch64] WA for multiple stack fixup cases in MachineOutliner.
In cases where MachineOutliner candidates:

  * are noreturn,
  * have calls with no available LR or free regs, or
  * don't use SP,

we can end up hitting stack fixup code for the caller and the callee for
a FrameID of MachineOutlinerDefault. This triggers the assert:

  `assert(OF.FrameConstructionID != MachineOutlinerDefault &&
          "Can only fix up stack references once");`

in AArch64InstrInfo.cpp. This assert exists for now because a lot of the
fixup code is not tested to handle fixing up more than once and needs
some better checks and enhancements to avoid potentially generating
illegal code.

I've filed a Bugzilla report to track this until these cases are handled
by the AArch64 MachineOutliner: https://bugs.llvm.org/show_bug.cgi?id=46767

This diff detects cases that will cause these multiple stack fixups and
prunes the candidates from `RepeatedSequenceLocs`.

    Differential Revision: https://reviews.llvm.org/D83923
2020-08-10 15:43:30 -04:00
Florian Hahn f7658241cb [AArch64] Consider instruction-level contract FMFs in combiner patterns.
Currently, instruction-level fast-math flags are not considered when
generating patterns for the machine combiner.

This currently leads to some missed opportunities to generate FMAs in
combination with `#pragma clang fp contract (fast)`.

For example, when building the example below with -O3 for AArch64, no
FMADD is generated. If built with -O2 and the DAGCombiner is used
instead of the MachineCombiner for FMAs, an FMADD is generated.

With this patch, the same code is generated in both cases.

    float madd_contract(float a, float b, float c) {
    #pragma clang fp contract (fast)
      return (a * b) + c;
    }

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84930
2020-08-04 10:25:16 +01:00
Tim Northover 0f1494be43 AArch64: avoid UB shift of negative value
Left shifting a negative value is undefined behaviour, so this just moves the
negation to after the shift to avoid it.
2020-07-27 13:49:50 +01:00
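
A minimal sketch of the pattern being fixed (a generic example, not the actual backend code): shift the magnitude, then negate, so a negative value is never left-shifted.

    #include <cstdint>

    // Compute Offset * 2^Shift without left-shifting a negative value.
    static int64_t shiftedOffset(int64_t Offset, unsigned Shift) {
      bool Negative = Offset < 0;
      uint64_t Magnitude = Negative ? 0 - static_cast<uint64_t>(Offset)
                                    : static_cast<uint64_t>(Offset);
      int64_t Result = static_cast<int64_t>(Magnitude << Shift);
      return Negative ? -Result : Result; // negation happens after the shift
    }

    int main() {
      return shiftedOffset(-16, 3) == -128 ? 0 : 1;
    }
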
Eli Friedman 993c1a3219 [AArch64][SVE] Teach copyPhysReg to copy ZPR2/3/4.
It's sort of tricky to hit this in practice, but not impossible. I have
a synthetic C testcase if anyone is interested.

The implementation is identical to the equivalent NEON register copies.

Differential Revision: https://reviews.llvm.org/D84373
2020-07-23 16:41:37 -07:00
Eric Christopher cf23852587 [Target] As part of using inclusive language within the llvm project,
migrate away from the use of blacklist and whitelist.

This change affects an internal llvm command line option.
2020-06-20 00:06:39 -07:00
Kristof Beyls c35ed40f4f [AArch64] Extend AArch64SLSHardeningPass to harden BLR instructions.
To make sure that no barrier gets placed on the architectural execution
path, each
  BLR x<N>
instruction gets transformed to a
  BL __llvm_slsblr_thunk_x<N>
instruction, where __llvm_slsblr_thunk_x<N> is a thunk that contains:
__llvm_slsblr_thunk_x<N>:
  BR x<N>
  <speculation barrier>

Therefore, the BLR instruction gets split into two: one BL and one BR.
This transformation results in not inserting a speculation barrier on
the architectural execution path.

The mitigation is off by default and can be enabled by the
harden-sls-blr subtarget feature.

As a linker is allowed to clobber X16 and X17 on function calls, the
above code transformation would not be correct in case a linker does so
when N=16 or N=17. Therefore, when the mitigation is enabled, generation
of BLR x16 or BLR x17 is avoided.

As BLRA* indirect calls are not produced by LLVM currently, this does
not aim to implement support for those.

Differential Revision:  https://reviews.llvm.org/D81402
2020-06-12 07:34:33 +01:00
Kristof Beyls 0ee176edc8 [AArch64] Introduce AArch64SLSHardeningPass, implementing hardening of RET and BR instructions.
Some processors may speculatively execute the instructions immediately
following RET (returns) and BR (indirect jumps), even though
control flow should change unconditionally at these instructions.
To avoid a potential mis-speculatively executed gadget after these
instructions leaking secrets through side channels, this pass places a
speculation barrier immediately after every RET and BR instruction.

Since these barriers are never on the correct, architectural execution
path, performance overhead of this is expected to be low.

On targets that implement the Armv8.0-SB Speculation Barrier extension,
a single SB instruction is emitted that acts as a speculation barrier.
On other targets, a DSB SYS followed by an ISB is emitted to act as a
speculation barrier.

These speculation barriers are implemented as pseudo instructions to
prevent later passes from analyzing them and potentially removing them.

Even though currently LLVM does not produce BRAA/BRAB/BRAAZ/BRABZ
instructions, these are also mitigated by the pass and tested through a
MIR test.

The mitigation is off by default and can be enabled by the
harden-sls-retbr subtarget feature.

Differential Revision:  https://reviews.llvm.org/D81400
2020-06-11 07:51:17 +01:00
hsmahesha 0ed2c04636 [AMDGPU/MemOpsCluster] Let mem ops clustering logic also consider number of clustered bytes
Summary:
While clustering mem ops, the AMDGPU target needs to consider the number of clustered bytes
to decide on the max number of mem ops that can be clustered. This patch adds support for passing
the number of clustered bytes to the target's mem-ops clustering logic.

Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar

Reviewed By: foad

Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80545
2020-06-01 22:52:34 +05:30
Tim Northover dace8224f3 AArch64: materialize large stack offset into xzr correctly.
When a stack offset was too big to materialize in a single instruction, we were
trying to do it in stages:

    adds xD, sp, #imm
    adds xD, xD, #imm

Unfortunately, if xD is xzr then the second instruction doesn't exist and
wouldn't do what was needed if it did. Instead we can use a temporary register
for all but the last addition.
2020-06-01 09:30:05 +01:00
Cullen Rhodes 8a397b66b2 [AArch64][SVE] Add support for spilling/filling ZPR2/3/4
Summary:
This patch enables the register allocator to spill/fill lists of 2, 3
and 4 SVE vectors registers to/from the stack. This is implemented with
pseudo instructions that get expanded to individual LDR_ZXI/STR_ZXI
instructions in AArch64ExpandPseudoInsts.

Patch by Sander de Smalen.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D75988
2020-05-28 10:02:57 +00:00
Fangrui Song 0840d725c4 [MC] Change MCCFIInstruction::createDefCfaOffset to cfiDefCfaOffset which does not negate Offset
The negative Offset has caused a bunch of problems and confused quite a
few call sites. Delete the unneeded negation and fix all call sites.
2020-05-22 17:07:11 -07:00
Eli Friedman f09d220c71 [AArch64][SVE] Fill out missing unpredicated load/store patterns.
The set of patterns for unpredicated load/store was incomplete: it only
included non-extending stores.  Fill out the remaining patterns for
extending stores, and add the corresponding support to frame offset
lowering.

Differential Revision: https://reviews.llvm.org/D80349
2020-05-21 13:29:30 -07:00
Eli Friedman b4f9b34701 [AArch64] Fix unwind info generated by outliner.
The offsets were wrong. The result is now the same as what the compiler
would generate for a function that spills lr normally.

Differential Revision: https://reviews.llvm.org/D80238
2020-05-20 16:39:00 -07:00
Eli Friedman 5d2c3a0b8c [AArch64] Disable MachineOutliner on Windows.
The handling of unwind info is broken, so disable it for now.
2020-05-19 13:49:03 -07:00
Vedant Kumar c0fa447e02 AArch64: Remove reversedInstructionsWithoutDebug helper
When using reversedInstructionsWithoutDebug to construct a range from a
pair of MachineInstrBundleIterators, the range unexpectedly leaves out an
element. This results in mis-optimization as @mstorsjo points out in
https://reviews.llvm.org/D78157.

The problem is that when we convert a MachineInstrBundleIterator to a
reverse iterator, the result gets incremented:

  MachineInstrBundleIterator(++I.getReverse())

The comment there explains that the "resulting iterator will dereference
... to the previous node, which is somewhat unexpected; but converting
the two endpoints in a range will give the same range in reverse". This
makes it hard to understand what reversedInstructionsWithoutDebug will
do: I've removed the helper to prevent similar mistakes in the future.
2020-04-24 11:28:17 -07:00
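
A standard-library analogue of the pitfall described above: std::reverse_iterator built from a forward iterator dereferences the previous element, mirroring the MachineInstrBundleIterator behaviour quoted in the commit.

    #include <cassert>
    #include <iterator>
    #include <vector>

    int main() {
      std::vector<int> V{1, 2, 3};
      auto I = V.begin() + 1;                 // points at 2
      auto R = std::make_reverse_iterator(I); // dereferences the previous element
      assert(*R == 1);                        // "somewhat unexpected", as the quoted comment says
      return 0;
    }
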