Commit Graph

David Spickett e428baf001 [LLVM][ARM] Remove options for armv2, 2A, 3 and 3M
Fixes #57486

These pre-v4 architectures are not specifically supported
by codegen, as demonstrated in the linked issue.

GCC has not supported 3M since GCC 9, and presumably dropped
2 and 2A even earlier than that, so we are aligned in that sense.

(see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2abd6e34fcf3bd9f9ffafcaa47cdc3ed443f9add)

This removes the options and associated testing.

The Pre_v4 build attribute remains mainly because its absence
would be more confusing. It will not be used other than to
complete the list of build attributes as shown in the ABI.

https://github.com/ARM-software/abi-aa/blob/main/addenda32/addenda32.rst#3352the-target-related-attributes

Reviewed By: nickdesaulniers, peter.smith, rengolin

Differential Revision: https://reviews.llvm.org/D133109
2022-09-08 09:49:48 +00:00
Lucas Prates 70a5c52534 [ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records
Currently an AAPCS-compliant frame record is not always created for
functions when it should be. Although a consistent frame record might not
be required in some cases, there are still scenarios where applications
may want to make use of the call hierarchy made available through it.

In order to enable the use of AAPCS-compliant frame records whilst keeping
backwards compatibility, this patch introduces a new command-line option
(`-mframe-chain=[none|aapcs|aapcs+leaf]`) for the AArch32 and Thumb backends.
The option allows users to explicitly select when to use it, and also
helps ensure the extra overhead of frame records is
only introduced when necessary, in particular for Thumb targets.
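
For illustration only (not part of the patch): an AAPCS-style frame record
is the adjacent {FP, LR} pair saved on the stack, with the frame pointer left
pointing at it. A minimal, hypothetical ARM-state prologue/epilogue of the
kind -mframe-chain=aapcs is meant to guarantee might look roughly like:

    push    {r11, lr}       @ save the frame record: caller's FP and the return address
    mov     r11, sp         @ FP now points at the {fp, lr} pair on the stack
    ...
    pop     {r11, pc}       @ restore the caller's frame pointer and return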

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D125094
2022-06-27 14:08:48 +01:00
Krasimir Georgiev 8f2ba36336 Revert "[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records AND [NFC][Thumb] Update frame-chain codegen test to use thumbv6m"
This reverts commit 7625e01d66 and
dependent cbcce82ef6.

Commit 7625e01d66 causes some new codegen test
failures under asan, e.g., CodeGen/ARM/execute-only.ll:
https://lab.llvm.org/buildbot/#/builders/5/builds/24659/steps/15/logs/stdio.
2022-06-15 16:10:02 +02:00
Lucas Prates 7625e01d66 [ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records
Currently an AAPCS-compliant frame record is not always created for
functions when it should be. Although a consistent frame record might not
be required in some cases, there are still scenarios where applications
may want to make use of the call hierarchy made available through it.

In order to enable the use of AAPCS-compliant frame records whilst keeping
backwards compatibility, this patch introduces a new command-line option
(`-mframe-chain=[none|aapcs|aapcs+leaf]`) for the AArch32 and Thumb backends.
The option allows users to explicitly select when to use it, and also
helps ensure the extra overhead of frame records is
only introduced when necessary, in particular for Thumb targets.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D125094
2022-06-14 13:37:51 +01:00
Lucas Prates 33b9ad647e Revert "[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records"
Reverting change due to test failure.

This reverts commit 6119053dab.
2022-06-13 11:00:49 +01:00
Lucas Prates 6119053dab [ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records
Currently an AAPCS-compliant frame record is not always created for
functions when it should be. Although a consistent frame record might not
be required in some cases, there are still scenarios where applications
may want to make use of the call hierarchy made available through it.

In order to enable the use of AAPCS-compliant frame records whilst keeping
backwards compatibility, this patch introduces a new command-line option
(`-mframe-chain=[none|aapcs|aapcs+leaf]`) for the AArch32 and Thumb backends.
The option allows users to explicitly select when to use it, and also
helps ensure the extra overhead of frame records is
only introduced when necessary, in particular for Thumb targets.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D125094
2022-06-13 10:21:06 +01:00
Martin Storsjö 2ab19bfa41 [ARM] Adjust the frame pointer when it's needed for SEH unwinding
For functions that require restoring SP from FP (e.g. that need to
align the stack, or that have variable sized allocations), the prologue
and epilogue previously used to look like this:

    push {r4-r5, r11, lr}
    add r11, sp, #8
    ...
    sub r4, r11, #8
    mov sp, r4
    pop {r4-r5, r11, pc}

This is problematic, because this unwinding operation (restoring sp
from r11 - offset) can't be expressed with the SEH unwind opcodes
(probably because this unwind procedure doesn't map exactly to
individual instructions; note the detour via r4 in the epilogue too).

To make unwinding work, the GPR push is split into two; the first one
pushing all other registers, and the second one pushing r11+lr, so that
r11 can be set pointing at this spot on the stack:

    push {r4-r5}
    push {r11, lr}
    mov r11, sp
    ...
    mov sp, r11
    pop {r11, lr}
    pop {r4-r5}
    bx lr

For the same setup, MSVC generates code that uses two registers;
r11 still pointing at the {r11,lr} pair, but a separate register
used for restoring the stack at the end:

    push {r4-r5, r7, r11, lr}
    add r11, sp, #12
    mov r7, sp
    ...
    mov sp, r7
    pop {r4-r5, r7, r11, pc}

For cases with clobbered float/vector registers, they are pushed
after the GPRs, before the {r11,lr} pair.
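
As a rough sketch of that layout (register choices purely illustrative):

    push    {r4-r5}         @ callee-saved GPRs first
    vpush   {d8-d9}         @ then the clobbered float/vector registers
    push    {r11, lr}       @ finally the pair that r11 will point at
    mov     r11, sp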

Differential Revision: https://reviews.llvm.org/D125649
2022-06-02 12:28:46 +03:00
David Penry dcb77643e3 Reapply [CodeGen][ARM] Enable Swing Module Scheduling for ARM
Fixed "private field is not used" warning when compiled
with clang.

original commit: 28d09bbbc3
reverted in: fa49021c68

------

This patch permits Swing Modulo Scheduling for ARM targets and
turns it on by default for the Cortex-M7.  The t2Bcc
instruction is recognized as a loop-ending branch.

MachinePipeliner is extended by adding support for
"unpipelineable" instructions.  These instructions are
those which contribute to the loop exit test; in the SMS
papers they are removed before creating the dependence graph
and then inserted into the final schedule of the kernel and
prologues. Support for these instructions was not previously
necessary because current targets supporting SMS have only
supported it for hardware loop branches, which have no
loop-exit-contributing instructions in the loop body.

The current structure of the MachinePipeliner makes it difficult
to remove/exclude these instructions from the dependence graph.
Therefore, this patch leaves them in the graph, but adds a
"normalization" method which moves them in the schedule to
stage 0, which causes them to appear properly in kernel and
prologues.

It was also necessary to be more careful about boundary nodes
when iterating across successors in the dependence graph because
the loop exit branch is now a non-artificial successor to
instructions in the graph. In addition, schedules with physical
use/def pairs in the same cycle should be treated as creating an
invalid schedule because the scheduling logic doesn't respect
physical register dependence once scheduled to the same cycle.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D122672
2022-04-29 10:54:39 -07:00
David Penry fa49021c68 Revert "[CodeGen][ARM] Enable Swing Module Scheduling for ARM"
This reverts commit 28d09bbbc3
while I investigate a buildbot failure.
2022-04-28 13:29:27 -07:00
David Penry 28d09bbbc3 [CodeGen][ARM] Enable Swing Module Scheduling for ARM
This patch permits Swing Modulo Scheduling for ARM targets and
turns it on by default for the Cortex-M7.  The t2Bcc
instruction is recognized as a loop-ending branch.

MachinePipeliner is extended by adding support for
"unpipelineable" instructions.  These instructions are
those which contribute to the loop exit test; in the SMS
papers they are removed before creating the dependence graph
and then inserted into the final schedule of the kernel and
prologues. Support for these instructions was not previously
necessary because current targets supporting SMS have only
supported it for hardware loop branches, which have no
loop-exit-contributing instructions in the loop body.

The current structure of the MachinePipeliner makes it difficult
to remove/exclude these instructions from the dependence graph.
Therefore, this patch leaves them in the graph, but adds a
"normalization" method which moves them in the schedule to
stage 0, which causes them to appear properly in kernel and
prologues.

It was also necessary to be more careful about boundary nodes
when iterating across successors in the dependence graph because
the loop exit branch is now a non-artificial successor to
instructions in the graph. In addition, schedules with physical
use/def pairs in the same cycle should be treated as creating an
invalid schedule because the scheduling logic doesn't respect
physical register dependence once scheduled to the same cycle.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D122672
2022-04-28 13:01:18 -07:00
Eli Friedman 2f497ec3a0 [ARM] Fix ARM backend to correctly use atomic expansion routines.
Without this patch, clang would generate calls to __sync_* routines on
targets where it does not make sense; we can't assume the routines exist
on unknown targets. Linux has special implementations of the routines
that work on old ARM targets; other targets have no such routines. In
general, atomic operations that aren't natively supported should go
through libatomic (__atomic_*) APIs, which can support arbitrary atomics
through locks.
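
As a rough, hypothetical sketch (register use and the exact libcall chosen are
assumptions, not output from the patch), an atomic 32-bit fetch-add on a
subtarget with no native atomic instructions is expected to become a libatomic
call instead of a __sync_* call:

    @ r0 = address of the atomic variable
    mov     r1, #1                  @ value to add
    mov     r2, #5                  @ memory order argument (__ATOMIC_SEQ_CST)
    bl      __atomic_fetch_add_4    @ rather than __sync_fetch_and_add_4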

ARM targets older than v6, where this patch makes a difference, are rare
in practice, but not completely extinct. See, for example, discussion on
D116088.

This also affects Cortex-M0, but I don't think __sync_* routines
actually exist in any Cortex-M0 libraries. So in practice this just
leads to a slightly different linker error for those cases, I think.

Mechanically, this patch does the following:

- Ensures we run atomic expansion unconditionally; it never makes sense to
completely skip it.
- Fixes getMaxAtomicSizeInBitsSupported() so it returns an appropriate
number on all ARM subtargets.
- Fixes shouldExpandAtomicRMWInIR() and shouldExpandAtomicCmpXchgInIR() to
correctly handle subtargets that don't have atomic instructions.

Differential Revision: https://reviews.llvm.org/D120026
2022-03-18 12:43:57 -07:00
Tomas Matheson 831ab35b2f [ARM][AArch64] generate subtarget feature flags
Reland of D120906 after sanitizer failures.

This patch aims to reduce a lot of the boilerplate around adding new subtarget
features. From the SubtargetFeatures tablegen definitions, a series of calls to
the macro GET_SUBTARGETINFO_MACRO are generated in
ARM/AArch64GenSubtargetInfo.inc.  ARMSubtarget/AArch64Subtarget can then use
this macro to define bool members and the corresponding getter methods.

Some naming inconsistencies have been fixed to allow this, and one unused
member removed.

This implementation only applies to boolean members; in future both BitVector
and enum members could also be generated.

Differential Revision: https://reviews.llvm.org/D120906
2022-03-18 16:07:00 +00:00
Tomas Matheson 62c481542e Revert "[ARM][AArch64] generate subtarget feature flags"
This reverts commit dd8b0fecb9.
2022-03-18 11:58:20 +00:00
Tomas Matheson dd8b0fecb9 [ARM][AArch64] generate subtarget feature flags
This patch aims to reduce a lot of the boilerplate around adding new subtarget
features. From the SubtargetFeatures tablegen definitions, a series of calls to
the macro GET_SUBTARGETINFO_MACRO are generated in
ARM/AArch64GenSubtargetInfo.inc.  ARMSubtarget/AArch64Subtarget can then use
this macro to define bool members and the corresponding getter methods.

Some naming inconsistencies have been fixed to allow this, and one unused
member removed.

This implementation only applies to boolean members; in future both BitVector
and enum members could also be generated.

Differential Revision: https://reviews.llvm.org/D120906
2022-03-18 11:48:20 +00:00
Mircea Trofin cb2160760e [nfc][codegen] Move RegisterBank[Info].h under CodeGen
This wraps up from D119053. The 2 headers are moved as described,
fixed file headers and include guards, updated all files where the old
paths were detected (simple grep through the repo), and `clang-format`-ed it all.

Differential Revision: https://reviews.llvm.org/D119876
2022-03-01 21:53:25 -08:00
Egor Zhdan 3a1cb36237 Add DriverKit support
This patch is the first in a series of patches to upstream the support for Apple's DriverKit. Once complete, it will allow targeting the DriverKit platform with Clang, similarly to AppleClang.

This code was originally authored by JF Bastien.

Differential Revision: https://reviews.llvm.org/D118046
2022-02-22 13:42:53 +00:00
Mark Murray 3d7662142d [ARM] Undeprecate complex IT blocks
AArch32/Armv8-A introduced the performance deprecation of certain patterns
of IT instructions.  After some debate internal to ARM, this is now being
reverted; i.e. no IT instruction patterns are performance deprecated
anymore, as the performance degradation is not significant enough.

This reverts the following:

"ARMv8-A deprecates some uses of the T32 IT instruction. All uses of
IT that apply to instructions other than a single subsequent 16-bit
instruction from a restricted set are deprecated, as are explicit
references to the PC within that single 16-bit instruction. This permits
the non-deprecated forms of IT and subsequent instructions to be treated
as a single 32-bit conditional instruction."

The deprecation no longer applies, but the behaviour may be controlled
by the -arm-restrict-it and -arm-no-restrict-it command-line options,
with the latter being the default. No warnings about complex IT blocks
will be generated.
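
For reference, a hypothetical IT block that the deprecated-form rules would
previously have flagged, because it covers more than a single 16-bit
instruction, and which now assembles without complaint:

    itte    eq
    moveq   r0, #1
    addeq.w r0, r0, r1      @ 32-bit instruction inside the IT block
    movne   r0, #0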

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D118044
2022-02-07 15:47:53 +00:00
Ties Stuij 6b1e844b69 [ARM] Add Cortex-X1C Support for Clang and LLVM
This patch upstreams support for the Arm-v8 Cortex-X1C processor for AArch64 and
ARM.

For more information, see:
- https://community.arm.com/arm-community-blogs/b/announcements/posts/arm-cortex-x1c
- https://developer.arm.com/documentation/101968/0002/Functional-description/Technical-overview/Components

The following people contributed to this patch:
- Simon Tatham
- Ties Stuij

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D117202
2022-01-31 14:23:35 +00:00
Lucas Prates cd7f621a0a [ARM][AArch64] Introduce Armv9.3-A
This patch introduces support for targeting the Armv9.3-A architecture,
which should map to the existing Armv8.8-A extensions.

Differential Revision: https://reviews.llvm.org/D116158
2022-01-03 12:40:43 +00:00
Simon Tatham d50072f74e [ARM] Introduce an empty "armv8.8-a" architecture.
This is the first commit in a series that implements support for
"armv8.8-a" architecture. This should contain all the necessary
boilerplate to make the 8.8-A architecture exist from LLVM and Clang's
point of view: it adds the new arch as a subtarget feature, a definition
in TargetParser, a name on the command line, an appropriate set of
predefined macros, and adds appropriate tests. The new architecture name
is supported in both AArch32 and AArch64.

However, in this commit, no actual _functionality_ is added as part of
the new architecture. If you specify -march=armv8.8a, the compiler
will accept it and set the right predefines, but will not generate code
any differently.

Differential Revision: https://reviews.llvm.org/D115694
2021-12-31 16:43:53 +00:00
Ties Stuij 63eb7ff47d [ARM] Implement PAC return address signing mechanism for PACBTI-M
This patch implements PAC return address signing for armv8-m. It roughly
accomplishes the following things:

- PAC and AUT instructions are generated.
- They're part of the stack frame setup, so that shrink-wrapping can move them
inwards to cover only part of a function
- The auth code generated by PAC is saved across subroutine calls so that AUT
can find it again to check
- PAC is emitted before stacking registers (so that the SP it signs is the one
on function entry).
- The new pseudo-register ra_auth_code is mentioned in the DWARF frame data
- With CMSE also in use: PAC is emitted before stacking FPCXTNS, and AUT
validates the corresponding value of SP
- Emit correct unwind information when PAC is replaced by PACBTI
- Handle tail calls correctly

Some notes:

We make the assembler accept the `.save {ra_auth_code}` directive that is
emitted by the compiler when it saves a register that contains a
return address authentication code.
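
A rough, hypothetical example of the resulting prologue/epilogue and unwind
directives (the register list and exact ordering are illustrative only):

    pac     r12, lr, sp     @ compute the return address authentication code into r12
    push    {r4-r7, lr}
    .save   {r4-r7, lr}
    push    {r12}
    .save   {ra_auth_code}  @ instead of .save {r12}
    ...
    pop     {r12}
    pop     {r4-r7, lr}
    aut     r12, lr, sp     @ validate lr against the saved code before returning
    bx      lr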

For EHABI we need to have the `FrameSetup` flag on the instruction and
handle the `t2PACBTI` opcode (identically to `t2PAC`), so we can emit
`.save {ra_auth_code}`, instead of `.save {r12}`.

For PACBTI-M, the instruction which computes the return address PAC should use
the SP value from before the adjustment for the argument-register save area (used
for variadic functions and when a parameter is split between stack and registers),
but at the same time it should come after the instruction that saves FPCXT when
compiling a CMSE entry function.

This patch moves the varargs SP adjustment after the FPCXT save (they are never
enabled at the same time), so in a following patch handling of the `PAC`
instruction can be placed between them.

Epilogue emission code adjusted in a similar manner.

PACBTI-M code generation should not emit any instructions for architectures
v6-m, v8-m.base, and for A- and R-class cores. Diagnostic message for such cases
is handled separately by a future ticket.

note on tail calls:

If the called function has four arguments that occupy registers `r0`-`r3`, the
only option for holding the function pointer itself is `r12`, but this register
is used to keep the PAC during the function prologue/epilogue, which clobbers
the function pointer.

When we do the tail call we need the five registers (`r0`-`r3` and `r12`) to
keep six values - the four function arguments, the function pointer and the PAC,
which is obviously impossible.

One option would be to authenticate the return address before all callee-saved
registers are restored, so we have a scratch register to temporarily keep the
value of `r12`. The issue with this approach is that it violates a fundamental
invariant that PAC is computed using CFA as a modifier. It would also mean using
separate instructions to pop `lr` and the rest of the callee-saved registers,
which would offset the advantages of doing a tail call.

Instead, this patch disables indirect tail calls when the called function takes
four or more arguments and return address signing and authentication is enabled
for the caller function, conservatively assuming the caller function would spill
LR.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Momchil Velikov
- Ties Stuij

Reviewed By: danielkiss

Differential Revision: https://reviews.llvm.org/D112429
2021-12-07 10:15:19 +00:00
Ties Stuij 0fbb17458a [ARM] Implement setjmp BTI placement for PACBTI-M
This patch intends to guard indirect branches performed by longjmp
by inserting BTI instructions after calls to setjmp.

Calls with 'returns-twice' are lowered to a new pseudo-instruction
named t2CALL_BTI that is later expanded to a bundle of {tBL,t2BTI}.
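
Roughly, and purely for illustration, the expanded bundle looks like:

    bl      setjmp          @ the returns-twice call
    bti                     @ landing pad for the indirect branch performed by longjmp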

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Alexandros Lamprineas
- Ties Stuij

Reviewed By: labrinea

Differential Revision: https://reviews.llvm.org/D112427
2021-12-06 11:07:10 +00:00
Ties Stuij 5cff77c23f [clang][ARM] PACBTI-M assembly support
Introduce assembly support for Armv8.1-M PACBTI extension. This is an optional
extension in v8.1-M.

There are 10 new system registers and 5 new instructions, all predicated on the
feature.

The attribute for llvm-mc is called "pacbti". For armclang, an architecture
extension also called "pacbti" was created.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Victor Campos
- Ties Stuij

Reviewed By: labrinea

Differential Revision: https://reviews.llvm.org/D112420
2021-11-30 09:28:18 +00:00
Mubashar Ahmad 8e47b83ec9 [AArch64][ARM] Enablement of Cortex-A710 Support
Phabricator review: https://reviews.llvm.org/D113256
2021-11-18 10:58:05 +00:00
Daniel Kiss d8075e8781 Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."
This is relanding commit da1d1a0869.
This patch additionally addresses failures found in buildbots & post review comments.

ARM EHABI[1] specifies that __cxa_end_cleanup should be called after cleanup.
It will then call UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi, while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called,
because the global state will be wrong due to the missing __cxa_end_cleanup call.

Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions

Reviewed By: logan

Differential Revision: https://reviews.llvm.org/D111703
2021-10-28 21:45:09 +02:00
Daniel Kiss 66e03db814 Revert "Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume.""
This reverts commit b6420e575f.
2021-10-28 17:24:53 +02:00
Daniel Kiss b6420e575f Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."
This is relanding commit da1d1a0869.
This patch additionally addresses failures found in buildbots & post review comments.

ARM EHABI[1] specifies that __cxa_end_cleanup should be called after cleanup.
It will then call UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi, while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called,
because the global state will be wrong due to the missing __cxa_end_cleanup call.

Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions

Reviewed By: logan

Differential Revision: https://reviews.llvm.org/D111703
2021-10-28 16:49:19 +02:00
Daniel Kiss 894ddba1c9 Revert "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."
This reverts commit da1d1a0869.
2021-10-27 14:29:35 +02:00
Daniel Kiss da1d1a0869 [ARM] __cxa_end_cleanup should be called instead of _UnwindResume.
ARM EHABI[1] specifies that __cxa_end_cleanup should be called after cleanup.
It will then call UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi, while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called,
because the global state will be wrong due to the missing __cxa_end_cleanup call.

Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions

Reviewed By: logan

Differential Revision: https://reviews.llvm.org/D111703
2021-10-27 10:40:00 +02:00
Victor Campos 3550e242fa [Clang][ARM][AArch64] Add support for Armv9-A, Armv9.1-A and Armv9.2-A
armv9-a, armv9.1-a and armv9.2-a can be targeted using the -march option
both in ARM and AArch64.

 - Armv9-A maps to Armv8.5-A.
 - Armv9.1-A maps to Armv8.6-A.
 - Armv9.2-A maps to Armv8.7-A.
 - The SVE2 extension is enabled by default on these architectures.
 - The cryptographic extensions are disabled by default on these
 architectures.

The Armv9-A architecture is described in the Arm® Architecture Reference
Manual Supplement Armv9, for Armv9-A architecture profile
(https://developer.arm.com/documentation/ddi0608/latest).

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D109517
2021-10-11 17:44:09 +01:00
Alexandros Lamprineas 1bd5ea968e [ARM] Mitigate the cve-2021-35465 security vulnerability.
Recently a vulnerability was found in the implementation of the VLLDM
instruction in the Arm Cortex-M33, Cortex-M35P and Cortex-M55. If the
VLLDM instruction is abandoned due to an exception when it is partially
completed, it is possible for a subsequent non-secure handler to access
and modify the partially restored register values. This vulnerability is
identified as CVE-2021-35465.

The mitigation sequence varies between v8-m and v8.1-m as follows:

v8-m.main
---------
mrs        r5, control
tst        r5, #8       /* CONTROL_S.SFPA */
it         ne
.inst.w    0xeeb00a40   /* vmovne s0, s0 */
1:
vlldm      sp           /* Lazy restore of d0-d16 and FPSCR. */

v8.1-m.main
-----------
vscclrm    {vpr}        /* Clear VPR. */
vlldm      sp           /* Lazy restore of d0-d16 and FPSCR. */

More details on
developer.arm.com/support/arm-security-updates/vlldm-instruction-security-vulnerability

Differential Revision: https://reviews.llvm.org/D109157
2021-09-16 12:56:43 +01:00
Tomas Matheson 18dbe68978 [ARM][NFC] Tidy up subtarget frame pointer routines
getFramePointerReg only depends on information in ARMSubtarget,
so move it in there so it can be accessed from more places.

Make use of ARMSubtarget::getFramePointerReg to remove duplicated code.

The main use of useR7AsFramePointer is getFramePointerReg, so inline it.

Differential Revision: https://reviews.llvm.org/D104476
2021-06-19 17:00:45 +01:00
Daniel Kiss 801ab71032 [ARM][AArch64] SLSHardening: make non-comdat thunks possible
Linker scripts might not handle COMDAT sections. SLSHardening adds a
new section for each __llvm_slsblr_thunk_xN. This new option allows
the thunks to be generated into the normal text section to handle these
exceptional cases.
`,comdat` or `,nocomdat` can be appended to harden-sls to control the codegen,
e.g. -mharden-sls=[all|retbr|blr],nocomdat.

Reviewed By: kristof.beyls

Differential Revision: https://reviews.llvm.org/D100546
2021-05-20 17:07:05 +02:00
Malhar Jajoo dfe3ffaa4a [ARM] Transforming memset to Tail predicated Loop
This patch converts the llvm.memset intrinsic into Tail Predicated
Hardware loops for a target that supports the Arm M-profile
Vector Extension (MVE).

The llvm.memset is converted to a TP loop for both
constant and non-constant input sizes (of llvm.memset).
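
A sketch of the kind of tail-predicated loop this produces (hypothetical
codegen; assume r0 holds the destination, r1 the byte value and r2 the size
in bytes):

    vdup.8  q0, r1          @ splat the byte value across a vector register
    wlstp.8 lr, r2, .Lend   @ while-loop start, tail-predicated on the remaining bytes
.Lloop:
    vstrb.8 q0, [r0], #16   @ store up to 16 predicated bytes per iteration
    letp    lr, .Lloop      @ loop end, decrements the remaining element count
.Lend: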

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100435
2021-05-07 13:35:53 +01:00
Malhar Jajoo 9ff38e2d9d [ARM] Transforming memcpy to Tail predicated Loop
This patch converts the llvm.memcpy intrinsic into Tail Predicated
Hardware loops for a target that supports the Arm M-profile
Vector Extension (MVE).

From an implementation point of view, the patch

- adds an ARM-specific SDAG node (to which the llvm.memcpy intrinsic is lowered during the first phase of ISel)
- adds a corresponding TableGen entry to generate a pseudo instruction, with a custom inserter,
  on matching the above node.
- adds a custom inserter function that expands the pseudo instruction into MIR suitable
   to be lowered (by later passes) into a WLSTP loop.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D99723
2021-05-06 23:21:28 +01:00
Malhar Jajoo fc690777fc Revert "[ARM] Transforming memcpy to Tail predicated Loop"
Reverting commit since it causes failure (10462).
This reverts commit b856f4a232.
2021-05-06 12:39:08 +01:00
Malhar Jajoo b856f4a232 [ARM] Transforming memcpy to Tail predicated Loop
This patch converts the llvm.memcpy intrinsic into Tail Predicated
Hardware loops for a target that supports the Arm M-profile
Vector Extension (MVE).

From an implementation point of view, the patch

- adds an ARM-specific SDAG node (to which the llvm.memcpy intrinsic is lowered during the first phase of ISel)
- adds a corresponding TableGen entry to generate a pseudo instruction, with a custom inserter,
  on matching the above node.
- adds a custom inserter function that expands the pseudo instruction into MIR suitable
   to be lowered (by later passes) into a WLSTP loop.

Note: A cli option is used to control the conversion of memcpy to TP
loop and this option is currently disabled by default. It may be enabled
in the future after further downstream testing.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D99723
2021-05-06 09:34:09 +01:00
David Green b1ef919aad [ARM] Add CostKind to getMVEVectorCostFactor.
This adds the CostKind to getMVEVectorCostFactor, so that it can
automatically account for CodeSize costs, where it returns a cost of 1
rather than the MVEFactor used for Throughput/Latency. This helps simplify the
caller code and makes the codesize cost more accurate in more
cases.
2021-02-11 15:33:59 +00:00
Simon Tatham 69f1a7ad82 [ARM] Copy-paste error in ARMv87a architecture definition.
In the tablegen architecture definition, the Name field for the
ARMv87a record read "ARMv86a". All the other records contain their own
names.

Corrected it to "ARMv87a", and added the necessary value in
ARMArchEnum for that to refer to.

Reviewed By: pratlucas

Differential Revision: https://reviews.llvm.org/D96493
2021-02-11 13:35:56 +00:00
Mark Murray 5abfeccf10 [ARM][AArch64] Add Cortex-A78C Support for Clang and LLVM
This patch upstreams support for the Armv8-a Cortex-A78C
processor for AArch64 and ARM.

In detail:

Adding cortex-a78c as cpu option for aarch64 and arm targets in clang
Adding Cortex-A78C CPU name and ProcessorModel in llvm
Details of the CPU can be found here:
https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78c
2020-12-29 10:18:59 +00:00
David Penry a9f14cdc62 [ARM] Add bank conflict hazarding
Adds ARMBankConflictHazardRecognizer. This hazard recognizer
looks for a few situations where the same base pointer is used and
then checks whether the offsets lead to a bank conflict. Two
parameters are also added to permit overriding of the target
assumptions:

arm-data-bank-mask=<int> - Mask of bits which are to be checked for
conflicts.  If all these bits are equal in the offsets, there is a
conflict.
arm-assume-itcm-bankconflict=<bool> - Assume that there will be bank
conflicts on any loads to a constant pool.

This hazard recognizer is enabled for Cortex-M7, where the Technical
Reference Manual states that there are two DTCM banks banked using bit
2 and one ITCM bank.
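
A hypothetical illustration with the default assumptions, where offsets that
agree in bit 2 hit the same DTCM bank:

    ldr     r1, [r0]        @ offset 0, bank bit (bit 2) = 0
    ldr     r2, [r0, #8]    @ offset 8, bank bit = 0 as well -> potential bank conflict
    ldr     r3, [r0, #4]    @ offset 4, bank bit = 1 -> different bank, no conflict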

Differential Revision: https://reviews.llvm.org/D93054
2020-12-23 14:00:59 +00:00
Kristof Beyls a4c1f5160e [ARM] Harden indirect calls against SLS
To make sure that no barrier gets placed on the architectural execution
path, each indirect call through a function pointer in register rN gets
transformed into a direct call to __llvm_slsblr_thunk_mode_rN, where mode is
either arm or thumb, depending on the mode in which the indirect call
happens.

The llvm_slsblr_thunk_mode_rN thunk contains:

bx rN
<speculation barrier>

Therefore, the indirect call gets split into two: one direct call and one
indirect jump.
This transformation ensures that no speculation barrier is inserted on
the architectural execution path.
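
As an illustrative sketch (register and mode chosen arbitrarily), a call
through r4 in ARM mode becomes:

    @ before hardening
    blx     r4
    @ after hardening
    bl      __llvm_slsblr_thunk_arm_r4

__llvm_slsblr_thunk_arm_r4:
    bx      r4
    dsb     sy              @ speculation barrier, never reached architecturally
    isb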

The mitigation is off by default and can be enabled by the
harden-sls-blr subtarget feature.

As a linker is allowed to clobber r12 on function calls, the
above code transformation is not correct in case a linker does so.
Similarly, the transformation is not correct when register lr is used.
Avoiding r12/lr being used is done in a follow-on patch to make
reviewing this code easier.

Differential Revision: https://reviews.llvm.org/D92468
2020-12-19 12:33:42 +00:00
Kristof Beyls 195f44278c [ARM] Implement harden-sls-retbr for ARM mode
Some processors may speculatively execute the instructions immediately
following indirect control flow, such as returns, indirect jumps and
indirect function calls.

To avoid a potential mis-speculatively executed gadget after these
instructions leaking secrets through side channels, this pass places a
speculation barrier immediately after every indirect control flow where
control flow doesn't return to the next instruction, such as returns and
indirect jumps, but not indirect function calls.

Hardening of indirect function calls will be done in a later,
independent patch.

This patch is implementing the same functionality as the AArch64 counter
part implemented in https://reviews.llvm.org/D81400.
For AArch64, returns and indirect jumps only occur on RET and BR
instructions and hence the function attribute to control the hardening
is called "harden-sls-retbr" there. On AArch32, there is a much wider
variety of instructions that can trigger an indirect unconditional
control flow change.  I've decided to stick with the name
"harden-sls-retbr" as introduced for the corresponding AArch64
mitigation.

This patch implements this for ARM mode. A future patch will extend this
to also support Thumb mode.

The inserted barriers are never on the correct, architectural execution
path, and therefore performance overhead of this is expected to be low.
To ensure these barriers are never on an architecturally executed path,
when the harden-sls-retbr function attribute is present, indirect
control flow is never conditionalized/predicated.

On targets that implement the Armv8.0-SB Speculation Barrier extension,
a single SB instruction is emitted that acts as a speculation barrier.
On other targets, a DSB SYS followed by an ISB is emitted to act as a
speculation barrier.
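
For example (illustrative only), a hardened ARM-mode return is followed by a
barrier that sits off the architectural path:

    bx      lr
    dsb     sy              @ only ever reachable speculatively
    isb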

These speculation barriers are implemented as pseudo instructions to
prevent later passes from analyzing them and potentially removing them.

The mitigation is off by default and can be enabled by the
harden-sls-retbr subtarget feature.

Differential Revision: https://reviews.llvm.org/D92395
2020-12-19 11:42:39 +00:00
Lucas Prates 42b92b31b8 [ARM][AArch64] Adding basic support for the v8.7-A architecture
This introduces support for the v8.7-A architecture through a new
subtarget feature called "v8.7a". It adds two new "WFET" and "WFIT"
instructions, the nXS limited-TLB-maintenance qualifier for DSB and TLBI
instructions, a new CPU id register, ID_AA64ISAR2_EL1, and the new
HCRX_EL2 system register.

Based on patches written by Simon Tatham and Victor Campos.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D91772
2020-12-17 13:45:08 +00:00
Mark Murray 2b6691894a [ARM][AArch64] Adding Neoverse N2 CPU support
Add support for the Neoverse N2 CPU to the ARM and AArch64 backends.

Differential Revision: https://reviews.llvm.org/D91695
2020-11-25 11:42:54 +00:00
Lucas Prates c2c2cc1360 [ARM][AArch64] Adding Neoverse V1 CPU support
Add support for the Neoverse V1 CPU to the ARM and AArch64 backends.

This is based on patches from Mark Murray and Victor Campos.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D90765
2020-11-09 13:15:40 +00:00
Craig Topper c7a0b2684f [X86][MC][Target] Initial backend support a tune CPU to support -mtune
This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present, X86 will use the resolved CPU from the target-cpu attribute or the command line.

This patch adds MC layer support for a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables. These feature lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86 targets. This annoyingly increases the size of static tables on all targets, as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned.

One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU.

I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning.

Differential Revision: https://reviews.llvm.org/D85165
2020-08-14 15:31:50 -07:00
Luke Geeson 954db63cd1 [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM
This patch upstreams support for the Arm-v8 Cortex-A78 and Cortex-X1
processors for AArch64 and ARM.

In detail:
- Adding cortex-a78 and cortex-x1 as cpu options for aarch64 and arm targets in clang
- Adding Cortex-A78 and Cortex-X1 CPU names and ProcessorModels in llvm

details of the CPU can be found here:
https://www.arm.com/products/cortex-x

https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78

The following people contributed to this patch:
- Luke Geeson
- Mikhail Maltsev

Reviewers: t.p.northover, dmgreen

Reviewed By: dmgreen

Subscribers: dmgreen, kristof.beyls, hiraditya, danielkiss, cfe-commits,
llvm-commits, miyuki

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D83206
2020-07-10 18:24:11 +01:00
Luke Geeson 8bf99f1e6f [ARM] Add Cortex-A77 Support for Clang and LLVM
This patch upstreams support for the Arm-v8 Cortex-A77
processor for AArch64 and ARM.

In detail:
- Adding cortex-a77 as a cpu option for aarch64 and arm targets in clang
- Cortex-A77 CPU name and ProcessorModel in llvm

details of the CPU can be found here:
https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a77

and a similar submission to GCC can be found here:
e0664b7a63

The following people contributed to this patch:
- Luke Geeson
- Mikhail Maltsev

Reviewers: t.p.northover, dmgreen, ostannard, SjoerdMeijer

Reviewed By: dmgreen

Subscribers: dmgreen, kristof.beyls, hiraditya, danielkiss, cfe-commits,
llvm-commits, miyuki

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82887
2020-07-03 13:00:54 +01:00
Alexandros Lamprineas ecdf48f15b [ARM] Basic bfloat support
This patch adds basic support for BFloat in the Arm backend.
For now the code generation relies on fullfp16 being present.

Briefly:
* adds the bfloat scalar and vector types in the necessary register classes,
* adjusts the calling convention to cope with bfloat argument passing and return,
* adds codegen patterns for moves, loads and stores.

It's tested mostly by the intrinsic patches that depend on it (load/store, convert/copy).

The following people contributed to this patch:

 * Alexandros Lamprineas
 * Ties Stuij

Differential Revision: https://reviews.llvm.org/D81373
2020-06-18 17:26:24 +01:00