Fix prefix emission order to emit REX immediately before the opcode (SDM vol2,
2.1, Figure 2-1). According to SDM vol2 2.2.1, "Other placements are ignored".
This fix has a side effect of outputting segment override prefix in a different
order than previously (benign).
Follow-up to https://reviews.llvm.org/D120592
Reviewed By: skan, craig.topper
Differential Revision: https://reviews.llvm.org/D120871
Print and emit redundant Address-Size override prefix if it's set on the
instruction.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D120592
The namespaces of HWREGs is now overlapping with gfx10. Thus the
patch is longer than necessary to just support new names. It also
need to handle proper error messages, i.e. to issue a "specified
hardware register is not supported on this GPU" message.
This may need a major refactoring in the future.
Differential Revision: https://reviews.llvm.org/D121418
It was included in HasV8_0rOps when D88660 first introduced that
architecture definition. In D118045 I moved it out of there and into
ProcessorFeatures.R82, so that -mcpu=cortex-r82 would continue to
behave the same as before but -march=armv8-r would include only the
mandatory parts of the architecture.
In fact, that was a mistake. Firstly, Cortex-R82 _doesn't_ implement
that feature, so it makes no sense to deliberately enable it for that
CPU in particular. But also, it's an extension that only adds system
registers, and we're generally more relaxed about where we enable
those (because kernel developers find it useful to write sysreg-access
instructions after runtime checking, and because sysreg accesses
aren't manufactured during code generation so the risk is small).
So, in line with that usual AArch64 policy, FeatureSpecRestrict ought
to be considered part of 8.0-R for LLVM purposes. So I'm moving it
back into HasV8_0rOps, where it started out.
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D120830
Add the Power 10 instruction LXVKQ.
This patch was taken from an original patch by: Yi-Hong Lyu
Reviewed By: lei
Differential Revision: https://reviews.llvm.org/D117507
Refactor the instructions in M68kInstrControl.td to use the VarLenCodeEmitter.
This patch is tested by the existing test cases.
Reviewed By: myhsu, ricky26
Differential Revision: https://reviews.llvm.org/D119665
The LDMX instruction was to be potentially added in P9 but it was never added
in either ISA 3.0 or ISA 3.1. This patch removes that instruction as it is
currently still an invalid instruction.
Reviewed By: lei
Differential Revision: https://reviews.llvm.org/D118074
This patch refactors all the existing M68k arithmetic instructions
to use the new VarLenCodeEmitterGen infrastructure.
This patch is tested by the existing MC test cases.
Note that one of the codegen tests needed to be updated because the
ordering of two equivalent instructions were switched.
Differential Revision: https://reviews.llvm.org/D115234
Add support for the 'pause' hint instruction as an alias for
'fence w, 0'. To do this allow the 'fence' operands pred and succ
to be set to 0 (the empty set). This will also allow future hints
to be encoded as 'fence 0, <x>' and 'fence <x>, 0'.
This patch revised from @mundaym's D93019.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D117789
The following SubtargetFeatures are removed from the definition of
HasV8_0rOps, on the grounds that they are optional in Armv8.4-A, and
therefore (by the definition of Armv8.0-R) also optional in v8.0-R:
* performance monitoring: FeaturePerfMon
* cryptography: FeatureSM4 and FeatureSHA3
* half-precision FP: FeatureFullFP16, FeatureFP16FML
* speculation control: FeatureSSBS, FeaturePredRes, FeatureSB,
FeatureSpecRestrict
This isn't the full set of features that are listed as optional in the
spec. FeatureCCIDX and FeatureTRACEV8_4 are also optional. But LLVM
includes those in HasV8_3aOps and HasV8_4aOps respectively (I think on
the grounds that the system registers they enable are useful to be
able to access after a runtime check), and so for consistency, I've
left those in HasV8_0rOps too.
After this commit, HasV8_0rOps is a strict subset of HasV8_4aOps (but
missing features that are not in Armv8.0-R at all).
The definition of Cortex-R82 is correspondingly updated to add most of
the features that I've removed from base Armv8.0-R (with the exception
of the cryptography ones), since that particular implementation of
v8.0-R does have them.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D118045
The Armv8-R.64 architecture defines numbered MPU region registers with
indices 1-15, not 0-15. So there's no such register as PRBAR0_EL2 or
PRLAR0_EL1 (for example). The encodings that they would occupy are
used for the unnumbered PRBAR_ELn and PRLAR_ELn registers.
Reviewed By: labrinea
Differential Revision: https://reviews.llvm.org/D117755
This family of instructions includes CPYF (copy forward), CPYB (copy
backward), SET (memset) and SETG (memset + initialise MTE tags), with
some sub-variants to indicate whether address translation is done in a
privileged or unprivileged way. For the copy instructions, you can
separately specify the read and write translations (so that kernels
can safely use these instructions in syscall handlers, to memcpy
between the calling process's user-space memory map and the kernel's
own privileged one).
The unusual thing about these instructions is that they write back to
multiple registers, because they perform an implementation-defined
amount of copying each time they run, and write back to _all_ the
address and size registers to indicate how much remains to be done
(and the code is expected to loop on them until the size register
becomes zero). But this is no problem in LLVM - you just define each
instruction to have multiple outputs, multiple inputs, and a set of
constraints tying their register numbers together appropriately.
This commit introduces a special subtarget feature called MOPS (after
the name the spec gives to the CPU id field), which is a dependency of
the top-level 8.8-A feature, and uses that to enable most of the new
instructions. The SETMG instructions also depend on MTE (and the test
checks that).
Differential Revision: https://reviews.llvm.org/D116157
This patch implements PAC return address signing for armv8-m. This patch roughly
accomplishes the following things:
- PAC and AUT instructions are generated.
- They're part of the stack frame setup, so that shrink-wrapping can move them
inwards to cover only part of a function
- The auth code generated by PAC is saved across subroutine calls so that AUT
can find it again to check
- PAC is emitted before stacking registers (so that the SP it signs is the one
on function entry).
- The new pseudo-register ra_auth_code is mentioned in the DWARF frame data
- With CMSE also in use: PAC is emitted before stacking FPCXTNS, and AUT
validates the corresponding value of SP
- Emit correct unwind information when PAC is replaced by PACBTI
- Handle tail calls correctly
Some notes:
We make the assembler accept the `.save {ra_auth_code}` directive that is
emitted by the compiler when it saves a register that contains a
return address authentication code.
For EHABI we need to have the `FrameSetup` flag on the instruction and
handle the `t2PACBTI` opcode (identically to `t2PAC`), so we can emit
`.save {ra_auth_code}`, instead of `.save {r12}`.
For PACBTI-M, the instruction which computes return address PAC should use SP
value before adjustment for the argument registers save are (used for variadic
functions and when a parameter is is split between stack and register), but at
the same it should be after the instruction that saves FPCXT when compiling a
CMSE entry function.
This patch moves the varargs SP adjustment after the FPCXT save (they are never
enabled at the same time), so in a following patch handling of the `PAC`
instruction can be placed between them.
Epilogue emission code adjusted in a similar manner.
PACBTI-M code generation should not emit any instructions for architectures
v6-m, v8-m.base, and for A- and R-class cores. Diagnostic message for such cases
is handled separately by a future ticket.
note on tail calls:
If the called function has four arguments that occupy registers `r0`-`r3`, the
only option for holding the function pointer itself is `r12`, but this register
is used to keep the PAC during function/prologue epilogue and clobbers the
function pointer.
When we do the tail call we need the five registers (`r0`-`r3` and `r12`) to
keep six values - the four function arguments, the function pointer and the PAC,
which is obviously impossible.
One option would be to authenticate the return address before all callee-saved
registers are restored, so we have a scratch register to temporarily keep the
value of `r12`. The issue with this approach is that it violates a fundamental
invariant that PAC is computed using CFA as a modifier. It would also mean using
separate instructions to pop `lr` and the rest of the callee-saved registers,
which would offset the advantages of doing a tail call.
Instead, this patch disables indirect tail calls when the called function take
four or more arguments and the return address sign and authentication is enabled
for the caller function, conservatively assuming the caller function would spill
LR.
This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension
The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:
https://developer.arm.com/documentation/ddi0553/latest
The following people contributed to this patch:
- Momchil Velikov
- Ties Stuij
Reviewed By: danielkiss
Differential Revision: https://reviews.llvm.org/D112429
This change introduces subtarget features to predicate certain
instructions and system registers that are available only on
'A' profile targets. Those features are not present when
targeting a generic CPU, which is the default processor.
In other words the generic CPU now means the intersection of
'A' and 'R' profiles. To maintain backwards compatibility we
enable the features that correspond to -march=armv8-a when the
architecture is not explicitly specified on the command line.
References: https://developer.arm.com/documentation/ddi0600/latest
Differential Revision: https://reviews.llvm.org/D110065
These instructions should allow src0 to be a literal with the same
value as the mandatory other literal. Enable it by introducing an
operand that defers adding its value to the MI when decoding till
the mandatory literal is parsed.
Reviewed By: dp, foad
Differential Revision: https://reviews.llvm.org/D111067
Change-Id: I22b0ae0d35bad17b6f976808e48bffe9a6af70b7
FeaturePMU was created in AArch64 to accommodate one missing system
register, PMMIR_EL1, in commit ffcd7698ae.
However, the Performance Monitors extension already had a target
feature, which is called FeaturePerfMon. Therefore, FeaturePMU is
redundant.
This patch removes FeaturePMU and merges its contents into
FeaturePerfMon.
Reviewed By: dnsampaio
Differential Revision: https://reviews.llvm.org/D109246
Suffix opcodes with _gfx10.
Remove direct references to architecture specific opcodes.
Add a BVH flag and apply this to diassembly.
Fix a number of disassembly errors on gfx90a target caused by
previous incorrect BVH detection code.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D108117
This enables printing of the mnemonics that contain the predicate
in the Intel printer. This requires accounting for the memory size
that is explicitly printed in Intel syntax. Those changes have been
synced to the ATT printer as well.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D108093