Fixes https://github.com/llvm/llvm-project/issues/56484
H registers are 16 bit views of AArch64's Neon registers and
B are the 8 bit views.
msvc does not support 16 bit float (some mention in DirectX but I
couldn't find a way to get to it) so for lack of a better reference
I'm using:
85c9b41b33/server/references/dia/include/cvconst.h
(the other microsoft-pdb repo is no longer up to date)
Luckily clang does support fp16 so a test is added for that.
There is no 8 bit float type so I had to get creative with the
test case. We're not testing for correct debug info here just
that we can select the B register and not crash in the process.
For FPCR it's never going to be passed as an argument so I've
not added a test for it. It is included to keep our list looking
the same as the reference.
Reviewed By: majnemer
Differential Revision: https://reviews.llvm.org/D129774
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly which most of the llvm test suite uses.
This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo. The allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the mir tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.
The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.
Recommitted with some fixes for the leftover MCII variables in release
builds.
Differential Revision: https://reviews.llvm.org/D129506
This reverts commit e2fb8c0f4b as it does
not build for Release builds, and some buildbots are giving more warning
than I saw locally. Reverting to fix those issues.
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly which most of the llvm test suite uses.
This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo. The allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the mir tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.
The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.
Differential Revision: https://reviews.llvm.org/D129506
We can remove the MatrixZADRegisterTable table of tile registers and
just calculate the register index directly.
Differential Revision: https://reviews.llvm.org/D127757
For ARM SEH, the epilogs will need a little more associated data than
just the plain list of opcodes.
This is a preparatory refactoring for D125645.
Differential Revision: https://reviews.llvm.org/D125879
This patch simplifies the switch statement in getInstruction to add
implicit operands (register ZA and Immediate equal to zero)
in the SME operands when disassembly.
The register ZA and the zero immediate can be added by checking the operand
in MCInstDesc.
Differential Revision: https://reviews.llvm.org/D125534
Big endian Nops were being generated as d5 03 20 1f fnmadd s21, s30,
s0, s0, getting the bytes of the NOP in the wrong order. This switches
the bytes to not be dependant on the endianness.
Differential Revision: https://reviews.llvm.org/D125980
CodeGen will only produce fixed formwat prologues, but hand-written assembly
can have .cfi directives in any combination they want. This should cause a
fallback to DWARF rather than an assertion failure (or an incorrect compact
unwind if assertions are disabled).
These labels aren't needed in the ARM version of WinEH tables, as each
unwind opcode maps to a specific instruction (each opcode is assumed
to represent one instruction), and the written tables don't contain
offsets like on x86_64.
Differential Revision: https://reviews.llvm.org/D125369
Since D110065, the 'R' profile support is added to LLVM. It turns the
`generic` cpu into the intersection of v8-a and v8-r. However, this
makes some backward compatibility problems. The original patch makes
the clang driver implicitly pass -march=armv8-a when only the triple
is specified. Since it only applies to clang, other tools like
llvm-objdump still faces the backward compatibility problem.
This patch applies the same idea to MC related tools by enabling '+v8a'
feature when nothing is specified (both CPU and FS are empty) for
MCSubtargetInfo creation.
This patch should fix PR53956.
Reviewed by: labrinea
Differential Revision: https://reviews.llvm.org/D124319
Asynchronous exception support for the prologue means that there can be
multiple .cfi_def_cfa_offset instructions in a single function, which tripped
up an assertion in the compact unwind generator.
In reality the compact unwind format is far too restrictive to represent
asynchronous frames so if we ever wanted that on Darwin we'd fall back to DWARF
(possibly keeping compact unwind around for synchronous users). So the compact
format should continue to represent the synchronous situation, and the
assertion can be removed.
glibc sysdeps/aarch64/tst-vpcs-mod.S has something like:
```
.variant_pcs vpcs_call
.global vpcs_call
```
This is supported by GNU as but leads to an error in MC. Use getOrCreateSymbol
to support a not-yet-registered symbol: call `registerSymbol` to ensure the
symbol exists even if there is no binding directive/label, to match GNU as.
While here, improve tests to check (1) a local symbol can get
STO_AARCH64_VARIANT_PCS (2) undefined .variant_pcs (3) an alias does not
inherit STO_AARCH64_VARIANT_PCS.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D122507
Instead of asserting, fallback to emitting DWARF unwind info when an
attempt is made to output compact unwind info for a function with
multiple adjustments to the CFA offset.
Multiple adjustments of SP are common and with instruction precise
unwind tables these may translate into multiple `.cfi_def_cfa_offset`
directives.
Fixes https://bugs.chromium.org/p/chromium/issues/detail?id=1302998
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D121017
In Streaming SVE mode full NEON is not available, even though this is
implied from armv8-a. LLVM previously inferred that NEON needed to be
disabled when setting +streaming-sve, but there is no need to infer
this from +streaming-sve, because we can explicitly disable NEON using
LLVM's attribute mechanism. This is specifically relevant because
+streaming-sve is not a user-facing feature, but rather an LLVM internal
feature.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D120809
Current implementation of Check[HSDQ]Form predicates doesn’t handle virtual registers and therefore isn’t useful for pre-RA scheduling. Patch fixes this implementing two function predicates: CheckQForm for checking that instruction writes 128-bit NEON register and CheckFpOrNEON which checks that instruction writes FP register (any width). The latter supersedes Check[HSD]Form predicates which are not used individually.
OS Laboratory. Huawei Russian Research Institute. Saint-Petersburg
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D114642
There's a few relevant forward declarations in there that may require downstream
adding explicit includes:
llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llvm/MC/MCSubtargetInfo.h, llvm/MC/MCTargetOptions.h
llvm/MC/MCObjectStreamer.h no longer include llvm/MC/MCAssembler.h
llvm/MC/MCAssembler.h no longer includes llvm/MC/MCFixup.h, llvm/MC/MCFragment.h
Counting preprocessed lines required to rebuild llvm-project on my setup:
before: 1052436830
after: 1049293745
Which is significant and backs up the change in addition to the usual benefits of
decreasing coupling between headers and compilation units.
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119244
This reverts commit ef82063207.
- It conflicts with the existing llvm::size in STLExtras, which will now
never be called.
- Calling it without llvm:: breaks C++17 compat
This change introduces subtarget features to predicate certain
instructions and system registers that are available only on
'A' profile targets. Those features are not present when
targeting a generic CPU, which is the default processor.
In other words the generic CPU now means the intersection of
'A' and 'R' profiles. To maintain backwards compatibility we
enable the features that correspond to -march=armv8-a when the
architecture is not explicitly specified on the command line.
References: https://developer.arm.com/documentation/ddi0600/latest
Differential Revision: https://reviews.llvm.org/D110065
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.
This allows us to ensure that Support doesn't have includes from MC/*.
Differential Revision: https://reviews.llvm.org/D111454
Changes in architecture revision 00eac1:
* Tile slice index offset no longer prefixed with '#'.
* The syntax for 128-bit (.Q) ZA tile slice accesses must now include
an explicit zero index.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-09
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D111212
Note that createAArch64ObjectTargetStreamer is declared in
AArch64TargetStreamer.h and defined in AArch64TargetStreamer.cpp.
Identified with readability-redundant-declaration.
This reverts the revert commit f85d8a5bed
with bug fixes.
Original message:
MOVi32imm + ANDWrr ==> ANDWri + ANDWri
MOVi64imm + ANDXrr ==> ANDXri + ANDXri
The mov pseudo instruction could be expanded to multiple mov instructions later.
In this case, try to split the constant operand of mov instruction into two
bitmask immediates. It makes only two AND instructions intead of multiple
mov + and instructions.
Added a peephole optimization pass on MIR level to implement it.
Differential Revision: https://reviews.llvm.org/D109963
MOVi32imm + ANDWrr ==> ANDWri + ANDWri
MOVi64imm + ANDXrr ==> ANDXri + ANDXri
The mov pseudo instruction could be expanded to multiple mov instructions later.
In this case, try to split the constant operand of mov instruction into two
bitmask immediates. It makes only two AND instructions intead of multiple
mov + and instructions.
Added a peephole optimization pass on MIR level to implement it.
Differential Revision: https://reviews.llvm.org/D109963
On some architectures such as Arm and X86 the encoding for a nop may
change depending on the subtarget in operation at the time of
encoding. This change replaces the per module MCSubtargetInfo retained
by the targets AsmBackend in favour of passing through the local
MCSubtargetInfo in operation at the time.
On Arm using the architectural NOP instruction can have a performance
benefit on some implementations.
For Arm I've deleted the copy of the AsmBackend's MCSubtargetInfo to
limit the chances of this causing problems in the future. I've not
done this for other targets such as X86 as there is more frequent use
of the MCSubtargetInfo and it looks to be for stable properties that
we would not expect to vary per function.
This change required threading STI through MCNopsFragment and
MCBoundaryAlignFragment.
I've attempted to take into account the in tree experimental backends.
Differential Revision: https://reviews.llvm.org/D45962
In streaming mode most of the NEON instruction set is illegal, disable
NEON when compiling with `+streaming-sve`, unless NEON is explictly
requested.
Subsequent patches will add support for the small subset of NEON
instructions that are legal in streaming mode.
Reviewed By: paulwalker-arm, david-arm
Differential Revision: https://reviews.llvm.org/D107902
Previously we would emit constant pool entries for ldr inline asm at the
very end of AsmPrinter::doFinalization(). However, if we're emitting
dwarf aranges, that would end all sections with aranges. Then if we have
constant pool entries to be emitted in those same sections, we'd hit an
assert that the section has already been ended.
We want to emit constant pool entries before emitting dwarf aranges.
This patch splits out arm32/64's constant pool entry emission into its
own MCTargetStreamer virtual method.
Fixes PR51208
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D107314
Add a comment when there is a shifted value,
add x9, x0, #291, lsl #12 ; =1191936
but not when the immediate value is unshifted,
subs x9, x0, #256 ; =256
when the comment adds nothing additional to the reader.
Differential Revision: https://reviews.llvm.org/D107196
This patch adds the zero instruction for zeroing a list of 64-bit
element ZA tiles. The instruction takes a list of up to eight tiles
ZA0.D-ZA7.D, which must be in order, e.g.
zero {za0.d,za1.d,za2.d,za3.d,za4.d,za5.d,za6.d,za7.d}
zero {za1.d,za3.d,za5.d,za7.d}
The assembler also accepts 32-bit, 16-bit and 8-bit element tiles which
are mapped to corresponding 64-bit element tiles in accordance with the
architecturally defined mapping between different element size tiles,
e.g.
* Zeroing ZA0.B, or the entire array name ZA, is equivalent to zeroing
all eight 64-bit element tiles ZA0.D to ZA7.D.
* Zeroing ZA0.S is equivalent to zeroing ZA0.D and ZA4.D.
The preferred disassembly of this instruction uses the shortest list of
tile names that represent the encoded immediate mask, e.g.
* An immediate which encodes 64-bit element tiles ZA0.D, ZA1.D, ZA4.D and
ZA5.D is disassembled as {ZA0.S, ZA1.S}.
* An immediate which encodes 64-bit element tiles ZA0.D, ZA2.D, ZA4.D and
ZA6.D is disassembled as {ZA0.H}.
* An all-ones immediate is disassembled as {ZA}.
* An all-zeros immediate is disassembled as an empty list {}.
This patch adds the MatrixTileList asm operand and related parsing to support
this.
Depends on D105570.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D105575
This patch adds the new system registers introduced in SME:
- ID_AA64SMFR0_EL1 (ro) SME feature identifier.
- SMCR_ELx (r/w) streaming mode control register for configuring
effective SVE Streaming SVE Vector length when the PE is in
Streaming SVE mode.
- SVCR (r/w) streaming vector control register, visible at all
exception levels. Provides access to PSTATE.SM and PSTATE.ZA
using MSR and MRS instructions.
- SMPRI_EL1 (r/w) streaming mode execution priority register.
- SMPRIMAP_EL2 (r/w) streaming mode priority mapping register.
- SMIDR_EL1 (ro) streaming mode identification register.
- TPIDR2_EL0 (r/w) for use by SME software to manage per-thread
SME context.
- MPAMSM_EL1 (r/w) MPAM (v8.4) streaming mode register, for
labelling memory accesses performed in streaming mode.
Also added in this patch are the SME mode change instructions.
Three MSR immediate instructions are implemented to set or clear
PSTATE.SM, PSTATE.ZA, or both respectively:
- MSR SVCRSM, #<imm1>
- MSR SVCRZA, #<imm1>
- MSR SVCRSMZA, #<imm1>
The following smstart/smstop aliases are also implemented for
convenience:
smstart -> MSR SVCRSMZA, #1
smstart sm -> MSR SVCRSM, #1
smstart za -> MSR SVCRZA, #1
smstop -> MSR SVCRSMZA, #0
smstop sm -> MSR SVCRSM, #0
smstop za -> MSR SVCRZA, #0
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D105576
This patch adds support for following contiguous load and store
instructions:
* LD1B, LD1H, LD1W, LD1D, LD1Q
* ST1B, ST1H, ST1W, ST1D, ST1Q
A new register class and operand is added for the 32-bit vector select
register W12-W15. The differences in the following tests which have been
re-generated are caused by the introduction of this register class:
* llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll
* llvm/test/CodeGen/AArch64/GlobalISel/regbank-inlineasm.mir
* llvm/test/CodeGen/AArch64/stp-opt-with-renaming-reserved-regs.mir
* llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir
D88663 attempts to resolve the issue with the store pair test
differences in the AArch64 load/store optimizer.
The GlobalISel differences are caused by changes in the enum values of
register classes, tests have been updated with the new values.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06
Reviewed By: CarolineConcatto
Differential Revision: https://reviews.llvm.org/D105572
SME introduces the ZA array, a new piece of architectural register state
consisting of a matrix of [SVLb x SVLb] bytes, where SVL is the
implementation defined Streaming SVE vector length and SVLb is the
number of 8-bit elements in a vector of SVL bits.
SME instructions consist of three types of matrix operands:
* Tiles: a ZA tile is a square, two-dimensional sub-array of elements
within the ZA array. These tiles make up the larger accumulator array
and the granularity varies based on the element size, i.e.
- ZAQ0..ZAQ15 (smallest tile granule)
- ZAD0..ZAD7
- ZAS0..ZAS3
- ZAH0..ZAH1
or ZAB0 (largest tile granule, single tile)
* Tile vectors: similar to regular tiles, but have an extra 'h' or 'v'
to tell how the vector at [reg+offset] is layed out in the tile,
horizontally or vertically. E.g. za1h.h or za15v.q, which corresponds
to vectors in registers ZAH1 and ZAQ15, respectively.
* Accumulator matrix: this is the entire accumulator array ZA.
This patch adds the register classes and related operands and parsing
for SME instructions operating on the accumulator array.
The ADDHA and ADDVA instructions which operate on tiles are also added
in this patch to make some use of the code added, later patches will
make use of the other operands introduced here.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06
Co-authored by: Sander de Smalen (@sdesmalen)
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D105570