Commit Graph

1995 Commits

Author SHA1 Message Date
Joe Nash 729467acef [AMDGPU] gfx11 LDSDIR instructions MC support
Contributors:
Carl Ritson <carl.ritson@amd.com>

Patch 8/N for upstreaming of AMDGPU gfx11 architecture.

Depends on D125498

Reviewed By: critson, rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D125820
2022-05-19 10:08:47 -04:00
Sheng a5d618b393 [M68k][Disassembler] Fix decoding conflict
This diff fixes decoding conflict between these pair of instructions:

ADD(16|32)dd / ADD(16|32)dr
SUB(16|32)dd / SUB(16|32)dr
AND(16|32)dd / AND(16|32)dr
OR(16|32)dd  / OR(16|32)dr

Reviewed By: ricky26

Differential Revision: https://reviews.llvm.org/D125861
2022-05-19 09:10:50 +08:00
Dmitry Preobrazhensky 32ca9bd7b5 [AMDGPU][MC][GFX940] Correct tied operand decoding for smfmac opcodes
Differential Revision: https://reviews.llvm.org/D125790
2022-05-18 15:39:30 +03:00
Ivan Kosarev 140ad30b24 [AMDGPU][MC][GFX10] Add missing s_scratch_load tests.
Completes
https://reviews.llvm.org/D125117

Reviewed By: dp, arsenm

Differential Revision: https://reviews.llvm.org/D125753
2022-05-18 11:11:10 +01:00
Stanislav Mekhanoshin a09af86693 [AMDGPU] Enable FLAT LDS DMA on gfx9/10 before gfx940
We always had global and scratch loads to LDS in the gfx9,
but did not handle it. These were available via the 'lds'
encoding bit. In gfx940 this bit was reused as 'svs' which
resulted in new '_lds' opcodes effectively pushing this
bit into the opcode, but functionally it is the same. These
instructions are also available on gfx10.

Differential Revision: https://reviews.llvm.org/D125126
2022-05-17 12:16:37 -07:00
Joe Nash d21b9b4946 [AMDGPU] gfx11 scalar alu instructions
MC layer support for SOP(scalar alu operations) including encoding
support for s_delay_alu and s_sendmsg_rtn.

Contributors:
Jay Foad <jay.foad@amd.com>

Patch 7/N for upstreaming of AMDGPU gfx11 architecture.

Depends on D125319

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D125498
2022-05-17 13:35:41 -04:00
Joe Nash c70259405c [AMDGPU] gfx11 BUF Instructions
Includes MachineCode layer support and tests, and MIR tests not requiring
CodeGen pass changes.
Includes a small change in SMInstructions.td to correct encoded bits.

Contributors:
Petar Avramovic <Petar.Avramovic@amd.com>
Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com>

Depends on D125316

Patch 6/N for upstreaming of AMDGPU gfx11 architecture.

Reviewed By: dp, Petar.Avramovic

Differential Revision: https://reviews.llvm.org/D125319
2022-05-16 09:41:40 -04:00
Sheng cf0b6df6db [M68k][Disassembler] Adopt the new variable length decoder
This is an example usage of D120958.

After these patches are landed, we can strip off the codebeads officially.

Reviewed By: myhsu

Differential Revision: https://reviews.llvm.org/D120960
2022-05-15 08:44:58 +08:00
Ivan Kosarev cb67b2ccc4 [AMDGPU][GFX10] Support base+soffset+offset SMEM stores.
Also makes another step towards resolving
https://github.com/llvm/llvm-project/issues/38652

Reviewed By: foad, dp

Differential Revision: https://reviews.llvm.org/D125380
2022-05-12 08:48:05 +01:00
Ivan Kosarev 88f04bdbd8 [AMDGPU][GFX10] Support base+soffset+offset SMEM loads.
Also makes a step towards resolving
https://github.com/llvm/llvm-project/issues/38652

Reviewed By: foad, dp

Differential Revision: https://reviews.llvm.org/D125117
2022-05-10 16:17:14 +01:00
Simon Pilgrim c0840799e3 [MC][X86] Add vcmpps disassembler tests for Issue #41491
We were missing coverage for vcmpps imm, vreg, vreg, mreg {mreg} patterns
2022-05-06 15:39:17 +01:00
Philipp Tomsich 64816e68f4 [AArch64] Support for Ampere1 core
Add support for the Ampere Computing Ampere1 core.
Ampere1 implements the AArch64 state and is compatible with ARMv8.6-A.

Differential Revision: https://reviews.llvm.org/D117112
2022-05-03 15:54:02 +01:00
CHIANG, YU-HSUN (Tommy Chiang, oToToT) 4a31af88a2 [MC][AArch64] Enable '+v8a' when nothing specified for MCSubtargetInfo
Since D110065, the 'R' profile support is added to LLVM. It turns the
`generic` cpu into the intersection of v8-a and v8-r. However, this
makes some backward compatibility problems. The original patch makes
the clang driver implicitly pass -march=armv8-a when only the triple
is specified. Since it only applies to clang, other tools like
llvm-objdump still faces the backward compatibility problem.

This patch applies the same idea to MC related tools by enabling '+v8a'
feature when nothing is specified (both CPU and FS are empty) for
MCSubtargetInfo creation.

This patch should fix PR53956.

Reviewed by: labrinea

Differential Revision: https://reviews.llvm.org/D124319
2022-04-29 04:53:22 +08:00
Stanislav Mekhanoshin 00d84a9f92 [AMDGPU] Remove vdata from buffer to lds load
Differential Revision: https://reviews.llvm.org/D124485
2022-04-26 17:16:26 -07:00
Ulrich Weigand 1283ccb610 Support z16 processor name
The recently announced IBM z16 processor implements the architecture
already supported as "arch14" in LLVM.  This patch adds support for
"z16" as an alternate architecture name for arch14.
2022-04-21 19:58:22 +02:00
Dmitry Preobrazhensky b4231ac4be [AMDGPU][GFX90A+] Disabled ds_ordered_count and exp
Differential Revision: https://reviews.llvm.org/D124087
2022-04-21 13:16:44 +03:00
Dmitry Preobrazhensky ab18e1a533 [AMDGPU][GFX10] Enabled op_sel for v_add_nc_u16 and v_sub_nc_u16
Differential Revision: https://reviews.llvm.org/D123594
2022-04-13 13:48:42 +03:00
Shengchen Kan fcade8e91e [X86][test] Add encoding/decoding tests for VEX instruction w/ address-size prefix
This patch also contains a regression test for D122448

Reviewed By: hvdijk, RKSimon

Differential Revision: https://reviews.llvm.org/D122449
2022-04-13 12:50:25 +08:00
Dmitry Preobrazhensky 1f6aa90386 [AMDGPU][MC][GFX10] Added syntactic sugar for s_waitcnt_depctr operand
Added the following helpers:

    depctr_hold_cnt(...)
    depctr_sa_sdst(...)
    depctr_va_vdst(...)
    depctr_va_sdst(...)
    depctr_va_ssrc(...)
    depctr_va_vcc(...)
    depctr_vm_vsrc(...)

Differential Revision: https://reviews.llvm.org/D123022
2022-04-07 17:03:44 +03:00
Simon Tatham 82bd0bd24f [AArch64] Make PMMIR_EL1 read-only.
The Arm architecture reference manual (ARM DDI 0487H.a section
D13.5.12) lists every field in the register as RO, and does not list
an MSR instruction that writes it. So we should be defining it as an
ROSysReg, not an RWSysReg.

Reviewed By: vhscampos

Differential Revision: https://reviews.llvm.org/D123111
2022-04-05 11:09:56 +01:00
Min-Yih Hsu 18b38ff6c7 [M68k] Adopt VarLenCodeEmitter for move instructions
The `move` instruction has one of the most complicate sets of variants, so
we're refactoring it first before finishing up rest of the data
instructions in a separate patch.

Note that since we're introducing more `move` variants, the codegen
actually got improved in terms of code size.
2022-04-04 23:02:27 -07:00
Min-Yih Hsu fccdc5618d [M68k] Adopt VarLenCodeEmitter for shift / rotate instructions
This patch is covered by existing MC tests.
2022-04-03 22:52:32 -07:00
Stefan Pintilie 2e55bc9f3c [PowerPC] Set the special DSCR with a compiler option.
Add a compiler option and the instructions required to set the
special Data Stream Control Register (DSCR). The special register will
not be set by default.

Original patch by: Muhammad Usman

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D117013
2022-03-31 14:06:30 -05:00
Simon Pilgrim 4a33b9ece0 [MC][X86] Ensure all opcode tests are sorted by instruction name
Noticed while reviewing D122449
2022-03-30 11:08:11 +01:00
Stanislav Mekhanoshin 6e3e14f600 [AMDGPU] Support gfx940 smfmac instructions
Differential Revision: https://reviews.llvm.org/D122191
2022-03-24 12:40:42 -07:00
Stanislav Mekhanoshin 27439a7642 [AMDGPU] New gfx940 mfma instructions
Differential Revision: https://reviews.llvm.org/D122044
2022-03-24 12:12:52 -07:00
Stanislav Mekhanoshin 72c1a0d9c2 [AMDGPU] Allow v_accvgpr_write to use SGPR on gfx90a
This is undocumented, but it should work.

Differential Revision: https://reviews.llvm.org/D122252
2022-03-22 13:52:29 -07:00
Stanislav Mekhanoshin d9ac55fab2 [AMDGPU] New MFMA names for existing instructions
Old names are supported as aliases.
_1k MFMA got new opcodes.

Differential Revision: https://reviews.llvm.org/D121741
2022-03-17 13:05:36 -07:00
Stanislav Mekhanoshin 522b259976 [AMDGPU] Allow v_accvgpr_write to use SGPR src on gfx940
Differential Revision: https://reviews.llvm.org/D121843
2022-03-17 12:12:06 -07:00
Amir Ayupov 2c4e38fa6f [X86] Emit REX prefix immediately before the opcode
Fix prefix emission order to emit REX immediately before the opcode (SDM vol2,
2.1, Figure 2-1). According to SDM vol2 2.2.1, "Other placements are ignored".

This fix has a side effect of outputting segment override prefix in a different
order than previously (benign).

Follow-up to https://reviews.llvm.org/D120592

Reviewed By: skan, craig.topper

Differential Revision: https://reviews.llvm.org/D120871
2022-03-16 08:30:31 -07:00
Amir Ayupov 1d3719820f [X86] Preserve redundant Address-Size override prefix
Print and emit redundant Address-Size override prefix if it's set on the
instruction.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D120592
2022-03-16 08:30:29 -07:00
Stanislav Mekhanoshin 8dd3d1cf1f [AMDGPU] Add symbolic names for gfx940 HWREGs
The namespaces of HWREGs is now overlapping with gfx10. Thus the
patch is longer than necessary to just support new names. It also
need to handle proper error messages, i.e. to issue a "specified
hardware register is not supported on this GPU" message.

This may need a major refactoring in the future.

Differential Revision: https://reviews.llvm.org/D121418
2022-03-14 16:13:33 -07:00
Stanislav Mekhanoshin 23499103f7 [AMDGPU] Support for gfx940 flat lds opcodes
Differential Revision: https://reviews.llvm.org/D121414
2022-03-14 15:46:19 -07:00
Stanislav Mekhanoshin 1f53f20fc1 [AMDGPU] Support gfx940 v_lshl_add_u64 instruction
Differential Revision: https://reviews.llvm.org/D121401
2022-03-14 15:45:42 -07:00
Stanislav Mekhanoshin 36fe3f13a9 [AMDGPU] flat scratch SVS addressing mode for gfx940
Both VADDR and SADDR are used in SVS mode.

Differential Revision: https://reviews.llvm.org/D121254
2022-03-14 15:23:36 -07:00
Stanislav Mekhanoshin 6181458662 [AMDGPU] gfx940 MUBUF format changes
Differential Revision: https://reviews.llvm.org/D121234
2022-03-11 11:36:49 -08:00
Stanislav Mekhanoshin 932f628121 [AMDGPU] new gfx940 fp atomics
Differential Revision: https://reviews.llvm.org/D121028
2022-03-07 12:32:02 -08:00
Stanislav Mekhanoshin e7b362d75d [AMDGPU] Add v_mov_b64 gfx940 opcode
Differential Revision: https://reviews.llvm.org/D121023
2022-03-07 12:07:12 -08:00
Stanislav Mekhanoshin 8992b50e2f [AMDGPU] gfx940 uses new names for coherency bits
Differential Revision: https://reviews.llvm.org/D120855
2022-03-07 11:50:07 -08:00
Stanislav Mekhanoshin 2c830c8fab [AMDGPU] gfx940: support V_FMAMK_F32 and V_FMAAK_F32
Differential Revision: https://reviews.llvm.org/D120769
2022-03-07 11:31:01 -08:00
Simon Tatham 54dafd38c5 [AArch64] Move FeatureSpecRestrict into core 8.0-R architecture.
It was included in HasV8_0rOps when D88660 first introduced that
architecture definition. In D118045 I moved it out of there and into
ProcessorFeatures.R82, so that -mcpu=cortex-r82 would continue to
behave the same as before but -march=armv8-r would include only the
mandatory parts of the architecture.

In fact, that was a mistake. Firstly, Cortex-R82 _doesn't_ implement
that feature, so it makes no sense to deliberately enable it for that
CPU in particular. But also, it's an extension that only adds system
registers, and we're generally more relaxed about where we enable
those (because kernel developers find it useful to write sysreg-access
instructions after runtime checking, and because sysreg accesses
aren't manufactured during code generation so the risk is small).

So, in line with that usual AArch64 policy, FeatureSpecRestrict ought
to be considered part of 8.0-R for LLVM purposes. So I'm moving it
back into HasV8_0rOps, where it started out.

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D120830
2022-03-07 15:55:08 +00:00
Aakanksha 840695814a [AMDGPU] Add gfx1036 target
Differential Revision: https://reviews.llvm.org/D120846
2022-03-02 23:26:38 +00:00
Stefan Pintilie eb1c5a9862 [PowerPC] Add the Power10 LXVKQ instrution.
Add the Power 10 instruction LXVKQ.

This patch was taken from an original patch by: Yi-Hong Lyu

Reviewed By: lei

Differential Revision: https://reviews.llvm.org/D117507
2022-02-23 08:48:59 -06:00
Min-Yih Hsu 4986a41f58 [M68k] Adopt VarLenCodeEmitter for bits instructions
And introduce operand encoding fragments (i.e. MxEncMemOp record) for
addressing modes 'o' and 'e'.
2022-02-17 14:16:19 -08:00
Sheng 4306fbff9c Revert "Revert "[M68k] Adopt VarLenCodeEmitter for control instructions""
This reverts commit 69a7d49de6.

llvm/test/MC/M68k/Relaxations/branch.s needs disassembler support.

So I disabled it temporarily
2022-02-16 17:41:49 +08:00
Sheng 69a7d49de6 Revert "[M68k] Adopt VarLenCodeEmitter for control instructions"
This reverts commit 9ffd498fcb.

This patch introduce regression on MC/M68k/Relaxations/branch.s
2022-02-16 17:09:46 +08:00
Sheng 9ffd498fcb [M68k] Adopt VarLenCodeEmitter for control instructions
Refactor the instructions in M68kInstrControl.td to use the VarLenCodeEmitter.

This patch is tested by the existing test cases.

Reviewed By: myhsu, ricky26

Differential Revision: https://reviews.llvm.org/D119665
2022-02-16 12:54:20 +08:00
Stefan Pintilie a601db30c6 [PowerPC] Remove the LDMX instruction.
The LDMX instruction was to be potentially added in P9 but it was never added
in either ISA 3.0 or ISA 3.1. This patch removes that instruction as it is
currently still an invalid instruction.

Reviewed By: lei

Differential Revision: https://reviews.llvm.org/D118074
2022-02-14 17:03:48 -06:00
Min-Yih Hsu 08f2b0dcf6 [M68k] Adopt the new VarLenCodeEmitterGen for arithmetic instructions
This patch refactors all the existing M68k arithmetic instructions
to use the new VarLenCodeEmitterGen infrastructure.

This patch is tested by the existing MC test cases.

Note that one of the codegen tests needed to be updated because the
ordering of two equivalent instructions were switched.

Differential Revision: https://reviews.llvm.org/D115234
2022-02-11 09:31:12 -08:00
Stanislav Mekhanoshin d3b87e4a1c [AMDGPU] HWRegs TMA and TBA also supported on gfx9
Differential Revision: https://reviews.llvm.org/D118860
2022-02-03 09:36:10 -08:00
Shao-Ce SUN 005fd8aa70 [RISCV] Add support for Zihintpause extention
Add support for the 'pause' hint instruction as an alias for
'fence w, 0'. To do this allow the 'fence' operands pred and succ
to be set to 0 (the empty set). This will also allow future hints
to be encoded as 'fence 0, <x>' and 'fence <x>, 0'.

This patch revised from @mundaym's D93019.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D117789
2022-02-03 20:55:47 +08:00
Ting Wang 6f25cb8685 [PowerPC] Add the Power10 XS[MAX|MIN]CQP instruction
Add the Power 10 instruction XS[MAX|MIN]CQP.

Reviewed By: shchenz, amyk

Differential Revision: https://reviews.llvm.org/D118036
2022-01-26 23:00:43 -05:00
Simon Tatham f302e0b5dd [AArch64] Exclude optional features from HasV8_0rOps.
The following SubtargetFeatures are removed from the definition of
HasV8_0rOps, on the grounds that they are optional in Armv8.4-A, and
therefore (by the definition of Armv8.0-R) also optional in v8.0-R:

 * performance monitoring: FeaturePerfMon
 * cryptography: FeatureSM4 and FeatureSHA3
 * half-precision FP: FeatureFullFP16, FeatureFP16FML
 * speculation control: FeatureSSBS, FeaturePredRes, FeatureSB,
   FeatureSpecRestrict

This isn't the full set of features that are listed as optional in the
spec. FeatureCCIDX and FeatureTRACEV8_4 are also optional. But LLVM
includes those in HasV8_3aOps and HasV8_4aOps respectively (I think on
the grounds that the system registers they enable are useful to be
able to access after a runtime check), and so for consistency, I've
left those in HasV8_0rOps too.

After this commit, HasV8_0rOps is a strict subset of HasV8_4aOps (but
missing features that are not in Armv8.0-R at all).

The definition of Cortex-R82 is correspondingly updated to add most of
the features that I've removed from base Armv8.0-R (with the exception
of the cryptography ones), since that particular implementation of
v8.0-R does have them.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D118045
2022-01-25 10:54:59 +00:00
Min-Yih Hsu 07be76f2ae [M68k][Disassembler][NFC] Re-organize test files
Put test cases of each instruction category into their own files. NFC.
2022-01-25 13:10:15 +08:00
Simon Tatham a4ac40e92f [AArch64] Remove PRBAR0_ELn and PRLAR0_ELn sysregs.
The Armv8-R.64 architecture defines numbered MPU region registers with
indices 1-15, not 0-15. So there's no such register as PRBAR0_EL2 or
PRLAR0_EL1 (for example). The encodings that they would occupy are
used for the unnumbered PRBAR_ELn and PRLAR_ELn registers.

Reviewed By: labrinea

Differential Revision: https://reviews.llvm.org/D117755
2022-01-20 13:37:58 +00:00
Simon Tatham 19b9cd4eae [MC] Add a disassembly test for Armv8-R sysregs.
This is the counterpart to llvm/test/MC/AArch64/armv8r-sysreg.s,
checking all the same encodings when fed to the disassembler.
2022-01-20 13:37:58 +00:00
Simon Tatham e35a3f188f [AArch64] Adding "armv8.8-a" memcpy/memset support.
This family of instructions includes CPYF (copy forward), CPYB (copy
backward), SET (memset) and SETG (memset + initialise MTE tags), with
some sub-variants to indicate whether address translation is done in a
privileged or unprivileged way. For the copy instructions, you can
separately specify the read and write translations (so that kernels
can safely use these instructions in syscall handlers, to memcpy
between the calling process's user-space memory map and the kernel's
own privileged one).

The unusual thing about these instructions is that they write back to
multiple registers, because they perform an implementation-defined
amount of copying each time they run, and write back to _all_ the
address and size registers to indicate how much remains to be done
(and the code is expected to loop on them until the size register
becomes zero). But this is no problem in LLVM - you just define each
instruction to have multiple outputs, multiple inputs, and a set of
constraints tying their register numbers together appropriately.

This commit introduces a special subtarget feature called MOPS (after
the name the spec gives to the CPU id field), which is a dependency of
the top-level 8.8-A feature, and uses that to enable most of the new
instructions. The SETMG instructions also depend on MTE (and the test
checks that).

Differential Revision: https://reviews.llvm.org/D116157
2022-01-05 14:44:24 +00:00
Simon Tatham 8c1e520c90 [AArch64] Adding "armv8.8-a" BC instruction.
This instruction is described in the Arm A64 Instruction Set
Architecture documentation available here:
https://developer.arm.com/documentation/ddi0596/2021-12/Base-Instructions/BC-cond--Branch-Consistent-conditionally-?lang=en

FEAT_HBC "Hinted Conditional Branches" is listed in the 2021 A-Profile Architecture Extensions:
https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools/feature-names-for-a-profile

'BC.cc', where 'cc' is any ordinary condition code, is an instruction
that looks exactly like B.cc (the normal conditional branch), except
that bit 4 of the encoding is 1 rather than 0, which hints something
to the branch predictor (specifically, that this branch is expected to
be highly consistent, even though _which way_ it will consistently go
is not known at compile time).

This commit introduces a special subtarget feature for HBC, which is a
dependency of the top-level 8.8-A feature, and uses that to enable the
new BC instruction.

Differential Revision: https://reviews.llvm.org/D116156
2022-01-03 12:33:51 +00:00
Ties Stuij 63eb7ff47d [ARM] Implement PAC return address signing mechanism for PACBTI-M
This patch implements PAC return address signing for armv8-m. This patch roughly
accomplishes the following things:

- PAC and AUT instructions are generated.
- They're part of the stack frame setup, so that shrink-wrapping can move them
inwards to cover only part of a function
- The auth code generated by PAC is saved across subroutine calls so that AUT
can find it again to check
- PAC is emitted before stacking registers (so that the SP it signs is the one
on function entry).
- The new pseudo-register ra_auth_code is mentioned in the DWARF frame data
- With CMSE also in use: PAC is emitted before stacking FPCXTNS, and AUT
validates the corresponding value of SP
- Emit correct unwind information when PAC is replaced by PACBTI
- Handle tail calls correctly

Some notes:

We make the assembler accept the `.save {ra_auth_code}` directive that is
emitted by the compiler when it saves a register that contains a
return address authentication code.

For EHABI we need to have the `FrameSetup` flag on the instruction and
handle the `t2PACBTI` opcode (identically to `t2PAC`), so we can emit
`.save {ra_auth_code}`, instead of `.save {r12}`.

For PACBTI-M, the instruction which computes return address PAC should use SP
value before adjustment for the argument registers save are (used for variadic
functions and when a parameter is is split between stack and register), but at
the same it should be after the instruction that saves FPCXT when compiling a
CMSE entry function.

This patch moves the varargs SP adjustment after the FPCXT save (they are never
enabled at the same time), so in a following patch handling of the `PAC`
instruction can be placed between them.

Epilogue emission code adjusted in a similar manner.

PACBTI-M code generation should not emit any instructions for architectures
v6-m, v8-m.base, and for A- and R-class cores. Diagnostic message for such cases
is handled separately by a future ticket.

note on tail calls:

If the called function has four arguments that occupy registers `r0`-`r3`, the
only option for holding the function pointer itself is `r12`, but this register
is used to keep the PAC during function/prologue epilogue and clobbers the
function pointer.

When we do the tail call we need the five registers (`r0`-`r3` and `r12`) to
keep six values - the four function arguments, the function pointer and the PAC,
which is obviously impossible.

One option would be to authenticate the return address before all callee-saved
registers are restored, so we have a scratch register to temporarily keep the
value of `r12`. The issue with this approach is that it violates a fundamental
invariant that PAC is computed using CFA as a modifier. It would also mean using
separate instructions to pop `lr` and the rest of the callee-saved registers,
which would offset the advantages of doing a tail call.

Instead, this patch disables indirect tail calls when the called function take
four or more arguments and the return address sign and authentication is enabled
for the caller function, conservatively assuming the caller function would spill
LR.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Momchil Velikov
- Ties Stuij

Reviewed By: danielkiss

Differential Revision: https://reviews.llvm.org/D112429
2021-12-07 10:15:19 +00:00
Ties Stuij 5cff77c23f [clang][ARM] PACBTI-M assembly support
Introduce assembly support for Armv8.1-M PACBTI extension. This is an optional
extension in v8.1-M.

There are 10 new system registers and 5 new instructions, all predicated on the
feature.

The attribute for llvm-mc is called "pacbti". For armclang, an architecture
extension also called "pacbti" was created.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Victor Campos
- Ties Stuij

Reviewed By: labrinea

Differential Revision: https://reviews.llvm.org/D112420
2021-11-30 09:28:18 +00:00
Dmitry Preobrazhensky 91f4650ebb [AMDGPU][MC][GFX10] Corrected global_atomic_fcmpswap*
Corrected src data size of global_atomic_fcmpswap and global_atomic_fcmpswap_x2 opcodes.

Differential Revision: https://reviews.llvm.org/D113746
2021-11-15 12:51:12 +03:00
Alexandros Lamprineas 8689f5e6e7 [AArch64] Add support for the 'R' architecture profile.
This change introduces subtarget features to predicate certain
instructions and system registers that are available only on
'A' profile targets. Those features are not present when
targeting a generic CPU, which is the default processor.

In other words the generic CPU now means the intersection of
'A' and 'R' profiles. To maintain backwards compatibility we
enable the features that correspond to -march=armv8-a when the
architecture is not explicitly specified on the command line.

References: https://developer.arm.com/documentation/ddi0600/latest

Differential Revision: https://reviews.llvm.org/D110065
2021-10-27 12:32:30 +01:00
Joe Nash b4b7e605a6 [AMDGPU] Support shared literals in FMAMK/FMAAK
These instructions should allow src0 to be a literal with the same
value as the mandatory other literal. Enable it by introducing an
operand that defers adding its value to the MI when decoding till
the mandatory literal is parsed.

Reviewed By: dp, foad

Differential Revision: https://reviews.llvm.org/D111067

Change-Id: I22b0ae0d35bad17b6f976808e48bffe9a6af70b7
2021-10-11 13:09:54 -04:00
Dmitry Preobrazhensky 3500e7d2b0 [AMDGPU][MC][GFX7][GFX10] Corrected image_atomic_fcmpswap
Differential Revision: https://reviews.llvm.org/D109616
2021-09-21 18:06:02 +03:00
Dmitry Preobrazhensky b8e7f53208 [AMDGPU][MC][GFX10] Enabled dlc for FLAT and GLOBAL atomics
Differential Revision: https://reviews.llvm.org/D109614
2021-09-21 16:23:20 +03:00
Victor Campos 79f9c79aaf [AArch64][MC] Merge FeaturePMU into FeaturePerfMon
FeaturePMU was created in AArch64 to accommodate one missing system
register, PMMIR_EL1, in commit ffcd7698ae.

However, the Performance Monitors extension already had a target
feature, which is called FeaturePerfMon. Therefore, FeaturePMU is
redundant.

This patch removes FeaturePMU and merges its contents into
FeaturePerfMon.

Reviewed By: dnsampaio

Differential Revision: https://reviews.llvm.org/D109246
2021-09-06 14:56:49 +01:00
Wang, Pengfei ab40dbfe03 [X86] AVX512FP16 instructions enabling 6/6
Enable FP16 complex FMA instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105269
2021-08-30 13:08:45 +08:00
Thomas Johnson 8c3886b0ec [ARC] Add ADC (addition with carry) and SBC (subtraction with carry) instructions
Differential Revision: https://reviews.llvm.org/D108672
2021-08-25 07:46:15 -07:00
Thomas Johnson ce1dc9d647 [ARC] Add codegen for the readcyclecounter intrinsic along with disassembly for associated instructions
Differential Revision: https://reviews.llvm.org/D108598
2021-08-24 11:53:20 -07:00
Wang, Pengfei c728bd5bba [X86] AVX512FP16 instructions enabling 5/6
Enable FP16 FMA instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105268
2021-08-24 09:07:19 +08:00
Wang, Pengfei b088536ce9 [X86] AVX512FP16 instructions enabling 4/6
Enable FP16 unary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105267
2021-08-22 08:59:35 +08:00
Wang, Pengfei 2379949aad [X86] AVX512FP16 instructions enabling 3/6
Enable FP16 conversion instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105265
2021-08-18 09:03:41 +08:00
Carl Ritson 99c790dc21 [AMDGPU] Make BVH isel consistent with other MIMG opcodes
Suffix opcodes with _gfx10.
Remove direct references to architecture specific opcodes.
Add a BVH flag and apply this to diassembly.
Fix a number of disassembly errors on gfx90a target caused by
previous incorrect BVH detection code.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108117
2021-08-17 10:42:22 +09:00
Craig Topper b82ce77b2b [X86] Support avx512fp16 compare instructions in the IntelInstPrinter.
This enables printing of the mnemonics that contain the predicate
in the Intel printer. This requires accounting for the memory size
that is explicitly printed in Intel syntax. Those changes have been
synced to the ATT printer as well.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108093
2021-08-16 12:31:36 +08:00
Wang, Pengfei f1de9d6dae [X86] AVX512FP16 instructions enabling 2/6
Enable FP16 binary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105264
2021-08-15 08:56:33 +08:00
Thomas Johnson b821086876 [ARC] Add codegen for count trailing zeros intrinsic for the ARC backend
Differential Revision: https://reviews.llvm.org/D107828
2021-08-10 12:07:35 -07:00
Wang, Pengfei 6f7f5b54c8 [X86] AVX512FP16 instructions enabling 1/6
1. Enable FP16 type support and basic declarations used by following patches.
2. Enable new instructions VMOVW and VMOVSH.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105263
2021-08-10 12:46:01 +08:00
Min-Yih Hsu 2167e237ee [M68k] Update disassembler test case following up ADD / ADDA changes
Update disassembler test case to reflect the changes on ADD/ADDA
instruction separation.
2021-08-08 14:20:46 -07:00
Mark Schimmel e622c99f30 [ARC] Add norm/normh instructions with disassembly tests
Add disassembler support for the NORM and NORMH instructions. These instructions
only exist when the ARC processor is configured with the "norm" extension.

fferential Revision: https://reviews.llvm.org/D107118
2021-07-29 17:54:52 -07:00
Thomas Johnson cc238a6e03 [ARC] Add additional mov immediate instruction formats with a fix for u6 decoding
Differential Revision: https://reviews.llvm.org/D107088
2021-07-29 16:41:55 -07:00
Lei Huang 64a15817a0 [PowerPC]Add addex instruction definition and MC tests
Add td definitions and asm/disasm tests for the addex instruction introduced in
ISA 3.0.

Reviewed By: nemanjai, amyk, NeHuang

Differential Revision: https://reviews.llvm.org/D106666
2021-07-26 14:55:38 -05:00
Ulrich Weigand 8cd8120a7b [SystemZ] Add support for new cpu architecture - arch14
This patch adds support for the next-generation arch14
CPU architecture to the SystemZ backend.

This includes:
- Basic support for the new processor and its features.
- Detection of arch14 as host processor.
- Assembler/disassembler support for new instructions.
- New LLVM intrinsics for certain new instructions.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining  __VEC__ == 10304.

Note: No currently available Z system supports the arch14
architecture.  Once new systems become available, the
official system name will be added as supported -march name.
2021-07-26 16:57:28 +02:00
Thomas Johnson 51d8e67e88 [ARC] Add tablegen definition for the Find Leading Set (FLS) instruction
Differential Revision: https://reviews.llvm.org/D106602
2021-07-22 17:42:25 -07:00
Thomas Johnson 1cda1e6186 [ARC] Add disassembly for the conditioned RSUB immediate instruction
Differential Revision: https://reviews.llvm.org/D106497
2021-07-22 11:34:39 -07:00
Carl Ritson 6efb3220b4 [AMDGPU] Add VReg_192/VReg_224 support for MIMG instructions
Allow MIMG instructions to be selected with 6/7 VGPRs for vaddr.
Previously these were rounded up to VReg_256 this saves VGPRs.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D103800
2021-07-22 10:42:15 +09:00
Igor Kudrin 657e067bb5 [ARMInstPrinter] Print the target address of a branch instruction
This follows other patches that changed printing immediate values of
branch instructions to target addresses, see D76580 (x86), D76591 (PPC),
D77853 (AArch64).

As observing immediate values might sometimes be useful, they are
printed as comments for branch instructions.

// llvm-objdump -d output (before)
000200b4 <_start>:
   200b4: ff ff ff fa   blx     #-4 <thumb>
000200b8 <thumb>:
   200b8: ff f7 fc ef   blx     #-8 <_start>

// llvm-objdump -d output (after)
000200b4 <_start>:
   200b4: ff ff ff fa   blx     0x200b8 <thumb>         @ imm = #-4
000200b8 <thumb>:
   200b8: ff f7 fc ef   blx     0x200b4 <_start>        @ imm = #-8

// GNU objdump -d.
000200b4 <_start>:
   200b4:       faffffff        blx     200b8 <thumb>
000200b8 <thumb>:
   200b8:       f7ff effc       blx     200b4 <_start>

Differential Revision: https://reviews.llvm.org/D104701
2021-06-30 16:35:28 +07:00
Lucas Prates 88b1135e72 [Aarch64] Adding support for Armv9-A Realm Management Extension
This adds support for Armv9-A's Realm Management Extension, including
three new system registers - MFAR_EL3, GPCCR_EL3 and GPTBR_EL3 - and
four new TLBI instructions.

The reference for the Realm Management Extension can be found at: https://developer.arm.com/documentation/ddi0615/aa.

Based on patches by Victor Campos.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D104773
2021-06-28 13:45:22 +01:00
Aakanksha Patil 3453f3dd46 [AMDGPU] Add gfx1035 target
Differential Revision: https://reviews.llvm.org/D104804
2021-06-24 14:32:41 -04:00
Kai Luo 1c450c3d7e [PowerPC] Export 16 byte load-store instructions
Export `lq`, `stq`, `lqarx` and `stqcx.` in preparation for implementing 16-byte lock free atomic operations on AIX.
Add a new register class `g8prc` for these instructions, since these instructions require even-odd register pair.

Reviewed By: nemanjai, jsji, #powerpc

Differential Revision: https://reviews.llvm.org/D103010
2021-06-15 01:56:10 +00:00
Carl Ritson f8816c7400 [AMDGPU] Add v5f32/VReg_160 support for MIMG instructions
Avoid having to round up to v8f32/VReg_256 when only 5 VGPRs are
required for a MIMG address operand.

Maintain _V8 instruction variants of pseudo instructions allowing
assembly prior to GFX10 to work as-is.  Currently the validator
can tell for GFX10 what the correct size is, so will disallow
oversize address registers.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D103672
2021-06-08 11:11:40 +09:00
Jay Foad 9e9edede18 [AMDGPU] Fix MC tests for v_fmaak_f16 and v_fmamk_f16
This looks like a mistake when the tests were committed in r363946.
There were two sets of tests for the f32 variant of these instructions,
instead of one set for f16 and one set for f32.

Differential Revision: https://reviews.llvm.org/D103699
2021-06-07 10:42:52 +01:00
Dmitry Preobrazhensky 13c6568c6e [AMDGPU][MC][GFX90A] Corrected DS_GWS opcodes
Corrected DS_GWS opcodes to use even aligned registers.

Differential Revision: https://reviews.llvm.org/D103185
2021-05-26 21:31:50 +03:00
Stanislav Mekhanoshin f4c0fdc6c9 [AMDGPU] Set unused dst_sel to '?' in the encoding
This is to allow disasm with any bits in the unused fields.

Differential Revision: https://reviews.llvm.org/D102526
2021-05-17 08:38:52 -07:00
Alexandros Lamprineas 1079870971 [llvm-mc][AArch64] HINT instruction disassembled as BTI
The Arm Architecture Reference Manual says that the SystemHintOp_BTI
opcode is prefered when CRm:op2 matches 0100:xx0, but llvm-mc
currently accepts 0100:xxx, which isn't right.

Differential Revision: https://reviews.llvm.org/D102415
2021-05-14 10:05:37 +01:00
David Stuttard 72d570ca08 [AMDGPU][AsmParser/Disassembler] Correct A16 and G16 handling
A16 support for image instructions assembly/disassembly (gfx10) was missing

Also refactor MIMG op addr size calcs to common function

We'd got 3 places where the same operation was being done.

One test is now marked XFAIL until a related codegen patch is in place

Differential Revision: https://reviews.llvm.org/D102231

Change-Id: I7e86e730ef8c71901457855cba570581f4f576bb
2021-05-14 09:25:44 +01:00
Aakanksha Patil 464e4dc50f [AMDGPU] Add gfx1034 target
Differential Revision: https://reviews.llvm.org/D102306
2021-05-13 14:25:18 -04:00
Min-Yih Hsu fc86e6d188 [ARM][disassembler] Fix incorrect number of MCOperands generated by the disassembler
Try to fix bug 49974.

This patch fixes two issues:

 1. BL does not use predicate (BL_pred is the predicate version of BL),
    so we shouldn't add predicate operands in DecodeBranchImmInstruction.
 2. Inside DecodeT2AddSubSPImm, we shouldn't add predicate operands into
    the MCInst because ARMDisassembler::AddThumbPredicate will do that for us.
    However, we should handle CC-out operand for t2SUBspImm and t2AddspImm.

Differential Revision: https://reviews.llvm.org/D100585
2021-04-25 11:55:10 -07:00
Ricky Taylor 2221185776 [M68k] Implement Disassembler
This is an implementation of a disassembler for M68k.

Differential Revision: https://reviews.llvm.org/D98540
2021-04-19 22:24:12 +01:00
Stefan Pintilie f28cb01be0 [PowerPC] Add ROP Protection Instructions for PowerPC
There are four new PowerPC instructions that are introduced in
Power 10. They are hashst, hashchk, hashstp, hashchkp.

These instructions will be used for ROP Protection.
This patch adds the four instructions.

Reviewed By: nemanjai, amyk, #powerpc

Differential Revision: https://reviews.llvm.org/D99375
2021-04-15 11:38:38 -05:00
Thomas Lively ea8dd3ee2e [WebAssembly] Update v128.any_true
In the final SIMD spec, there is only a single v128.any_true instruction, rather
than one for each lane interpretation because the semantics do not depend on the
lane interpretation.

Differential Revision: https://reviews.llvm.org/D100241
2021-04-11 11:13:16 -07:00