Commit Graph

281 Commits

Author SHA1 Message Date
Bradley Smith 8bad8a43c3 [AArch64][SVE] Add patterns to generate FMLA/FMLS/FNMLA/FNMLS/FMAD
Adjust generateFMAsInMachineCombiner to return false if SVE is present
in order to combine fmul+fadd into fma. Also add new pseudo instructions
so as to select the most appropriate of FMLA/FMAD depending on register
allocation.

Depends on D96599

Differential Revision: https://reviews.llvm.org/D96424
2021-02-18 16:55:16 +00:00
David Green 0f435a544a [AArch64] Correct some tablegen operand types. NFC 2021-02-06 14:34:14 +00:00
David Sherwood d1bf26fd94 [AArch64][SVE] Add lowering for llvm abs intrinsic
Add functionality to permit lowering of the abs and neg intrinsics
using the passthru variants.

Differential Revision: https://reviews.llvm.org/D94160
2021-01-08 08:55:25 +00:00
Cameron McInally f4013359b3 [SVE] Add unpacked scalable floating point ZIP/UZP/TRN patterns
Differential Revision: https://reviews.llvm.org/D94193
2021-01-07 09:56:53 -06:00
Bradley Smith c73ae747cb [AArch64][SVE] Add optimization to remove redundant ptest instructions
Co-Authored-by: Graham Hunter <graham.hunter@arm.com>
Co-Authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D93292
2021-01-05 15:28:36 +00:00
Paul Walker eba6deab22 [SVE] Lower vector CTLZ, CTPOP and CTTZ operations.
CTLZ and CTPOP are lowered to CLZ and CNT instructions respectively.

CTTZ is not a native SVE operation but is instead lowered to:
  CTTZ(V) => CTLZ(BITREVERSE(V))

In the case of fixed-length support using SVE we also lower CTTZ
operating on NEON sized vectors because of its reliance on
BITREVERSE which is also lowered to SVE intructions at these lengths.

Differential Revision: https://reviews.llvm.org/D93607
2021-01-05 10:42:35 +00:00
Paul Walker 8eec7294fe [SVE] Lower vector BITREVERSE and BSWAP operations.
These operations are lowered to RBIT and REVB instructions
respectively.  In the case of fixed-length support using SVE we
also lower BITREVERSE operating on NEON sized vectors as this
results in fewer instructions.

Differential Revision: https://reviews.llvm.org/D93606
2020-12-22 16:49:50 +00:00
Paul Walker c0bc169cb1 [NFC][SVE] Clean up bfloat isel patterns that emit non-bfloat instructions.
During isel there's no need to protect illegal types. Patch also
adds a missing unit test for tbl2 intrinsic using bfloat types.

Differential Revision: https://reviews.llvm.org/D93404
2020-12-18 13:20:41 +00:00
Paul Walker 632f4d2747 [NFC] Fix a few SVEInstrInfo related stylistic issues. 2020-12-15 16:10:38 +00:00
Kerry McLaughlin c5ced82c8e [SVE][CodeGen] Lower scalable floating-point vector reductions
Changes in this patch:
-  Minor changes to the LowerVECREDUCE_SEQ_FADD function added by @cameron.mcinally
   to also work for scalable types
- Added TableGen patterns for FP reductions with unpacked types (nxv2f16, nxv4f16 & nxv2f32)
- Asserts added to expandFMINNUM_FMAXNUM & expandVecReduceSeq for scalable types

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D93050
2020-12-14 11:45:42 +00:00
Huihui Zhang 1e113c078a [AArch64][SVE] Fix umin/umax lowering to handle out of range imm.
Immediate must be in an integer range [0,255] for umin/umax instruction.
Extend pattern matching helper SelectSVEArithImm() to take in value type
bitwidth when checking immediate value is in range or not.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D89831
2020-10-23 09:42:56 -07:00
Paul C. Anagnostopoulos 2871c6c93f [Aarch64] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans.
Differential Revision: https://reviews.llvm.org/D89551
2020-10-19 10:33:55 -04:00
Muhammad Asif Manzoor aab6f7db47 [AArch64][SVE] Add lowering for llvm fabs
Add the functionality to lower fabs for passthru variant

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D88679
2020-10-01 19:41:25 -04:00
Kerry McLaughlin fcf70e1e3b [SVE][CodeGen] Lower scalable fp_extend & fp_round operations
This patch adds FP_EXTEND_MERGE_PASSTHRU & FP_ROUND_MERGE_PASSTHRU
ISD nodes, used to lower scalable vector fp_extend/fp_round operations.
fp_round has an additional argument, the 'trunc' flag, which is an integer of zero or one.

This also fixes a warning introduced by the new tests added to sve-split-fcvt.ll,
resulting from an implicit TypeSize -> uint64_t cast in SplitVecOp_FP_ROUND.

Reviewed By: sdesmalen, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D88321
2020-10-01 12:17:37 +01:00
Muhammad Asif Manzoor 3a76de4275 [AArch64][SVE] Add lowering for llvm frecpx
Add the functionality to lower frecpx for passthru variant

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D88032
2020-09-23 15:23:54 -04:00
Kerry McLaughlin d0149ba9b4 [SVE][CodeGen] Lower legal integer -> floating point conversions
This patch adds new ISD nodes, SCVTZ_MERGE_PASSTHRU &
UCVTZ_MERGE_PASSTHRU, which are used to lower both legal
scalable vector [S|U]INT_TO_FP operations and the following intrinsics:
 - llvm.aarch64.sve.scvtf
 - llvm.aarch64.sve.ucvtf

Reviewed By: sdesmalen, efriedma

Differential Revision: https://reviews.llvm.org/D87913
2020-09-23 11:53:53 +01:00
David Sherwood 96e52c1364 [SVE][CodeGen] Mark ptrue/pfalse instructions as rematerializable 2020-09-21 16:44:32 +01:00
Paul Walker f3fa954b5b [SVE] Change definition of reduction ISD nodes to have an SVE vector result type.
The current nodes, AArch64::SMAXV_PRED for example, are defined to
return a NEON vector result.  This is incorrect because they modify
the complete SVE register and are thus changed to represent such.

This patch also adds nodes for UADDV_PRED and SADDV_PRED, which
unifies the handling of all SVE reductions.

NOTE: Floating-point reductions are already implemented correctly,
so this patch is essentially making everything consistent with those.

Differential Revision: https://reviews.llvm.org/D87843
2020-09-21 13:16:28 +01:00
Kerry McLaughlin f7185b271f [SVE][CodeGen] Lower floating point -> integer conversions
This patch adds new ISD nodes, FCVTZS_MERGE_PASSTHRU &
FCVTZU_MERGE_PASSTHRU, which are used to lower scalable vector
FP_TO_SINT/FP_TO_UINT operations and the following intrinsics:
 - llvm.aarch64.sve.fcvtzu
 - llvm.aarch64.sve.fcvtzs

Reviewed By: efriedma, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D87232
2020-09-17 14:04:22 +01:00
Muhammad Asif Manzoor fd536eeed9 [AArch64][SVE] Add lowering for llvm fceil
Add the functionality to lower fceil for passthru variant

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D84548
2020-08-26 15:59:44 -04:00
Francesco Petrogalli 61dfa00957 [MC][SVE] Fix data operand for instruction alias of `st1d`.
The version of `st1d` that operates with vector plus immediate
addressing mode uses the alias `st1d { <Zn>.d }, <Pg>, [<Za>.d]` for
rendering `st1d { <Zn>.d }, <Pg>, [<Za>.d, #0]`. The disassembler was
generating `<Zn>.s` instead of `<Zn>.d>`.

Differential Revision: https://reviews.llvm.org/D86633
2020-08-26 18:22:17 +00:00
Paul Walker 73ac3c0ede [SVE] Lower scalable vector ISD::FNEG operations.
Also updates isConstOrConstSplatFP to allow the mul(A,-1) -> neg(A)
transformation when -1 is expressed as an ISD::SPLAT_VECTOR.

Differential Revision: https://reviews.llvm.org/D86415
2020-08-25 11:22:28 +01:00
Paul Walker 0015b8db8e [SVE] Add ISEL patterns for predicated shifts by an immediate.
For scalable vector shifts the prediacte is typically all active,
which gets selected to an unpredicated shift by immediate.  When
code generating for fixed length vectors the predicate is based
on the vector length and so additional patterns are required to
make use of SVE's predicated shift by immediate instructions.

Differential Revision: https://reviews.llvm.org/D86204
2020-08-20 11:47:20 +01:00
Eli Friedman be944c85f3 [AArch64][SVE] Add patterns for integer mla/mls.
We probably want to introduce pseudo-instructions at some point, like
we have for binary operations, but this seems okay for now.

One thing I'm not sure about is whether we should be doing this as a
DAGCombine instead of directly pattern-matching it. I don't see any big
downside to doing it this way, though.

Differential Revision: https://reviews.llvm.org/D85681
2020-08-18 12:51:16 -07:00
Paul Walker 9f63dc3265 [SVE] Fix shift-by-imm patterns used by asr, lsl & lsr intrinsics.
Right shift patterns will no longer incorrectly accept a shift
amount of zero.  At the same time they will allow larger shift
amounts that are now saturated to their upper bound.

Patterns have been extended to enable immediate forms for shifts
taking an arbitrary predicate.

This patch also unifies the code path for immediate parsing so the
i64 based shifts are no longer treated specially.

Differential Revision: https://reviews.llvm.org/D86084
2020-08-18 11:41:26 +01:00
Paul Walker b6c7b7fa31 [SVE] Add ISD nodes for predicated integer extend inreg operations.
These are useful instructions when lowering fixed length vector
extends, so I've broken this patch out as kind of NFC like work.

Differential Revision: https://reviews.llvm.org/D85546
2020-08-11 11:39:26 +01:00
Paul Walker 0d33a8ef5b [SVE] Lower scalable vector mul operations.
This allows us to remove extra patterns from AArch64SVEInstrInfo.td
because we can reuse those required for fixed length vectors.

Differential Revision: https://reviews.llvm.org/D85328
2020-08-06 11:15:35 +01:00
Paul Walker 3ed59b775d [SVE] Implement lowering for fixed length vector multiplication.
NOTE: Also uses SVE code generation for NEON size vectors, instead
of expanding i64 based vector multiplications.

Differential Revision: https://reviews.llvm.org/D85327
2020-08-06 11:01:39 +01:00
Paul Walker 4be13b15d6 [SVE] Replace remaining _MERGE_OP1 nodes with _PRED variants.
This is the final bit of work to relax the register allocation
requirements when code generating normal LLVM IR, which rarely
care about the result of inactive lanes. By using _PRED nodes
we can make better use of SVE's reversed instructions.

Also removes a redundant parameter from the min/max tests.

Differential Revision: https://reviews.llvm.org/D85142
2020-08-04 11:19:17 +01:00
Francesco Petrogalli 809600d664 [llvm][sve] Reg + Imm addressing mode for ld1ro.
Reviewers: kmclaughlin, efriedma, sdesmalen

Subscribers: tschuett, hiraditya, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83357
2020-07-24 17:48:47 +00:00
Paul Walker 509351d768 [SVE] Add lowering for scalable vector fadd, fdiv, fmul and fsub operations.
Lower the operations to predicated variants.  This is prep work
required for fixed length code generation but also fixes a bug
whereby these operations fail selection when "unpacked" vector
types (e.g. MVT::nxv2f32) are used.

This patch also adds the missing "unpacked" patterns for FMA.

Differential Revision: https://reviews.llvm.org/D83765
2020-07-16 11:31:35 +00:00
Sander de Smalen 8b7b0ad24c [AArch64][SVE] NFC: Rename isOrig -> isReverseInstr
This is a non-functional to clarify some of the terminology in the
AArch64SVEInstrInfo/SVEInstrFormats.td files around the tables
for mapping an instruction to it's reverse instruction counter part,
and vice versa. e.g. DIV -> DIVR and DIVR -> DIV.

Reviewers: paulwalker-arm, cameron.mcinally, rengolin, efriedma

Reviewed By: paulwalker-arm, efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82979
2020-07-02 17:01:15 +01:00
Sander de Smalen 075c440f7b [AArch64][SVE] Put zeroing pseudos and patterns under flag.
This patch puts the _ZERO pseudos and corresponding patterns
under the predicate 'UseExperimentalZeroingPseudos', so that they
can be enabled/disabled through compile flags.

This is done because the zeroing pseudos use MOVPRFX to do merging of
the inactive lanes, but it depends on the uarch whether this operation
is actually merged with the destructive operation. If not, it may be
more profitable to use a SELECT and to give the compiler the freedom to
schedule these instructions as normal, rather than keeping them bundled
together. Additionally, this feature is not yet fully implemented and
there are still known bugs (see D80410) that need to be resolved before
the 'experimental' can be dropped from the name.

Reviewers: paulwalker-arm, cameron.mcinally, efriedma

Reviewed By: paulwalker-arm

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82780
2020-07-02 14:24:33 +01:00
Paul Walker a1aed80a35 [SVE] Relax merge requirement for IR based divides.
We currently lower SDIV to SDIV_MERGE_OP1. This forces the value
for inactive lanes in a way that can hamper register allocation,
however, the lowering has no requirement for inactive lanes.

Instead this patch replaces SDIV_MERGE_OP1 with SDIV_PRED thus
freeing the register allocator. Once done the only user of
SDIV_MERGE_OP1 is intrinsic lowering so I've removed the node
and perform ISel on the intrinsic directly. This also allows
us to implement MOVPRFX based zeroing in the same manner as SUB.

This patch also renames UDIV_MERGE_OP1 and [F]ADD_MERGE_OP1 for
the same reason but in the ADD cases the ISel code is already
as required.

Differential Revision: https://reviews.llvm.org/D82783
2020-07-01 08:18:42 +00:00
Sander de Smalen 39f6a36a24 [AArch64][SVE] NFCI: Choose consistent naming for predicated SDAG nodes
This patch proposes a naming convention for operations that take
a general predicate (and are thus predicated) that specifies
what happens to the false lanes.

Currently the _PRED suffix is used, which doesn't really say much other
than that it takes a predicate. In some instances this means it has
merging predication and in other cases it means zeroing-predication.

This patch also changes the order of operands to
AArch64ISD::DUP_MERGE_PASSTHRU, to pass the predicate as the first
operand, which is in line with all other predicates nodes. It takes the
passthru value as an explicit passthru value, which is always passed as
the last operand.

Reviewers: paulwalker-arm, cameron.mcinally, eli.friedman, dancgr, efriedma

Reviewed By: paulwalker-arm

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81850
2020-06-29 13:37:30 +01:00
Paul Walker 3a98d5d7e7 [SVE] Code generation for fixed length vector adds.
Summary:
Teach LowerToPredicatedOp to lower fixed length vector operations.

Add AArch64ISD nodes and isel patterns for predicated integer
and floating point adds.

Together this enables SVE code generation for fixed length vector adds.

Reviewers: rengolin, efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82483
2020-06-26 19:54:41 +00:00
Cullen Rhodes c65d4eb5d3 [AArch64][SVE] Guard perm and select bfloat16 intrinsic patterns
Summary:
Permutation and selection bfloat16 intrinsic patterns should be guarded
on the feature flag `+bf16`. Missed in D82182 and D80850.

Reviewers: sdesmalen, fpetrogalli, kmclaughlin, efriedma

Reviewed By: fpetrogalli

Differential Revision: https://reviews.llvm.org/D82492
2020-06-26 09:35:36 +00:00
Cullen Rhodes 26502ad609 [AArch64][SVE] Add bfloat16 support to perm and select intrinsics
Summary:
Added for following intrinsics:

  * zip1, zip2, zip1q, zip2q
  * trn1, trn2, trn1q, trn2q
  * uzp1, uzp2, uzp1q, uzp2q
  * splice
  * rev
  * sel

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D82182
2020-06-24 10:04:51 +00:00
Francesco Petrogalli ef597eda8e [sve][acle] Add SVE BFloat16 extensions.
Summary:
List of intrinsics:

svfloat32_t svbfdot[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)
svfloat32_t svbfdot[_n_f32](svfloat32_t op1, svbfloat16_t op2, bfloat16_t op3)
svfloat32_t svbfdot_lane[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3, uint64_t imm_index)

svfloat32_t svbfmmla[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)

svfloat32_t svbfmlalb[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)
svfloat32_t svbfmlalb[_n_f32](svfloat32_t op1, svbfloat16_t op2, bfloat16_t op3)
svfloat32_t svbfmlalb_lane[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3, uint64_t imm_index)

svfloat32_t svbfmlalt[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)
svfloat32_t svbfmlalt[_n_f32](svfloat32_t op1, svbfloat16_t op2, bfloat16_t op3)
svfloat32_t svbfmlalt_lane[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3, uint64_t imm_index)

svbfloat16_t svcvt_bf16[_f32]_m(svbfloat16_t inactive, svbool_t pg, svfloat32_t op)
svbfloat16_t svcvt_bf16[_f32]_x(svbool_t pg, svfloat32_t op)
svbfloat16_t svcvt_bf16[_f32]_z(svbool_t pg, svfloat32_t op)

svbfloat16_t svcvtnt_bf16[_f32]_m(svbfloat16_t even, svbool_t pg, svfloat32_t op)
svbfloat16_t svcvtnt_bf16[_f32]_x(svbfloat16_t even, svbool_t pg, svfloat32_t op)

For reference, see section 7.2 of "Arm C Language Extensions for SVE - Version 00bet4"

Reviewers: sdesmalen, ctetreau, efriedma, david-arm, rengolin

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82141
2020-06-22 16:53:02 +00:00
Francesco Petrogalli d32c134648 [llvm][SVE] Reg + reg addressing mode for LD1RO.
Reviewers: efriedma, sdesmalen

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80741
2020-06-19 03:56:10 +00:00
Francesco Petrogalli 28a00ac9ba [llvm][SVE] IR intrinsics for quadword permutation instructions.
Summary:
Adding intrinsics and codegen patterns for:

* trn1 <Zd>.q, <Zm>.q, <Zn>.q
* trn2 <Zd>.q, <Zm>.q, <Zn>.q
* zip1 <Zd>.q, <Zm>.q, <Zn>.q
* zip2 <Zd>.q, <Zm>.q, <Zn>.q
* uzp1 <Zd>.q, <Zm>.q, <Zn>.q
* uzp2 <Zd>.q, <Zm>.q, <Zn>.q

These instructions are defined in Armv8.6-A.

Reviewers: sdesmalen, efriedma, kmclaughlin

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80850
2020-06-15 16:21:56 +00:00
Francesco Petrogalli febeaf94a8 [llvm][SVE] IR intrinsic for LD1RO.
Reviewers: sdesmalen, efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80738
2020-06-03 13:57:16 +00:00
Francesco Petrogalli b572d9b1a7 [llvm][sve] Intrinsics for SVE sudot and usdot instructions.
Summary:
This patch adds IR intrinsics for the mnemonics USDOT and SUDOT of the
8.6 extension of Armv8-a.

Reviewers: sdesmalen, efriedma, david-arm

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79876
2020-05-18 22:02:19 +00:00
Francesco Petrogalli 01f9d8ce5c [llvm][SVE] IR intrinscs for matrix multiplication instructions.
Summary:
Instructions:

* SMMLA
* UMMLA
* USMMLA
* FMMLA

Reviewers: sdesmalen, efriedma, kmclaughlin

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79638
2020-05-18 22:02:19 +00:00
Eli Friedman a1ce88b4e3 [AArch64][SVE] Implement AArch64ISD::SETCC_PRED
This unifies SETCC operations along the lines of other operations.

Differential Revision: https://reviews.llvm.org/D79975
2020-05-15 11:53:21 -07:00
Cameron McInally b085e51d81 [AArch64][SVE] Add some integer DestructiveBinaryComm* patterns
Add DestructiveBinaryComm* patterns for ADD, SUB, and SUBR.

Differential Revision: https://reviews.llvm.org/D76711
2020-05-14 16:35:49 -05:00
Eli Friedman a52f10b5a3 [AArch64][SVE] Add patterns for VSELECT of immediate merged with a variable.
This covers forms involving "CPY (immediate, merging)".

Differential Revision: https://reviews.llvm.org/D79803
2020-05-13 15:02:08 -07:00
Eli Friedman a8874c76e8 [AArch64][SVE] Add patterns for VSELECT of immediates.
This covers forms involving "CPY (immediate, zeroing)".

This doesn't handle the case where the operands are reversed, and the
condition is freely invertible.  Not sure how to handle that.  Maybe a
DAGCombine.

Differential Revision: https://reviews.llvm.org/D79598
2020-05-11 17:04:22 -07:00
Kerry McLaughlin 3bcd3dd473 [CodeGen][SVE] Lowering of shift operations with scalable types
Summary:
Adds AArch64ISD nodes for:
 - SHL_PRED (logical shift left)
 - SHR_PRED (logical shift right)
 - SRA_PRED (arithmetic shift right)

Existing patterns for unpredicated left shift by immediate
have also been moved into the appropriate multiclasses
in SVEInstrFormats.td.

Reviewers: sdesmalen, efriedma, ctetreau, huihuiz, rengolin

Reviewed By: efriedma

Subscribers: huihuiz, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79478
2020-05-07 11:43:49 +01:00
Eli Friedman 2c8546107a [AArch64][SVE] Implement lowering for SIGN_EXTEND etc. of SVE predicates.
Now using patterns, since there's a single-instruction lowering. (We
could convert to VSELECT and pattern-match that, but there doesn't seem
to be much point.)

I think this might be the first instruction to use nested multiclasses
this way? It seems like a good way to reduce duplication between
different integer widths. Let me know if it seems like an improvement.

Also, while I'm here, fix the return type of SETCC so we don't try to
merge a sign-extend with a SETCC.

Differential Revision: https://reviews.llvm.org/D79193
2020-05-06 17:56:32 -07:00