Commit Graph

348 Commits

Author SHA1 Message Date
Cullen Rhodes 3918ef07c4 [AArch64][SVE] Remove redundant ptest after match/nmatch
These instructions are flag-setting, so the ptest is redundant. However, the
TableGen class wasn't setting the element size for the predicate, causing
the checks in AArch64InstrInfo::optimizePTestInstr to fail.
2022-09-28 08:23:23 +00:00
Paul Walker 0533c39a76 [SVE] Expand DUPM patterns to handle all integer vector types.
NOTE: i8 vector splats are ignored because the immediate range of
DUP already has full coverage.

Differential Revision: https://reviews.llvm.org/D131078
2022-08-05 16:00:08 +00:00
Cullen Rhodes 6082051da1 [AArch64][SVE] Add patterns to select mla/mls
Adds patterns for:

  add(a, select(mask, mul(b, c), splat(0))) -> mla(a, mask, b, c)
  sub(a, select(mask, mul(b, c), splat(0))) -> mls(a, mask, b, c)
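
A lane-level Python sketch of why these folds hold. Lanes are modelled as unbounded Python ints, which is a simplification: fixed-width wraparound would not change the equivalence. The select with splat(0) leaves the inactive lanes of `a` untouched, which matches merging-predicated MLA/MLS. The function names are illustrative, not LLVM APIs.

```python
# Lane-level models; lanes are Python ints (fixed-width wraparound would not
# change the equivalence, since both sides add or subtract the same values).
def add_select_mul(a, mask, b, c):
    """add(a, select(mask, mul(b, c), splat(0)))"""
    return [x + (y * z if m else 0) for x, m, y, z in zip(a, mask, b, c)]

def sub_select_mul(a, mask, b, c):
    """sub(a, select(mask, mul(b, c), splat(0)))"""
    return [x - (y * z if m else 0) for x, m, y, z in zip(a, mask, b, c)]

def mla(a, mask, b, c):
    """Merging-predicated MLA: active lanes get a + b*c, inactive lanes keep a."""
    return [x + y * z if m else x for x, m, y, z in zip(a, mask, b, c)]

def mls(a, mask, b, c):
    """Merging-predicated MLS: active lanes get a - b*c, inactive lanes keep a."""
    return [x - y * z if m else x for x, m, y, z in zip(a, mask, b, c)]

a, mask = [1, 2, 3, 4], [True, False, True, False]
b, c = [5, 6, 7, 8], [2, 2, 2, 2]
print(add_select_mul(a, mask, b, c), mla(a, mask, b, c))  # [11, 2, 17, 4] [11, 2, 17, 4]
```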

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130492
2022-07-26 07:52:44 +00:00
Rosie Sumpter e5edc1b5ee [AArch64][SVE] Ensure PTEST operands have type nxv16i1
Currently any legal predicate types will be pattern-matched when
creating a PTEST instruction. This could be a problem in the future since
PTEST always uses the .B specifier for the operand, but it is not
always guaranteed that the extra lanes of unpacked types (e.g. nxv4i1)
are zero. This patch ensures the operands of PTEST are type nxv16i1,
where the undef lanes are set to zero.
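
A rough model of the hazard, assuming a 128-bit vector (16 predicate bits, one per byte) and reducing PTEST to "is any bit active under the governing predicate". An nxv4i1 value only defines every fourth predicate bit, so a PTEST using .B also sees the three bits in between. `pack_nxv4i1` and `ptest_any_b` are hypothetical helpers, not LLVM functions.

```python
PRED_BITS = 16  # assumed 128-bit vector: one predicate bit per byte

def pack_nxv4i1(lanes, filler):
    """Place 4 predicate lanes for 32-bit elements into a 16-bit predicate.
    Only the lowest bit of each 4-bit group is defined; `filler` stands for
    whatever the other three bits happen to contain."""
    bits = []
    for lane in lanes:
        bits.append(lane)
        bits.extend([filler] * 3)
    return bits

def ptest_any_b(governing, pred):
    """Byte-granular PTEST reduced to one question: is any bit active under `governing`?"""
    return any(g and p for g, p in zip(governing, pred))

all_true = [True] * PRED_BITS
no_lanes = [False] * 4
print(ptest_any_b(all_true, pack_nxv4i1(no_lanes, filler=False)))  # False
print(ptest_any_b(all_true, pack_nxv4i1(no_lanes, filler=True)))   # True
```

With the extra lanes zeroed, PTEST correctly reports no active element; with garbage in them it would not.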

Differential Revision: https://reviews.llvm.org/D129282/
2022-07-12 09:27:59 +01:00
Sander de Smalen 95e08824fa [AArch64] Add support for various operations on nxv1i1 types.
The supported operations are:
* Logical operations (and, or, xor, bic)
* Logical reductions (and, or, xor, [us]min, [us]max)
* Conversions to/from svbool_t
* Predicate count (CNTP)

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D128835
2022-07-06 15:57:11 +00:00
Sander de Smalen 690db16422 [AArch64] Make nxv1i1 types a legal type for SVE.
One motivation for adding support for these types is the LD1Q/ST1Q
instructions in SME, for which we have defined a number of load/store
intrinsics that currently still take a `<vscale x 16 x i1>` predicate
regardless of their element type.

This patch adds basic support for the nxv1i1 type such that it can be passed to and returned
from functions, as well as enough support for some existing tests that
result in an nxv1i1 type. It also adds support for splats.

Other operations (e.g. insert/extract subvector, logical ops, etc) will be
supported in follow-up patches.

Reviewed By: paulwalker-arm, efriedma

Differential Revision: https://reviews.llvm.org/D128665
2022-07-01 15:11:13 +00:00
Matt Devereau 018a0dd5c8 [AArch64][SVE] Create AArch64ISD node for DUPQLANE128
Create an AArch64ISD node instead of emitting machine node DUP_ZZI_Q.
This allows a simpler DAG combine for work previously attempted
in https://reviews.llvm.org/D128503

Differential Revision: https://reviews.llvm.org/D128902
2022-07-01 11:46:24 +00:00
Paul Walker 43f8a6b749 [SVE] Use CPY to zero active lanes of a floating point vector.
Patterns exist for the integer case that are trivially expandable
to cover 0.0f.

Differential Revision: https://reviews.llvm.org/D128669
2022-07-01 00:59:00 +01:00
Paul Walker 2be4a7a209 [SVE] Extend "and(ipg,cmp(x,y))" patterns to cover the case when y is an immediate.
Differential Revision: https://reviews.llvm.org/D128479
2022-07-01 00:56:22 +01:00
Bradley Smith 424b2ae9ab [AArch64][SVE] Match (add x (urshr/srshr y c)) -> ursra/srsra x y c
Differential Revision: https://reviews.llvm.org/D128447
2022-06-29 12:10:50 +00:00
Bradley Smith 6f27df5084 [AArch64][SVE] Match (add x (lsr/asr y c)) -> usra/ssra x y c
Differential Revision: https://reviews.llvm.org/D128045
2022-06-23 14:56:21 +00:00
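
The two shift-accumulate folds above (usra/ssra and ursra/srsra) can be sketched on 8-bit lanes. The functions are hypothetical instruction models; the rounding variants add 2^(c-1) before shifting at full precision, so a lane value of 0xFF does not wrap during rounding.

```python
BITS = 8
MASK = (1 << BITS) - 1

def to_signed(v):
    """Reinterpret the low BITS bits of v as a two's complement value."""
    v &= MASK
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v

# Hypothetical instruction models; c is the shift amount, 1 <= c <= BITS.
def usra(x, y, c):   # add(x, lsr(y, c))
    return (x + ((y & MASK) >> c)) & MASK

def ssra(x, y, c):   # add(x, asr(y, c)); Python's >> on negative ints is arithmetic
    return (x + (to_signed(y) >> c)) & MASK

def ursra(x, y, c):  # add(x, urshr(y, c)); rounding constant added before the shift
    return (x + (((y & MASK) + (1 << (c - 1))) >> c)) & MASK

def srsra(x, y, c):  # add(x, srshr(y, c))
    return (x + ((to_signed(y) + (1 << (c - 1))) >> c)) & MASK

print(usra(10, 0xFF, 4), ssra(10, 0xFF, 4), ursra(10, 0xFF, 4), srsra(10, 0xFF, 4))
# 25 9 26 10
```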
Paul Walker e8716179eb [SVE] Make ISD::SPLAT_VECTOR a legal operation.
The implication of this patch is that AArch64ISD::DUP no longer
supports scalable vectors.

Differential Revision: https://reviews.llvm.org/D128265
2022-06-23 00:42:47 +01:00
Paul Walker 84f486cfab [NFC][SVE] Simplify SUBR_ZI isel patterns.
Differential Revision: https://reviews.llvm.org/D128199
2022-06-22 00:05:18 +01:00
Rosie Sumpter 2c4e44752d [AArch64][SME] Add load/store intrinsics
This patch adds implementations for the load/store SME ACLE intrinsics:
  - @llvm.aarch64.sme.ld1*
  - @llvm.aarch64.sme.st1*

Differential Revision: https://reviews.llvm.org/D127210
2022-06-14 11:11:22 +01:00
Sander de Smalen 9c38fc111b [AArch64] Remove references to Streaming SVE from target features.
Following discussion on D120261 and D121208 it seems better to remove the
concept of Streaming SVE from the subtarget/assembler predicates and
instead reason about 'SVE' and 'SME' as its higher level features, rather
than trying to model this runtime mode through explicit feature flags.

This patch is largely NFC.

Reviewed By: paulwalker-arm, david-arm

Differential Revision: https://reviews.llvm.org/D125977
2022-05-31 16:25:01 +02:00
Paul Walker 84acdd32ca [SVEInstrFormats] Ensure scatter instructions are named consistently.
2022-05-23 20:22:14 +01:00
zhongyunde e1afae0311 [AArch64][SVE] Add some logical operation DestructiveBinaryComm patterns
Add DestructiveBinaryComm* patterns for ORR, EOR, AND and BIC.
The above instructions require that the source and destination registers are
equal, so using movprfx should be beneficial to performance.
note: BIC (i.e. A & ~B) is not a commutative operation.
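
Whether an operation is commutative presumably decides whether its sources may be swapped when picking which one becomes the destructive destination. A quick exhaustive check on 4-bit values confirms the note: ORR, EOR and AND are commutative, BIC is not. The `OPS` table is illustrative only.

```python
# Illustrative operator table (values treated as plain Python ints).
OPS = {
    "orr": lambda a, b: a | b,
    "eor": lambda a, b: a ^ b,
    "and": lambda a, b: a & b,
    "bic": lambda a, b: a & ~b,  # A & ~B
}

def is_commutative(name, samples=range(16)):
    """Exhaustively compare f(a, b) with f(b, a) over the sample values."""
    f = OPS[name]
    return all(f(a, b) == f(b, a) for a in samples for b in samples)

print({name: is_commutative(name) for name in OPS})
# {'orr': True, 'eor': True, 'and': True, 'bic': False}
```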

Reviewed By: paulwalker-arm, david-arm

Differential Revision: https://reviews.llvm.org/D124224
2022-04-22 20:31:00 +08:00
Peter Waller f1cb816f90 [AArch64][SVE] Mark {CNT*,RDVL,INDEX} as materializable
Differential Revision: https://reviews.llvm.org/D122731
2022-03-31 15:28:24 +00:00
Hsiangkai Wang b8e296cf6a [AArch64][SME] Add rdsvl instruction
This patch adds support for the following SME instruction:

  * RDSVL

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-12

Differential Revision: https://reviews.llvm.org/D120603
2022-02-28 23:14:50 +00:00
Hsiangkai Wang 7dd7cb0487 [AArch64][SME] Add addsvl and addspl instructions
This patch adds support for the following SME instructions:

  * ADDSPL, ADDSVL

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-12

Differential Revision: https://reviews.llvm.org/D120554
2022-02-28 23:14:50 +00:00
Paul Walker 7ab78f34cd [SVE] Refactor complex immediate pattern used by CPY/DUP.
SelectSVE8BitLslImm didn't account for constant values that have a
larger bit width than the result vector's element type.  This only
seems to affect a single corner case when lowering fixed length
vectors but the code itself is also not consistent with how other
related complex patterns are implemented so I've taken the
opportunity to refactor the code.
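
A Python sketch of the matching such a complex pattern performs, assuming CPY/DUP (immediate) accepts a signed 8-bit value optionally shifted left by 8, with the shift unavailable for .B elements. `select_cpy_imm` is hypothetical and is not LLVM's SelectSVE8BitLslImm; its first step shows one way to account for constants wider than the element type.

```python
def select_cpy_imm(value, elt_bits):
    """Hypothetical model: match a splat constant to CPY/DUP's signed imm8,
    optionally shifted left by 8. Returns (imm8, shift) or None."""
    # Truncate to the element width and sign-extend, so constants that are
    # wider than the element type are judged by the bits that actually land
    # in each lane.
    v = value & ((1 << elt_bits) - 1)
    if v >= 1 << (elt_bits - 1):
        v -= 1 << elt_bits
    if -128 <= v <= 127:
        return (v, 0)
    # The LSL #8 form is not available for byte elements.
    if elt_bits > 8 and v % 256 == 0 and -128 <= v // 256 <= 127:
        return (v // 256, 8)
    return None

print(select_cpy_imm(0xFFFF_FFFF_FFFF_FFFF, 16))  # (-1, 0)
print(select_cpy_imm(0x1200, 16))                # (18, 8)
print(select_cpy_imm(0x1234, 16))                # None
```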

Differential Revision: https://reviews.llvm.org/D120440
2022-02-25 16:12:35 +00:00
Paul Walker 8ca5be93cc [SVE] Don't custom lower constant predicate ISD::SPLAT_VECTOR operations.
Differential Revision: https://reviews.llvm.org/D120340
2022-02-25 11:32:37 +00:00
David Truby be826cf4f7 [AArch64][NEON][SVE] Lower FCOPYSIGN using AArch64ISD::BSP
This patch modifies the FCOPYSIGN lowering to go through the BSP
pseudo-instruction. This allows the same lowering code for NEON,
SVE and SVE2.

As part of this, lowering for BSP for SVE and SVE2 is also added.

For SVE and NEON this patch is NFC.
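
FCOPYSIGN maps onto a bitwise select because copysign(mag, sign) takes the sign bit from one input and every other bit from the other. A scalar float32 sketch, assuming a BSP operand order of (mask, bits taken where mask is 1, bits taken where mask is 0); the helper names are illustrative.

```python
import struct

SIGN_MASK = 0x8000_0000  # sign bit of an IEEE-754 single

def f32_bits(x):
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b):
    return struct.unpack("<f", struct.pack("<I", b))[0]

def bsp(mask, a, b):
    """Bitwise select on 32-bit values: bits of `a` where `mask` is set, else bits of `b`."""
    return ((mask & a) | (~mask & b)) & 0xFFFF_FFFF

def copysign_f32(mag, sign):
    # Sign bit from `sign`, every other bit from `mag`.
    return bits_f32(bsp(SIGN_MASK, f32_bits(sign), f32_bits(mag)))

print(copysign_f32(1.5, -0.0), copysign_f32(-2.0, 3.0))  # -1.5 2.0
```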

Differential Revision: https://reviews.llvm.org/D118394
2022-02-07 14:35:26 +00:00
Matt Devereau 6b73a4cc7d [AArch64][SVE] Remove false register dependency for unary FP convert operations
Generate movprfx for floating point convert zeroing pseudo operations

Differential Revision: https://reviews.llvm.org/D118617
2022-02-04 09:55:39 +00:00
Matt Devereau 1c6dca96ca [AArch64][SVE] Fold vselect into predicated fmul, fsub and fadd
Fold vselect with an unpredicated fmul/fsub/fadd
operand into a predicated fmul/fsub/fadd:

(vselect (p) (op (a) (b)) (a)) => (op -> (p) (a) (b))
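
Because the false operand of the vselect is `a` itself, inactive lanes keep `a`, which is exactly merging predication. A lane-level Python check of the fold for the three ops (helper names are illustrative):

```python
import operator

def vselect(p, t, f):
    return [x if m else y for m, x, y in zip(p, t, f)]

def unpredicated(op, a, b):
    return [op(x, y) for x, y in zip(a, b)]

def predicated_merge(op, p, a, b):
    """Merging predication: active lanes get op(a, b), inactive lanes keep a."""
    return [op(x, y) if m else x for m, x, y in zip(p, a, b)]

p = [True, False, True, False]
a, b = [1.0, 2.0, 3.0, 4.0], [0.5, 0.25, 2.0, 8.0]
for op in (operator.add, operator.sub, operator.mul):
    # (vselect p (op a b) a) matches the predicated form lane for lane.
    assert vselect(p, unpredicated(op, a, b), a) == predicated_merge(op, p, a, b)
print(predicated_merge(operator.add, p, a, b))  # [1.5, 2.0, 5.0, 4.0]
```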

Differential Revision: https://reviews.llvm.org/D117689
2022-02-03 13:43:15 +00:00
Paul Walker bcda4c48c8 [SVE] By using SEL when orring predicates we forgo the need for a PTRUE.
Differential Revision: https://reviews.llvm.org/D118463
2022-01-31 19:39:23 +00:00
Paul Walker 804915f5dc [SVE] Extend isel pattern coverage for INCP & DECP.
Adds patterns for:
    add(x, cntp(p, p)) -> incp(x, p)
    sub(x, cntp(p, p)) -> decp(x, p)
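
When both predicate operands of cntp are the same, the count is simply the number of active lanes of p, which is what scalar INCP/DECP add or subtract. A lane-level sketch (64-bit scalar wraparound ignored; names are illustrative):

```python
def cntp(pg, pn):
    """Number of elements active in both the governing predicate and pn."""
    return sum(1 for g, n in zip(pg, pn) if g and n)

def incp(x, p):
    """Scalar INCP model: x plus the number of active elements of p."""
    return x + sum(1 for lane in p if lane)

def decp(x, p):
    """Scalar DECP model: x minus the number of active elements of p."""
    return x - sum(1 for lane in p if lane)

p, x = [True, False, True, True], 10
print(x + cntp(p, p), incp(x, p))  # 13 13
```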

Differential Revision: https://reviews.llvm.org/D118567
2022-01-31 19:05:05 +00:00
Paul Walker 30efee764d [SVE] Remove AArch64ISD::PFALSE.
AArch64ISD::PFALSE does not provide any value; in fact, it can
prevent common combines from firing.  We only needed to lower
to PFALSE until ISD::SPLAT_VECTOR became generally available.

Differential Revision: https://reviews.llvm.org/D118469
2022-01-29 11:31:00 +00:00
Paul Walker 49178a2c4e [SVE] Extend isel pattern coverage for BIC.
Adds patterns of the form "(and a, (not b)) -> bic".

NOTE: With this support I'm inclined to remove AArch64ISD::BIC,
but will leave that investigation for another time.

Differential Revision: https://reviews.llvm.org/D118365
2022-01-28 13:14:46 +00:00
Sander de Smalen dafd1f29da [AArch64][SVE] Avoid using ptrue for unpredicated predicate AND.
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D118146
2022-01-27 13:00:23 +00:00
Sander de Smalen d58757e522 [AArch64][SVE] Implement PFALSE with explicit AArch64ISD node.
The ISel patterns for PFALSE help recognise the instructions as being
free of side-effects, which helps MachineCSE remove redundant
PFALSE instructions.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D118054
2022-01-27 10:30:13 +00:00
Paul Walker 66bd7ebdf7 [SVE] Use DUPM to handle more splat immediate cases.
NOTE: Only considers i64 based vectors at this time because smaller
element types require extra isel operand parsing.
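
DUPM takes a bitmask immediate, the same form used by AArch64 logical instructions: a 2, 4, 8, 16, 32 or 64-bit element, replicated to fill 64 bits, whose set bits form one rotated run of ones (all-zeros and all-ones excluded). Below is a from-scratch Python sketch of that rule for the i64 case; it is not LLVM's own implementation, and `is_bitmask_imm64` is a hypothetical name.

```python
def is_bitmask_imm64(value):
    """True if the 64-bit pattern `value` fits the AArch64 bitmask-immediate form."""
    value &= (1 << 64) - 1
    for size in (2, 4, 8, 16, 32, 64):
        mask = (1 << size) - 1
        elem = value & mask
        # The element must tile the whole 64-bit value.
        if any(((value >> shift) & mask) != elem for shift in range(0, 64, size)):
            continue
        # All-zeros and all-ones are not encodable.
        if elem in (0, mask):
            return False
        # Exactly one cyclic 0->1 transition means a single rotated run of ones.
        starts = sum(
            1 for i in range(size)
            if (elem >> i) & 1 and not (elem >> ((i - 1) % size)) & 1
        )
        return starts == 1
    return False  # unreachable: size 64 always tiles

print(is_bitmask_imm64(0x00FF_00FF_00FF_00FF))  # True
print(is_bitmask_imm64(0x1234))                 # False
```

Checking only the smallest tiling element size is enough: a larger element made of repeated copies of an invalid pattern contains several runs of ones and is invalid too.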

Differential Revision: https://reviews.llvm.org/D118040
2022-01-26 12:04:44 +00:00
Cullen Rhodes eee993ae4c [AArch64][SVE] Fold predicate into compare
Codegen of added testcase before this patch:

  ptrue   p0.s
  cmpgt   p1.s, p0/z, z0.s, z1.s
  cmpge   p2.s, p0/z, z2.s, z1.s
  and     p0.b, p0/z, p1.b, p2.b
  ret

Patterns originally authored by Will Lovett.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D116749
2022-01-10 10:52:06 +00:00
Paul Walker 22370530a3 [NFC][SVE] Add missing tests for i32 INC/DEC patterns.
D111441 included trunc isel patterns for sve_int_pred_pattern_a
but no accompanying tests. This patch adds the missing tests and
also simplifies the isel patterns that use sve_cnt_shl_imm.

Differential Revision: https://reviews.llvm.org/D115512
2021-12-17 13:13:36 +00:00
Andrew Wei dc7b672f96 [AArch64][SVE] Lower shuffles to permute instructions: rev/revb/revh/revw
Attempt to lower a shuffle as a permute instruction (rev/revb/revh/revw) for fixed-length SVE.
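
A sketch of the mask test such a lowering needs, assuming the shuffle mask has no undef lanes: the mask reverses elements inside fixed-size blocks (REVB/REVH/REVW, depending on element and block size) or across the whole vector (REV). `reverses_within_blocks` is a hypothetical helper, not the LLVM function, and the mapping from block size to a specific instruction is not shown.

```python
def reverses_within_blocks(mask, block):
    """True if `mask` reverses the elements inside each consecutive block of
    `block` elements (the whole vector when block == len(mask))."""
    n = len(mask)
    if block <= 1 or n % block:
        return False
    return all(
        mask[i] == (i // block) * block + (block - 1 - i % block)
        for i in range(n)
    )

print(reverses_within_blocks([1, 0, 3, 2], 2))  # True: pairs swapped
print(reverses_within_blocks([3, 2, 1, 0], 4))  # True: whole-vector reverse
```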

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D114960
2021-12-15 21:53:00 +08:00
Jessica Clarke a3530dc199 [AArch64][NFC] Alter ComplexPattern types to be consistent with their uses
When used as a non-leaf node, TableGen does not currently use the type
of a ComplexPattern for type inference, which also means it does not
check it doesn't conflict with the use. This differs from when used as a
leaf value, where the type is used for inference. Fixing that
discrepancy is something I intend to upstream as a subsequent review.

AArch64 currently has several ComplexPatterns that are used in contexts
where they're expected to be an iPTR. The cases that lead to type
contradictions are separated out in D108759, but there are additional
differences to the TableGen output when using my locally-patched
TableGen. None of these appear to matter, at least for passing all the
CodeGen tests, but it's safer to avoid such changes (and similar changes
were causing issues on some AMDGPU tests, causing failures to select).
Changing these additional ComplexPatterns to use iPTR rather than i64
ensures that the TableGen output remains bit-for-bit identical (compared
to without having this patch and my TableGen patch, as well as the
intermediate state of having this patch but not my TableGen patch), and
more accurately captures the higher-level meaning of these patterns.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D109034
2021-12-03 07:04:59 +00:00
Jessica Clarke 0cb44cfbb7 [AArch64][NFC] Fix ComplexPattern types conflicting with uses
When used as a non-leaf node, TableGen does not currently use the type
of a ComplexPattern for type inference, which also means it does not
check it doesn't conflict with the use. This differs from when used as a
leaf value, where the type is used for inference. Fixing that
discrepancy is something I intend to upstream as a subsequent review,
but these are all the type conflicts found (all legitimate) by my
locally-patched TableGen.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D108759
2021-12-03 07:04:59 +00:00
David Sherwood 9cef7c1ca9 [CodeGen][SVE] Add missing isel patterns for vector_reverse
We were missing patterns for vector_reverse of unpacked FP vector
types, as well as all the supported bfloat vectors.

Tests added here:

  CodeGen/AArch64/named-vector-shuffle-reverse-sve.ll

Differential Revision: https://reviews.llvm.org/D114089
2021-11-18 09:59:26 +00:00
Peter Waller 599ea3e73f [AArch64][SVE] Break false dependencies for inactive lanes of FP unary operations
Follow up to D105889, covering instructions using sve_fp_2op_p_zd_HSD:
frintn, frintp, frintm, frintz, frinta, frintx, frinti, frecpx and
fsqrt.

Reviewed By: bsmith

Differential Revision: https://reviews.llvm.org/D113485
2021-11-15 09:15:21 +00:00
Ahmed Bougacha bef777206e [AArch64] Rename some timm predicates for consistency. NFC.
timm isn't the common case, and TImmLeafs should make it clear what
they are.  We're adding a plain ImmLeaf for 0_65535, so rename
i64_imm0_65535 to timm64_0_65535, and imm32_0_7 to timm32_0_7.
2021-10-28 11:41:29 -07:00
David Truby 2e0fb007d6 [llvm][AArch64][SVE] Fold literals into math instructions
SVE has predicated literal forms of some instructions for specific
literals, which currently are generated correctly when using ACLE
but not when those instructions are generated directly.

This adds the patterns to generate those instructions when
generating from standard LLVM IR instructions.
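
For reference, the predicated SVE floating-point immediate forms accept only two literals each (FADD/FSUB/FSUBR: 0.5 or 1.0; FMUL: 0.5 or 2.0; FMAX/FMIN/FMAXNM/FMINNM: 0.0 or 1.0). The table below encodes those, which is the kind of check such patterns have to make. Which of these forms this particular patch covers is not stated here, and `immediate_form` (with its fixed z0/p0 registers) is a hypothetical illustration.

```python
# Literals accepted by the predicated SVE FP immediate forms.
FP_IMM_FORMS = {
    "fadd": (0.5, 1.0),
    "fsub": (0.5, 1.0),
    "fsubr": (0.5, 1.0),
    "fmul": (0.5, 2.0),
    "fmax": (0.0, 1.0),
    "fmin": (0.0, 1.0),
    "fmaxnm": (0.0, 1.0),
    "fminnm": (0.0, 1.0),
}

def immediate_form(op, literal):
    """Assembly for the immediate form, or None if `literal` does not fit."""
    if literal in FP_IMM_FORMS.get(op, ()):
        return f"{op} z0.s, p0/m, z0.s, #{literal}"
    return None

print(immediate_form("fmul", 2.0))  # fmul z0.s, p0/m, z0.s, #2.0
print(immediate_form("fadd", 2.0))  # None
```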

Differential Revision: https://reviews.llvm.org/D99074
2021-10-17 10:57:04 +00:00
Kerry McLaughlin 1a2e90199f [SVE][CodeGen] Add patterns for ADD/SUB + element count
This patch adds patterns to match the following with INC/DEC:
 - @llvm.aarch64.sve.cnt[b|h|w|d] intrinsics + ADD/SUB
 - vscale + ADD/SUB

For some implementations of SVE, INC/DEC VL is not as cheap as ADD/SUB and
so this behaviour is guarded by the "use-scalar-inc-vl" feature flag, which for SVE
is off by default. There are no known issues with SVE2, so this feature is
enabled by default when targeting SVE2.
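
With pattern ALL, INCB/INCH/INCW/INCD add vscale × 16, 8, 4 or 2 (elements per 128-bit granule), optionally scaled by MUL #1..#16, so `x + vscale * k` can be matched when k factors that way. A sketch under those assumptions; `select_inc` is hypothetical, its preference for the widest element count is an arbitrary choice, and the DEC case is symmetric.

```python
# Elements per 128-bit granule for each scalar INC variant with pattern ALL.
INC_VARIANTS = [("incb", 16), ("inch", 8), ("incw", 4), ("incd", 2)]

def select_inc(k):
    """Hypothetical matcher for x0 + vscale * k; returns assembly text or None."""
    for name, per_granule in INC_VARIANTS:
        if k % per_granule == 0 and 1 <= k // per_granule <= 16:
            mul = k // per_granule
            return f"{name} x0, all, mul #{mul}" if mul > 1 else f"{name} x0"
    return None

print(select_inc(12))  # incw x0, all, mul #3
print(select_inc(3))   # None
```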

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D111441
2021-10-13 11:36:15 +01:00
Bradley Smith 5be266db7a [AArch64][SVE] Improve VECTOR_SPLICE codegen for VL > 128-bit
Differential Revision: https://reviews.llvm.org/D111135
2021-10-07 15:28:55 +00:00
Peter Waller be26e6ff73 [AArch64][SVE] Remove redundant PTEST following PNEXT/PFIRST
PNEXT and PFIRST set the NZCV flags, so the subsequent PTEST can be
optimized away in AArch64InstrInfo::optimizePTestInstr.

See-also: https://reviews.llvm.org/D93292

Differential Revision: https://reviews.llvm.org/D110177
2021-10-05 15:10:48 +00:00
Cullen Rhodes d42f76fd36 [AArch64][SVE] NFC: Remove unused template args
For sve_fp_3op_p_zds_zx we have zero patterns downstream but the
intrinsic args can be added again if/when the patterns are implemented.

Identified in D109359.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D109429
2021-09-09 07:10:57 +00:00
Cullen Rhodes 5b848a35d2 [AArch64][SVE] NFC: Use stepvector directly in index multiclasses
Also fixes a couple of warnings identified in D109359:

  SVEInstrFormats.td:5099:59: warning: unused template argument: sve_int_index_ri::step_vector
  SVEInstrFormats.td:5133:59: warning: unused template argument: sve_int_index_rr::step_vector

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D109422
2021-09-09 07:10:57 +00:00
Cullen Rhodes 1fe0e6a380 [AArch64][SME] Support ptrue(s) in streaming mode
The ptrue and ptrues instructions are legal in streaming mode, which was missed in
D106272.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06/SVE-Instructions

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D107807
2021-08-11 07:49:36 +00:00
Bradley Smith 81eafb8a37 [AArch64][SVE] Break false dependencies for inactive lanes of unary operations
Differential Revision: https://reviews.llvm.org/D105889
2021-07-26 15:01:21 +00:00
Caroline Concatto 0bfc26e3a4 [SVE][AArch64] Improve code generation for vector_splice for Imm > 0
This patch implements vector_splice in tablegen for all cases when the
Immediate is positive and lower than the known minimum value of
a scalable vector.
Vector_splice can be implemented using SVE instruction EXT.
For instance :
    @llvm.experimental.vector.splice(Vector_1, Vector_2, Imm)
    @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E>
        EXT  Vector_1, Vector_2, Imm              // Vector_1 = B, C, D + Vector_2 = E
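
The splice semantics in the example can be modelled directly. EXT's immediate is a byte index, so the element index presumably has to be scaled by the element size; `ext_byte_offset` is a hypothetical helper illustrating that scaling.

```python
def vector_splice(v1, v2, imm):
    """Splice for 0 <= imm < len(v1): len(v1) elements of concat(v1, v2) starting at imm."""
    n = len(v1)
    if not 0 <= imm < n:
        raise ValueError("this sketch only covers 0 <= imm < len(v1)")
    return (v1 + v2)[imm:imm + n]

def ext_byte_offset(imm, elt_bytes):
    """EXT indexes bytes, so an element index is scaled by the element size."""
    return imm * elt_bytes

print(vector_splice(list("ABCD"), list("EFGH"), 1))  # ['B', 'C', 'D', 'E']
```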

Depends on D105633

Differential Revision: https://reviews.llvm.org/D106273
2021-07-26 11:45:46 +01:00
Eli Friedman 0ca46a1757 [SelectionDAG] Fix the representation of ISD::STEP_VECTOR.
The existing rule about the operand type is strange.  Instead, just say
the operand is a TargetConstant with the right width.  (Legalization
ignores TargetConstants, so it doesn't matter if that width is legal.)

Highlights:

1. I had to substantially rewrite the AArch64 isel patterns to expect a
TargetConstant.  Nothing too exotic, but maybe a little hairy. Maybe
worth considering a target-specific node with some dagcombines instead
of this complicated nest of isel patterns.
2. Our behavior on RV32 for vectors of i64 has changed slightly. In
particular, we correctly preserve the width of the arithmetic through
legalization.  This changes the DAG a bit. Maybe room for
improvement here.
3. I explicitly defined the behavior around overflow. This is necessary
to make the DAGCombine transforms legal, and I don't think it causes any
practical issues.
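
Reading point 3 as arithmetic modulo the element width (an assumption here; the commit message does not spell out the rule), the lanes of STEP_VECTOR with a given step can be modelled as follows. `step_vector` is an illustrative name, and the fixed lane count stands in for a scalable one.

```python
def step_vector(step, num_elts, elt_bits):
    """Lanes 0, step, 2*step, ... with arithmetic modulo 2**elt_bits."""
    mod = 1 << elt_bits
    return [(i * step) % mod for i in range(num_elts)]

print(step_vector(100, 4, 8))  # [0, 100, 200, 44]
```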

Differential Revision: https://reviews.llvm.org/D105673
2021-07-21 10:58:40 -07:00