Commit Graph

10 Commits

Author SHA1 Message Date
David Green 48cef1fa8e [ARM] Create VMOVRRD from adjacent vector extracts
This adds a combine for extract(x, n); extract(x, n+1)  ->
VMOVRRD(extract x, n/2). This allows two vector lanes to be moved at the
same time in a single instruction, and thanks to the other VMOVRRD folds
we have added recently can help reduce the amount of executed
instructions. Floating point types are very similar, but will include a
bitcast to an integer type.

This also adds a shouldRewriteCopySrc, to prevent copy propagation from
DPR to SPR, which can break as not all DPR regs can be extracted from
directly.  Otherwise the machine verifier is unhappy.

Differential Revision: https://reviews.llvm.org/D100244
2021-04-20 15:15:43 +01:00
David Green a968e7b82e [ARM] KnownBits for CSINC/CSNEG/CSINV
This adds some simple known bits handling for the three CSINC/NEG/INV
instructions. From the operands known bits we can compute the common
bits of the first operand and incremented/negated/inverted second
operand. The first, especially CSINC ZR, ZR, comes up fair amount in the
tests. The others are more rare so a unit test for them is added.

Differential Revision: https://reviews.llvm.org/D97788
2021-03-04 08:40:20 +00:00
David Green e1c1adf9dc [ARM] Match dual lane vmovs from insert_vector_elt
MVE has a dual lane vector move instruction, capable of moving two
general purpose registers into lanes of a vector register. They look
like one of:
  vmov q0[2], q0[0], r2, r0
  vmov q0[3], q0[1], r3, r1
They only accept these lane indices though (and only insert into an
i32), either moving lanes 1 and 3, or 0 and 2.

This patch adds some tablegen patterns for them, selecting from vector
inserts elements. Because the insert_elements are know to be
canonicalized to ascending order there are several patterns that we need
to select. These lane indices are:

3 2 1 0    -> vmovqrr 31; vmovqrr 20
3 2 1      -> vmovqrr 31; vmov 2
3 1        -> vmovqrr 31
2 1 0      -> vmovqrr 20; vmov 1
2 0        -> vmovqrr 20

With the top one being the most common. All other potential patterns of
lane indices will be matched by a combination of these and the
individual vmov pattern already present. This does mean that we are
selecting several machine instructions at once due to the need to
re-arrange the inserts, but in this case there is nothing else that will
attempt to match an insert_vector_elt node.

This is a recommit of 6cc3d80a84 after
fixing the backward instruction definitions.
2020-12-18 16:13:08 +00:00
David Green 6e913e4451 Revert "[ARM] Match dual lane vmovs from insert_vector_elt"
This one needed more testing.
2020-12-18 13:33:40 +00:00
David Green 6cc3d80a84 [ARM] Match dual lane vmovs from insert_vector_elt
MVE has a dual lane vector move instruction, capable of moving two
general purpose registers into lanes of a vector register. They look
like one of:
  vmov q0[2], q0[0], r2, r0
  vmov q0[3], q0[1], r3, r1
They only accept these lane indices though (and only insert into an
i32), either moving lanes 1 and 3, or 0 and 2.

This patch adds some tablegen patterns for them, selecting from vector
inserts elements. Because the insert_elements are know to be
canonicalized to ascending order there are several patterns that we need
to select. These lane indices are:

3 2 1 0    -> vmovqrr 31; vmovqrr 20
3 2 1      -> vmovqrr 31; vmov 2
3 1        -> vmovqrr 31
2 1 0      -> vmovqrr 20; vmov 1
2 0        -> vmovqrr 20

With the top one being the most common. All other potential patterns of
lane indices will be matched by a combination of these and the
individual vmov pattern already present. This does mean that we are
selecting several machine instructions at once due to the need to
re-arrange the inserts, but in this case there is nothing else that will
attempt to match an insert_vector_elt node.

Differential Revision: https://reviews.llvm.org/D92553
2020-12-15 15:58:52 +00:00
David Green eecba95067 [ARM] Replace arm vendor with none. NFC 2020-04-22 18:19:35 +01:00
David Green 2f3574c168 [ARM] Ignore Implicit CPSR regs when lowering from Machine to MC operands
The code here seems to date back to r134705, when tablegen lowering was first
being added. I don't believe that we need to include CPSR implicit operands on
the MCInst. This now works more like other backends (like AArch64), where all
implicit registers are skipped.

This allows the AliasInst for CSEL's to match correctly, as can be seen in the
test changes.

Differential revision: https://reviews.llvm.org/D66703

llvm-svn: 370745
2019-09-03 11:30:54 +00:00
David Green 57cc65ff47 [ARM] Generate 8.1-m CSINC, CSNEG and CSINV instructions.
Arm 8.1-M adds a number of related CSEL instructions, including CSINC, CSNEG and CSINV. These choose between two values given the content in CPSR and a condition, performing an increment, negation or inverse of the false value.

This adds some selection for them, either from constant values or patterns. It does not include CSEL directly, which is currently not always making code better. It is still useful, but we will have to check more carefully where it should and shouldn't be used.

Code by Ranjeet Singh and Simon Tatham, with some modifications from me.

Differential revision: https://reviews.llvm.org/D66483

llvm-svn: 370739
2019-09-03 10:53:07 +00:00
David Green c7e55d4f52 [ARM] MVE predicate register support
This adds support code for building and shuffling i1 predicate registers. It
generally uses two basic principles, either converting the predicate into an
scalar (through a PREDICATE_CAST) and doing scalar operations on it there, or
by converting the register to an full vector register and back.

Some of the code here is a not super efficient but will hopefully cover most
cases of moving i1 vectors around and can be improved in subsequent patches.

Some code by David Sherwood.

Differential Revision: https://reviews.llvm.org/D65052

llvm-svn: 366890
2019-07-24 11:51:36 +00:00
David Green b9d96ceca0 [ARM] MVE integer compares and selects
This adds the very basics for MVE vector predication, adding integer VCMP and
VSEL instruction support. This is done through predicate registers (MVT::v16i1,
MVT::v8i1, MVT::v4i1), but otherwise using same mechanics as NEON to custom
lower setcc's through ARMISD::VCXX nodes (VCEQ, VCGT, VCEQZ, etc).

An extra VCNE was added, as this can be handled sensibly by MVE's expanded
number of VCMP condition codes. (There are also VCLE and VCLT which are added
later).

VPSEL is also added here, simply selecting on the vselect.

Original code by David Sherwood.

Differential Revision: https://reviews.llvm.org/D65051

llvm-svn: 366885
2019-07-24 11:08:14 +00:00