Commit Graph

22 Commits

Author SHA1 Message Date
Simon Pilgrim a71ad6a3c8 [DAG] visitINSERT_VECTOR_ELT - fold insert_vector_elt(scalar_to_vector(x),v,i) -> build_vector()
Allow scalar_to_vector nodes to be used for the start of a build_vector creation
2022-06-11 15:29:22 +01:00
Alexey Bataev 2cca53c815 [DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
We can process the long shuffles (working across several actual
vector registers) in the best way if we take the actual register
represantion into account. We can build more correct representation of
register shuffles, improve number of recognised buildvector sequences.
Also, same function can be used to improve the cost model for the
shuffles. in future patches.

Part of D100486

Differential Revision: https://reviews.llvm.org/D115653
2022-04-20 09:37:16 -07:00
Alexey Bataev 5f7ac15912 Revert "[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer."
This reverts commit 2f49163b33 to fix
a buildbot failure. Reported in https://lab.llvm.org/buildbot#builders/105/builds/24284
2022-04-20 06:35:55 -07:00
Alexey Bataev 2f49163b33 [DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
We can process the long shuffles (working across several actual
vector registers) in the best way if we take the actual register
represantion into account. We can build more correct representation of
register shuffles, improve number of recognised buildvector sequences.
Also, same function can be used to improve the cost model for the
shuffles. in future patches.

Part of D100486

Differential Revision: https://reviews.llvm.org/D115653
2022-04-20 05:32:56 -07:00
David Green 52e0cf9d61 [ARM] Enable subreg liveness
This enables subreg liveness in the arm backend when MVE is present,
which allows the register allocator to detect when subregister are
alive/dead, compared to only acting on full registers. This can helps
produce better code on MVE with the way MQPR registers are made up of
SPR registers, but is especially helpful for MQQPR and MQQQQPR
registers, where there are very few "registers" available and being able
to split them up into subregs can help produce much better code.

Differential Revision: https://reviews.llvm.org/D107642
2021-08-17 14:10:33 +01:00
David Green 371ee32e01 [ARM] Fold extract of ARM_BUILD_VECTOR
This adds a small fold for extract (ARM_BUILD_VECTOR) to fold to the
original node. This can help simplify the resulting codegen in some
cases.

Differential Revision: https://reviews.llvm.org/D104860
2021-06-29 11:03:19 +01:00
David Green 48cef1fa8e [ARM] Create VMOVRRD from adjacent vector extracts
This adds a combine for extract(x, n); extract(x, n+1)  ->
VMOVRRD(extract x, n/2). This allows two vector lanes to be moved at the
same time in a single instruction, and thanks to the other VMOVRRD folds
we have added recently can help reduce the amount of executed
instructions. Floating point types are very similar, but will include a
bitcast to an integer type.

This also adds a shouldRewriteCopySrc, to prevent copy propagation from
DPR to SPR, which can break as not all DPR regs can be extracted from
directly.  Otherwise the machine verifier is unhappy.

Differential Revision: https://reviews.llvm.org/D100244
2021-04-20 15:15:43 +01:00
Simon Pilgrim 9dd83f5ee8 [DAG] visitVECTOR_SHUFFLE - attempt to match commuted shuffles with MergeInnerShuffle.
Try to match "shuffle(C, shuffle(A, B, M0), M1) -> shuffle(A, B, M2)" etc. by using MergeInnerShuffle's commuted inner shuffle mode.
2021-03-01 10:42:11 +00:00
David Green 21a4faab60 [ARM] Move double vector insert patterns using vins to DAG combine
This removes the existing patterns for inserting two lanes into an
f16/i16 vector register using VINS, instead using a DAG combine to
pattern match the same code sequences. The tablegen patterns were
already on the large side (foreach LANE = [0, 2, 4, 6]) and were not
handling all the cases they could. Moving that to a DAG combine, whilst
not less code, allows us to better control and expand the selection of
VINSs. Additionally this allows us to remove the AddedComplexity on
VCVTT.

The extra trick that this has learned in the process is to move two
adjacent lanes using a single f32 vmov, allowing some extra
inefficiencies to be removed.

Differenial Revision: https://reviews.llvm.org/D96876
2021-02-22 09:29:47 +00:00
David Green 0c7e044a7f [ARM] One-off identity shuffle
A One-Off Identity mask is a shuffle that is mostly an identity mask
from as single source but contains a single element out-of-place, either
from a different vector or from another position in the same vector. As
opposed to lowering this via a ARMISD::BUILD_VECTOR we can generate an
extract/insert pair directly. Under ARM with individually accessible
lane elements this often becomes a simple lane move.

This also alters the LowerVECTOR_SHUFFLEUsingMovs code to use v4f32 (not
v4i32), a more natural type for lane moves.

Differential Revision: https://reviews.llvm.org/D95551
2021-02-08 21:24:32 +00:00
David Green 1b435eb8f3 [ARM] i16 insert-of-extract to VINS pattern
This adds another tablegen fold that converts an i16 odd-lane-insert of
an even-lane-extract into a VINS. We extract the existing f32 value from
the destination register and VINS the new value into it. The rest of the
backend then is able to optimize the INSERT_SUBREG / COPY_TO_REGCLASS /
EXTRACT_SUBREG.

Differential Revision: https://reviews.llvm.org/D95456
2021-02-08 08:41:07 +00:00
David Green c722575633 [ARM] Select VINS from vector inserts
This patch adds tablegen patterns for pairs of i16/f16 insert/extracts.
If we are inserting into two adjacent vector lanes (0 and 1 for
example), we can use either a vmov;vins or vmovx;vins to insert the pair
together, avoiding a round-trip from GRP registers. This is quite a
large patterns with a number of EXTRACT_SUBREG/INSERT_SUBREG/
COPY_TO_REGCLASS nodes, but hopefully as most of those become copies all
that will be cleaned up by further optimizations.

The VINS pattern was also adjusted to allow it to represent that it is
inserting into the top half of an existing register.

Differential Revision: https://reviews.llvm.org/D95381
2021-02-02 13:50:02 +00:00
David Green e1c1adf9dc [ARM] Match dual lane vmovs from insert_vector_elt
MVE has a dual lane vector move instruction, capable of moving two
general purpose registers into lanes of a vector register. They look
like one of:
  vmov q0[2], q0[0], r2, r0
  vmov q0[3], q0[1], r3, r1
They only accept these lane indices though (and only insert into an
i32), either moving lanes 1 and 3, or 0 and 2.

This patch adds some tablegen patterns for them, selecting from vector
inserts elements. Because the insert_elements are know to be
canonicalized to ascending order there are several patterns that we need
to select. These lane indices are:

3 2 1 0    -> vmovqrr 31; vmovqrr 20
3 2 1      -> vmovqrr 31; vmov 2
3 1        -> vmovqrr 31
2 1 0      -> vmovqrr 20; vmov 1
2 0        -> vmovqrr 20

With the top one being the most common. All other potential patterns of
lane indices will be matched by a combination of these and the
individual vmov pattern already present. This does mean that we are
selecting several machine instructions at once due to the need to
re-arrange the inserts, but in this case there is nothing else that will
attempt to match an insert_vector_elt node.

This is a recommit of 6cc3d80a84 after
fixing the backward instruction definitions.
2020-12-18 16:13:08 +00:00
David Green 6e913e4451 Revert "[ARM] Match dual lane vmovs from insert_vector_elt"
This one needed more testing.
2020-12-18 13:33:40 +00:00
David Green 6cc3d80a84 [ARM] Match dual lane vmovs from insert_vector_elt
MVE has a dual lane vector move instruction, capable of moving two
general purpose registers into lanes of a vector register. They look
like one of:
  vmov q0[2], q0[0], r2, r0
  vmov q0[3], q0[1], r3, r1
They only accept these lane indices though (and only insert into an
i32), either moving lanes 1 and 3, or 0 and 2.

This patch adds some tablegen patterns for them, selecting from vector
inserts elements. Because the insert_elements are know to be
canonicalized to ascending order there are several patterns that we need
to select. These lane indices are:

3 2 1 0    -> vmovqrr 31; vmovqrr 20
3 2 1      -> vmovqrr 31; vmov 2
3 1        -> vmovqrr 31
2 1 0      -> vmovqrr 20; vmov 1
2 0        -> vmovqrr 20

With the top one being the most common. All other potential patterns of
lane indices will be matched by a combination of these and the
individual vmov pattern already present. This does mean that we are
selecting several machine instructions at once due to the need to
re-arrange the inserts, but in this case there is nothing else that will
attempt to match an insert_vector_elt node.

Differential Revision: https://reviews.llvm.org/D92553
2020-12-15 15:58:52 +00:00
David Green 6eee2d9b5b [ARM] Convert VDUPLANE to VDUP under MVE
Unlike Neon, MVE does not have a way of duplicating from a vector lane,
so a VDUPLANE currently selects to a VDUP(move_from_lane(..)). This
forces that to be done earlier as a dag combine to allow other folds to
happen.

It converts to a VDUP(EXTRACT). On FP16 this is then folded to a
VGETLANEu to prevent it from creating a vmovx;vmovhr pair, using a
single move_from_reg instead.

Differential Revision: https://reviews.llvm.org/D79606
2020-05-09 18:58:13 +01:00
Simon Pilgrim 70293ba26f [DAG] SimplifyMultipleUseDemandedBits - remove superfluous bitcasts
If the SimplifyMultipleUseDemandedBits calls BITCASTs that peek through back to the original type then we can remove the BITCASTs entirely.

Differential Revision: https://reviews.llvm.org/D79572
2020-05-08 19:04:49 +01:00
David Green eecba95067 [ARM] Replace arm vendor with none. NFC 2020-04-22 18:19:35 +01:00
David Green c697dd9ffd [ARM] Make remaining MVE instruction predictable
The unpredictable/hasSideEffects flag is usually inferred by tablegen
from whether the instruction has a tablegen pattern (and that pattern
only has a single output instruction). Now that the MVE intrinsics are
all committed and producing code, the remaining instructions still
marked as unpredictable need to be specially handled. This adds the flag
directly to instructions that need it, notably the V*MLAL instructions
and some of the MOV's.

Differential Revision: https://reviews.llvm.org/D76910
2020-04-02 10:57:40 +01:00
Simon Pilgrim eb0e1978df [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT (REAPPLIED)
This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.

In particular this helps remove some unnecessary scalar->vector->scalar patterns.

The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue.

Reapplied after reversion at rL368660 due to PR42982 which was fixed at rGca7fdd41bda0.

Differential Revision: https://reviews.llvm.org/D65887
2020-01-04 13:15:50 +00:00
David Green 792fab343b [ARM] Attempt to use whole register vmovs for MVE shuffles.
MVE doesn't have the range of shuffle instructions available in Neon. We
also cannot use the trick of cutting a difficult vector shuffle in half
to simplify things. Instead we need to be more careful about how we
lower shuffles.

This patch adds an extra combine that attempts to find "whole lane"
vmovs when lowering shuffles of smaller types. This helps us make some
shuffles a lot simpler, generating single lane movs for the parts that
can make use of it, falling back to the original shuffle for the rest.

Differential Revision: https://reviews.llvm.org/D69509
2019-12-08 10:53:54 +00:00
David Green 411bfe476b [ARM] Add and update a lot of VLDn tests. NFC 2019-11-19 18:37:30 +00:00