MVE has a dual lane vector move instruction, capable of moving two
general purpose registers into lanes of a vector register. They look
like one of:
vmov q0[2], q0[0], r2, r0
vmov q0[3], q0[1], r3, r1
They only accept these lane indices though (and only insert into an
i32), either moving lanes 1 and 3, or 0 and 2.
This patch adds some tablegen patterns for them, selecting from vector
inserts elements. Because the insert_elements are know to be
canonicalized to ascending order there are several patterns that we need
to select. These lane indices are:
3 2 1 0 -> vmovqrr 31; vmovqrr 20
3 2 1 -> vmovqrr 31; vmov 2
3 1 -> vmovqrr 31
2 1 0 -> vmovqrr 20; vmov 1
2 0 -> vmovqrr 20
With the top one being the most common. All other potential patterns of
lane indices will be matched by a combination of these and the
individual vmov pattern already present. This does mean that we are
selecting several machine instructions at once due to the need to
re-arrange the inserts, but in this case there is nothing else that will
attempt to match an insert_vector_elt node.
This is a recommit of 6cc3d80a84 after
fixing the backward instruction definitions.
MVE has a dual lane vector move instruction, capable of moving two
general purpose registers into lanes of a vector register. They look
like one of:
vmov q0[2], q0[0], r2, r0
vmov q0[3], q0[1], r3, r1
They only accept these lane indices though (and only insert into an
i32), either moving lanes 1 and 3, or 0 and 2.
This patch adds some tablegen patterns for them, selecting from vector
inserts elements. Because the insert_elements are know to be
canonicalized to ascending order there are several patterns that we need
to select. These lane indices are:
3 2 1 0 -> vmovqrr 31; vmovqrr 20
3 2 1 -> vmovqrr 31; vmov 2
3 1 -> vmovqrr 31
2 1 0 -> vmovqrr 20; vmov 1
2 0 -> vmovqrr 20
With the top one being the most common. All other potential patterns of
lane indices will be matched by a combination of these and the
individual vmov pattern already present. This does mean that we are
selecting several machine instructions at once due to the need to
re-arrange the inserts, but in this case there is nothing else that will
attempt to match an insert_vector_elt node.
Differential Revision: https://reviews.llvm.org/D92553
This adds some extra processing into the Pre-RA ARM load/store optimizer
to detect and merge MVE loads/stores and adds of the same base. This we
don't always turn into a post-inc during ISel, and due to the nature of
it being a graph we don't always know an order to use for the nodes, not
knowing which nodes to make post-inc and which to use the new post-inc
of. After ISel, we have an order that we can use to post-inc the
following instructions.
So this looks for a loads/store with a starting offset of 0, and an
add/sub from the same base, plus a number of other loads/stores. We then
do some checks and convert the zero offset load/store into a postinc
variant. Any loads/stores after it have the offset subtracted from their
immediates. For example:
LDR #4 LDR #4
LDR #0 LDR_POSTINC #16
LDR #8 LDR #-8
LDR #12 LDR #-4
ADD #16
It only handles MVE loads/stores at the moment. Normal loads/store will
be added in a followup patch, they just have some extra details to
ensure that we keep generating LDRD/LDM successfully.
Differential Revision: https://reviews.llvm.org/D77813
This adds Post inc variants of the VLD2/4 and VST2/4 instructions in
MVE. It uses the same mechanism/nodes as Neon, transforming the
intrinsic+add pair into a ARMISD::VLD2_UPD, which gets selected to a
post-inc instruction. The code to do that is mostly taken from the
existing Neon code, but simplified as less variants are needed.
It also fills in some getTgtMemIntrinsic for the arm.mve.vld2/4
instrinsics, which allow the nodes to have MMO's, calculated as the full
length to the memory being loaded/stored.
Differential Revision: https://reviews.llvm.org/D71194