Commit Graph

18 Commits

Author SHA1 Message Date
David Green 2753722b0f [ARM] Mark MVE_VMOV_to_lane_32 as isInsertSubregLike
This allows the peephole optimizer to know that an MVE_VMOV_to_lane_32 is
the same as an insert subreg, allowing it to optimize away some redundant
lane moves.

Differential Revision: https://reviews.llvm.org/D95433
2021-02-02 16:35:47 +00:00
David Green ad12e6ee95 [ARM] Turn sext_inreg(VGetLaneu) into VGetLaneu
This adds a DAG combine for converting sext_inreg of VGetLaneu into
VGetLanes, provided the types match correctly.
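
As a rough illustration (a sketch, not taken from the patch's tests), this
kind of node typically comes from an i16 lane extract that is then sign
extended:
  %e = extractelement <8 x i16> %v, i32 3
  %s = sext i16 %e to i32
which legalizes to a sext_inreg of a VGetLaneu; selecting VGetLanes
directly avoids the separate sign extension.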

Differential Revision: https://reviews.llvm.org/D95073
2021-02-01 11:10:35 +00:00
David Green 901cc9b6f3 [ARM] Extend lowering for i64 reductions
The lowering of a <4 x i16> or <4 x i8> vecreduce.add into an i64 would
previously be expanded, as i64 is not a legal type. This patch adjusts
our reduction matchers so that they produce a VADDLV(sext A to v4i32)
instead.
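
As a rough example of the kind of input this handles (a sketch, not the
patch's exact tests):
  %e = sext <4 x i16> %x to <4 x i64>
  %r = call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> %e)
Sign extending the input to v4i32 loses nothing here, so the same result
can be formed with a VADDLV of the widened input.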

Differential Revision: https://reviews.llvm.org/D93622
2021-01-04 12:44:43 +00:00
David Green f47bac5dd2 [ARM] Extra vecreduce tests with smaller than legal types. NFC 2020-12-20 21:20:39 +00:00
David Green e1c1adf9dc [ARM] Match dual lane vmovs from insert_vector_elt
MVE has a dual lane vector move instruction, capable of moving two
general purpose registers into lanes of a vector register. They look
like one of:
  vmov q0[2], q0[0], r2, r0
  vmov q0[3], q0[1], r3, r1
They only accept these lane indices though (and only insert into an
i32), either moving lanes 1 and 3, or 0 and 2.

This patch adds some tablegen patterns for them, selecting from vector
element inserts. Because the insert_elements are known to be
canonicalized to ascending order, there are several patterns that we need
to select. These lane indices are:

3 2 1 0    -> vmovqrr 31; vmovqrr 20
3 2 1      -> vmovqrr 31; vmov 2
3 1        -> vmovqrr 31
2 1 0      -> vmovqrr 20; vmov 1
2 0        -> vmovqrr 20

The top one is the most common. All other potential patterns of
lane indices will be matched by a combination of these and the
individual vmov pattern already present. This does mean that we are
selecting several machine instructions at once due to the need to
re-arrange the inserts, but in this case there is nothing else that will
attempt to match an insert_vector_elt node.
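
As a rough example (a sketch, not the patch's exact tests), a chain of
four i32 element inserts:
  %v0 = insertelement <4 x i32> %v, i32 %a, i32 0
  %v1 = insertelement <4 x i32> %v0, i32 %b, i32 1
  %v2 = insertelement <4 x i32> %v1, i32 %c, i32 2
  %v3 = insertelement <4 x i32> %v2, i32 %d, i32 3
matches the 3 2 1 0 case above and can be selected as two dual lane
vmovs instead of four single lane moves.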

This is a recommit of 6cc3d80a84 after
fixing the backward instruction definitions.
2020-12-18 16:13:08 +00:00
David Green 6e913e4451 Revert "[ARM] Match dual lane vmovs from insert_vector_elt"
This one needed more testing.
2020-12-18 13:33:40 +00:00
David Green 6cc3d80a84 [ARM] Match dual lane vmovs from insert_vector_elt
MVE has a dual lane vector move instruction, capable of moving two
general purpose registers into lanes of a vector register. They look
like one of:
  vmov q0[2], q0[0], r2, r0
  vmov q0[3], q0[1], r3, r1
They only accept these lane indices though (and only insert into an
i32), either moving lanes 1 and 3, or 0 and 2.

This patch adds some tablegen patterns for them, selecting from vector
element inserts. Because the insert_elements are known to be
canonicalized to ascending order, there are several patterns that we need
to select. These lane indices are:

3 2 1 0    -> vmovqrr 31; vmovqrr 20
3 2 1      -> vmovqrr 31; vmov 2
3 1        -> vmovqrr 31
2 1 0      -> vmovqrr 20; vmov 1
2 0        -> vmovqrr 20

The top one is the most common. All other potential patterns of
lane indices will be matched by a combination of these and the
individual vmov pattern already present. This does mean that we are
selecting several machine instructions at once due to the need to
re-arrange the inserts, but in this case there is nothing else that will
attempt to match an insert_vector_elt node.

Differential Revision: https://reviews.llvm.org/D92553
2020-12-15 15:58:52 +00:00
Amara Emerson 322d0afd87 [llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.

The autoupgrader will handle legacy intrinsics.
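
For example (sketching the integer add reduction; the other reductions
follow the same pattern):
  ; before: call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> %x)
  ; after:  call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)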

Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

Differential Revision: https://reviews.llvm.org/D88787
2020-10-07 10:36:44 -07:00
David Green 6cfd38d03d [ARM] Fixup single source mla reductions.
This fixes a complication on top of D87276. If we are sign extending
around a mul whose two operands are the same, instcombine will
helpfully convert one of the sexts to a zext. Reverse that so that we
again generate a reduction.
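
Roughly (a sketch with current intrinsic names, not the exact test), the
shape after instcombine looks like:
  %xs = sext <8 x i16> %x to <8 x i32>
  %xz = zext <8 x i16> %x to <8 x i32>
  %m  = mul <8 x i32> %xs, %xz
  %r  = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %m)
Turning the zext back into a sext lets this match the mla reduction again.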

Differential Revision: https://reviews.llvm.org/D87287
2020-09-12 14:31:26 +01:00
David Green c437446d90 [ARM] Recognize "double extend" reduction patterns
We can sometimes get code that does:
  xe = zext i16 x to i32
  ye = zext i16 y to i32
  m = mul i32 xe, ye
  me = zext i32 m to i64
  r = vecreduce.add(me)
This "double extend" can trip up the reduction identification, but
should give identical results.

This extends the pattern matching to handle them.

Differential Revision: https://reviews.llvm.org/D87276
2020-09-12 13:51:42 +01:00
David Green 40b72c9c79 [ARM] Extra MLA reductions tests. NFC 2020-09-11 17:51:15 +01:00
David Green 186a7f81e8 [ARM] Add VADDV and VMLAV patterns for v16i16
This adds patterns for v16i16's vecreduce, using all the existing code
to go via an i32 VADDV/VMLAV and truncating the result.
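
For example (a sketch, shown with the current intrinsic name):
  %r = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> %x)
is now handled via an i32 VADDV, with the result truncated back to i16.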

Differential Revision: https://reviews.llvm.org/D85452
2020-08-09 11:09:49 +01:00
David Green 747c574b94 [ARM] Extra MVE VMLAV reduction patterns
These patterns for i8 and i16 VMLAs were missing. They end up from
legalized vector.reduce.add.v8i16 and vector.reduce.add.v16i8, and
although the instruction works differently (the mul and add are
performed in a higher precision), I believe it is OK because only an
i8/i16 is demanded from them, and so the results will be the same. At
least, they pass any testing I can think to run on them.
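
As a rough sketch of the kind of input (shown with current intrinsic
names):
  %m = mul <16 x i8> %x, %y
  %r = call i8 @llvm.vector.reduce.add.v16i8(<16 x i8> %m)
can now be selected to a single VMLAV, since only the low 8 bits of the
result are demanded.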

There are some tests that end up looking worse, but they are quite
artificial due to passing half vector types through a call boundary. I
would not expect the vmull to realistically come up like that, and a
vmlava is likely better a lot of the time.

Differential Revision: https://reviews.llvm.org/D80524
2020-05-29 16:23:24 +01:00
David Green eecba95067 [ARM] Replace arm vendor with none. NFC 2020-04-22 18:19:35 +01:00
David Green fbd53ffc3a [ARM] MVE VMULL patterns
This adds MVE vmull patterns, which are conceptually the same as
mul(vmovl, vmovl), and so the tablegen patterns follow the same
structure.

For i8 and i16 this is simple enough, but for the i32 version the
multiply (in 64 bits) is illegal, meaning we need to catch the pattern
earlier in a DAG fold. Because bitcasts are involved in the zext
versions, the patterns are a little different in little and big endian;
I have only added little endian support in this patch.
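
Conceptually (a rough sketch rather than the patch's actual tests), the
target pattern is a multiply of two extended vectors:
  %xe = sext <4 x i16> %x to <4 x i32>
  %ye = sext <4 x i16> %y to <4 x i32>
  %m  = mul <4 x i32> %xe, %ye
which is the mul(vmovl, vmovl) shape that the vmull patterns fold into a
single instruction.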

Differential Revision: https://reviews.llvm.org/D76740
2020-04-02 10:57:40 +01:00
David Green c9eaed5149 [ARM] MVE VMOV.i64
In the original batch of MVE VMOVimm code generation, VMOV.i64 was left
out due to the way it was done downstream. It turns out that it's fairly
simple though. This adds the codegen for it, similar to NEON.

Big endian is technically incorrect in this version, which John is
fixing in a Neon patch.
2020-03-30 07:44:23 +01:00
David Green 33aa5dfe9c [ARM] VMLAVA reduction patterns
Similar to VADDV and VADDLV that have been added recently, this adds
lowering and patterns for VMLAV, VMLAVA, VMLALV and VMLALVA. They
perform the same roles as the adds, just folding a mul into the same
instruction (and so taking two inputs). As such, they need to be lowered
in the same way, as the types are often not legal.
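
As a rough sketch (shown with the current intrinsic names), the shapes
involved look like:
  %m = mul <4 x i32> %x, %y
  %r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %m)
  %a = add i32 %r, %acc
where the mul folds into a VMLAV and the extra accumulate turns it into
a VMLAVA (with VMLALV/VMLALVA the i64-result variants).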

Differential Revision: https://reviews.llvm.org/D74390
2020-02-19 12:39:58 +00:00
David Green 0ac4f6b627 [ARM] MVE vector reduce MLA tests. NFC. 2020-02-17 11:54:04 +00:00