Commit Graph

193 Commits

Author SHA1 Message Date
Sam Parker f88000a4b5 [ARM][MVE] Add VHADD and VHSUB patterns
Add patterns that use a normal, non-wrapping, add and sub nodes along
with an arm vshr imm node.

Differential Revision: https://reviews.llvm.org/D77065
2020-04-17 07:45:15 +01:00
Sam Parker dd8153b757 [ARM][MVE] Tail predicate VML[A|S]LDAV
Make the non-exchanging versions of the multiply add/sub instructions
validForTailPredication.

Differential Revision: https://reviews.llvm.org/D77648
2020-04-15 11:34:39 +01:00
David Green fbd53ffc3a [ARM] MVE VMULL patterns
This adds MVE vmull patterns, which are conceptually the same as
mul(vmovl, vmovl), and so the tablegen patterns follow the same
structure.

For i8 and i16 this is simple enough, but in the i32 version the
multiply (in 64bits) is illegal, meaning we need to catch the pattern
earlier in a dag fold. Because bitcasts are involved in the zext
versions and the patterns are a little different in little and big
endian. I have only added little endian support in this patch.

Differential Revision: https://reviews.llvm.org/D76740
2020-04-02 10:57:40 +01:00
David Green c697dd9ffd [ARM] Make remaining MVE instruction predictable
The unpredictable/hasSideEffects flag is usually inferred by tablegen
from whether the instruction has a tablegen pattern (and that pattern
only has a single output instruction). Now that the MVE intrinsics are
all committed and producing code, the remaining instructions still
marked as unpredictable need to be specially handled. This adds the flag
directly to instructions that need it, notably the V*MLAL instructions
and some of the MOV's.

Differential Revision: https://reviews.llvm.org/D76910
2020-04-02 10:57:40 +01:00
Sam Parker 94b195ff12 [ARM][LowOverheadLoops] Add horizontal reduction support
Add a bit more logic into the 'FalseLaneZeros' tracking to enable
horizontal reductions and also make the VADDV variants
validForTailPredication.

Differential Revision: https://reviews.llvm.org/D76708
2020-03-30 09:55:41 +01:00
David Green c9eaed5149 [ARM] MVE VMOV.i64
In the original batch of MVE VMOVimm code generation VMOV.i64 was left
out due to the way it was done downstream. It turns out that it's fairly
simple though. This adds the codegen for it, similar to NEON.

Bigendian is technically incorrect in this version, which John is fixing
in a Neon patch.
2020-03-30 07:44:23 +01:00
Sam Parker 0e6aa08381 [ARM][MVE] Add DoubleWidthResult flag
Add a flag for those instructions which read from the top/bottom
halves of their inputs and produce a vector of results with double
width elements.

Differential Revision: https://reviews.llvm.org/D76762
2020-03-27 13:44:04 +00:00
David Green 8689f98e9b [ARM] Fix MVE VCMPr f16 pattern
This patterns seemed to be using the f32 instruction, not f16. Fix it to
use the correct one.

Differential Revision: https://reviews.llvm.org/D76841
2020-03-27 11:18:24 +00:00
Sam Parker e87250202d [ARM][MVE] Add HorizontalReduction flag
Add a target flag for instructions that reduce into one, or more,
scalar reg(s), including variants of:
- VADDV
- VABAV
- VMINV/VMAXV
- VMLADAV

Differential Revision: https://reviews.llvm.org/D76683
2020-03-25 11:12:03 +00:00
Sam Parker 6f86e6bf40 [ARM][MVE] Add target flag for narrowing insts
Add a flag, 'RetainsPreviousHalfElement', for operations that operate
on top/bottom halves of their input and only write to half of their
destination, leaving the other half to retain its previous value.

Differential Revision: https://reviews.llvm.org/D76608
2020-03-24 08:36:44 +00:00
Simon Tatham 1adfa4c991 [ARM,MVE] Add ACLE intrinsics for the vaddv/vaddlv family.
Summary:
I've implemented them as target-specific IR intrinsics rather than
using `@llvm.experimental.vector.reduce.add`, on the grounds that the
'experimental' intrinsic doesn't currently have much code generation
benefit, and my replacements encapsulate the sign- or zero-extension
so that you don't expose the illegal MVE vector type (`<4 x i64>`) in
IR.

The machine instructions come in two versions: with and without an
input accumulator. My new IR intrinsics, like the 'experimental' one,
don't take an accumulator parameter: we represent that by just adding
on the input value using an ordinary i32 or i64 add. So if you write
the `vaddvaq` C-language intrinsic with an input accumulator of zero,
it can be optimised to VADDV, and conversely, if you write something
like `x += vaddvq(y)` then that can be combined into VADDVA.

Most of this is achieved in isel lowering, by converting these IR
intrinsics into the existing `ARMISD::VADDV` family of custom SDNode
types. For the difficult case (64-bit accumulators), isel lowering
already implements the optimization of folding an addition into a
VADDLV to make a VADDLVA; so once we've made a VADDLV, our job is
already done, except that I had to introduce a parallel set of ARMISD
nodes for the //predicated// forms of VADDLV.

For the simpler VADDV, we handle the predicated form by just leaving
the IR intrinsic alone and matching it in an ordinary dag pattern.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76491
2020-03-20 15:42:33 +00:00
Simon Tatham 45a9945b9e [ARM,MVE] Add ACLE intrinsics for the vminv/vmaxv family.
Summary:
I've implemented these as target-specific IR intrinsics, because
they're not //quite// enough like @llvm.experimental.vector.reduce.min
(which doesn't take the extra scalar parameter). Also this keeps the
predicated and unpredicated versions looking similar, and the
floating-point minnm/maxnm versions fold into the same schema.

We had a couple of min/max reductions already implemented, from the
initial pathfinding exercise in D67158. Those were done by having
separate IR intrinsic names for the signed and unsigned integer
versions; as part of this commit, I've changed them to use a flag
parameter indicating signedness, which is how we ended up deciding
that the rest of the MVE intrinsics family ought to work. So now
hopefully the ewhole lot is consistent.

In the new llc test, the output code from the `v8f16` test functions
looks quite unpleasant, but most of it is PCS lowering (you can't pass
a `half` directly in or out of a function). In other circumstances,
where you do something else with your `half` in the same function, it
doesn't look nearly as nasty.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76490
2020-03-20 15:42:33 +00:00
David Green b3499f572d [ARM] Change VDUP type to i32 for MVE
The MVE VDUP instruction take a GPR and splats into every lane of a
vector register. Unlike NEON we do not have a VDUPLANE equivalent
instruction, doing the same splat from a fp register. Previously a VDUP
to a v4f32/v8f16 would be represented as a (v4f32 VDUP f32), which
would mean the instruction pattern needs to add a COPY_TO_REGCLASS to
the GPR.

Instead this now converts that earlier during an ISel DAG combine,
converting (VDUP x) to (VDUP (bitcast x)). This can allow instruction
selection to tell that the input needs to be an i32, which in one of the
testcases allows it to use ldr (or specifically ldm) over (vldr;vmov).

Whilst being simple enough for floats, as the types sizes are the same,
these is no BITCAST equivalent for getting a half into a i32. This uses
a VMOVrh ARMISD node, which doesn't know the same tricks yet.

Differential Revision: https://reviews.llvm.org/D76292
2020-03-20 09:48:45 +00:00
Simon Tatham e13d153c1b [ARM,MVE] Add intrinsics for the VQDMLAD family.
Summary:
This is another set of instructions too complicated to be sensibly
expressed in IR by anything short of a target-specific intrinsic.
Given input vectors a,b, the instruction generates intermediate values
2*(a[0]*b[0]+a[1]+b[1]), 2*(a[2]*b[2]+a[3]+b[3]), etc; takes the high
half of each double-width values, and overwrites half the lanes in the
output vector c, which you therefore have to provide the input value
of. Optionally you can swap the elements of b so that the are things
like a[0]*b[1]+a[1]*b[0]; optionally you can round to nearest when
taking the high half; and optionally you can take the difference
rather than sum of the two products. Finally, saturation is applied
when converting back to a single-width vector lane.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: miyuki

Subscribers: kristof.beyls, hiraditya, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76359
2020-03-18 17:11:22 +00:00
Simon Tatham 928776de92 [ARM,MVE] Add intrinsics for the VQDMLAH family.
Summary:
These are complicated integer multiply+add instructions with extra
saturation, taking the high half of a double-width product, and
optional rounding. There's no sensible way to represent that in
standard IR, so I've converted the clang builtins directly to
target-specific intrinsics.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: miyuki

Subscribers: kristof.beyls, hiraditya, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76123
2020-03-18 10:55:04 +00:00
Simon Tatham 28c5d97bee [ARM,MVE] Add intrinsics and isel for MVE integer VMLA.
Summary:
These instructions compute multiply+add in integers, with one of the
operands being a splat of a scalar. (VMLA and VMLAS differ in whether
the splat operand is a multiplier or the addend.)

I've represented these in IR using existing standard IR operations for
the unpredicated forms. The predicated forms are done with target-
specific intrinsics, as usual.

When operating on n-bit vector lanes, only the bottom n bits of the
i32 scalar operand are used. So we have to tell that to isel lowering,
to allow it to remove a pointless sign- or zero-extension instruction
on that input register. That's done in `PerformIntrinsicCombine`, but
first I had to enable `PerformIntrinsicCombine` for MVE targets
(previously all the intrinsics it handled were for NEON), and make it
a method of `ARMTargetLowering` so that it can get at
`SimplifyDemandedBits`.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76122
2020-03-18 10:55:04 +00:00
Simon Tatham 3f8e714e2f [ARM,MVE] Add intrinsics and isel for MVE fused multiply-add.
Summary:
This adds the ACLE intrinsic family for the VFMA and VFMS
instructions, which perform fused multiply-add on vectors of floats.

I've represented the unpredicated versions in IR using the cross-
platform `@llvm.fma` IR intrinsic. We already had isel rules to
convert one of those into a vector VFMA in the simplest possible way;
but we didn't have rules to detect a negated argument and turn it into
VFMS, or rules to detect a splat argument and turn it into one of the
two vector/scalar forms of the instruction. Now we have all of those.

The predicated form uses a target-specific intrinsic as usual, but
I've stuck to just one, for a predicated FMA. The subtraction and
splat versions are code-generated by passing an fneg or a splat as one
of its operands, the same way as the unpredicated version.

In arm_mve_defs.h, I've had to introduce a tiny extra piece of
infrastructure: a record `id` for use in codegen dags which implements
the identity function. (Just because you can't declare a Tablegen
value of type dag which is //only// a `$varname`: you have to wrap it
in something. Now I can write `(id $varname)` to get the same effect.)

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75998
2020-03-12 11:13:50 +00:00
Sam Parker a314050065 [ARM][MVE] VFMA and VFMS validForTailPredication
Add four instructions to the whitelist.

Differential Revision: https://reviews.llvm.org/D75902
2020-03-10 14:58:29 +00:00
Sam Parker 4cf0dddcc6 [ARM][MVE] Enable VMOVN for tail predication
These instructions also don't exchange lanes, so make them legal.

Differential Revision: https://reviews.llvm.org/D75669
2020-03-06 08:59:22 +00:00
Sam Parker 77e30758dd [ARM][MVE] Enable *SHRN* for tail predication
These instructions don't swap lanes so make them valid.

Differential Revision: https://reviews.llvm.org/D75667
2020-03-05 11:00:45 +00:00
Simon Tatham 1a8cbfa514 [ARM,MVE] Add ACLE intrinsics for VCVT[ANPM] family.
Summary:
These instructions convert a vector of floats to a vector of integers
of the same size, with assorted non-default rounding modes.
Implemented in IR as target-specific intrinsics, because as far as I
can see there are no matches for that functionality in the standard IR
intrinsics list.

Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75255
2020-03-02 10:33:30 +00:00
Simon Tatham b08d2ddd69 [ARM,MVE] Add ACLE intrinsics for VCVT.F32.F16 family.
Summary:
These instructions make a vector of `<4 x float>` by widening every
other lane of a vector of `<8 x half>`.

I wondered about representing these using standard IR, along the lines
of a shufflevector to extract elements of the input into a `<4 x half>`
followed by an `fpext` to turn that into `<4 x float>`. But it looks as
if that would take a lot of work in isel lowering to make it match any
pattern I could sensibly write in Tablegen, and also I haven't been
able to think of any other case where that pattern might be generated
in IR, so there wouldn't be any extra code generation win from doing
it that way.

Therefore, I've just used another target-specific intrinsic. We can
always change it to the other way later if anyone thinks of a good
reason.

(In order to put the intrinsic definition near similar things in
`IntrinsicsARM.td`, I've also lifted the definition of the
`MVEMXPredicated` multiclass higher up the file, without changing it.)

Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard

Reviewed By: miyuki

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75254
2020-03-02 10:33:30 +00:00
Simon Tatham 69441e53c9 [ARM,MVE] Correct MC operands in VCVT.F32.F16. (NFC)
Summary:
The two MVE instructions that convert between v4f32 and v8f16 were
implemented as instances of the same class, with the same MC operand
list.

But that's not really appropriate, because the narrowing conversion
only partially overwrites its output register (it only has 4 f16
values to write into a vector of 8), so even when unpredicated, it
needs a $Qd_src input, a constraint tying that to the $Qd output, and
a vpred_n.

The widening conversion is better represented like any other
instruction that completely replaces its output when unpredicated: it
should have no $Qd_src operand, and instead, a vpred_r containing a
$inactive parameter. That's a better match to other similar
instructions, such as its integer analogue, the VMOVL instruction that
makes a v4i32 by sign- or zero-extending every other lane of a v8i16.

This commit brings the widening VCVT.F32.F16 into line with the other
instructions that behave like it. That means you can write isel
patterns that use it unpredicated, without having to add a pointless
undefined $QdSrc operand.

No existing code generation uses that instruction yet, so there should
be no functional change from this fix.

Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75253
2020-03-02 10:33:30 +00:00
Simon Tatham a41ecf0eb0 [ARM,MVE] Add ACLE intrinsics for VQMOV[U]N family.
Summary:
These instructions work like VMOVN (narrowing a vector of wide values
to half size, and overwriting every other lane of an output register
with the result), except that the narrowing conversion is saturating.
They come in three signedness flavours: signed to signed, unsigned to
unsigned, and signed to unsigned. All are represented in IR by a
target-specific intrinsic that takes two separate 'unsigned' flags.

Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75252
2020-03-02 10:33:30 +00:00
David Green e2a2f3f7fc [ARM] MVE VMLAS
This addes extra patterns for the VMLAS MVE instruction, which performs
Qda = Qda * Qn + Rm, a similar pattern to the existing VMLA. The sinking
of splat(Rm) into the loop is already performed, meaning we just need
extra Pat's in tablegen.

Differential Revision: https://reviews.llvm.org/D75115
2020-02-28 14:27:21 +00:00
Simon Tatham 9eb3cc10b2 [ARM,MVE] Add predicated intrinsics for many unary functions.
Summary:
This commit adds the predicated MVE intrinsics for the same set of
unary operations that I added in their unpredicated forms in

* D74333 (vrint)
* D74334 (vrev)
* D74335 (vclz, vcls)
* D74336 (vmovl)
* D74337 (vmovn)

but since the predicated versions are a lot more similar to each
other, I've kept them all together in a single big patch. Everything
here is done in the standard way we've been doing other predicated
operations: an IR intrinsic called `@llvm.arm.mve.foo.predicated` and
some isel rules that match that alongside whatever they accept for the
unpredicated version of the same instruction.

In order to write the isel rules conveniently, I've refactored the
existing isel rules for the affected instructions into multiclasses
parametrised by a vector-type class, in the usual way. All those
refactorings are intended to leave the existing isel rules unchanged:
the only difference should be that new ones for the predicated
intrinsics are introduced.

The only tiny infrastructure change I needed in this commit was to
change the implementation of `IntrinsicMX` in `arm_mve_defs.td` so
that the records it defines are anonymous rather than named (and use
`NameOverride` to set the output intrinsic name), which allows me to
call it twice in two multiclasses with the same `NAME` without a
tablegen-time error.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75165
2020-02-26 15:12:07 +00:00
Mikhail Maltsev f4fd7dbf85 [ARM,MVE] Add vqdmull[b,t]q intrinsic families
Summary:
This patch adds two families of ACLE intrinsics: vqdmullbq and
vqdmulltq (including vector-vector and vector-scalar variants) and the
corresponding LLVM IR intrinsics llvm.arm.mve.vqdmull and
llvm.arm.mve.vqdmull.predicated.

Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D74845
2020-02-20 10:51:19 +00:00
Mikhail Maltsev 461fd94f00 [ARM,MVE] Fix predicate types of some intrinsics
Summary:
Some predicated MVE intrinsics return a vector with element size
different from the input vector element size. In this case the
predicate must type correspond to the output vector type.

The following intrinsics use the incorrect predicate type:
* llvm.arm.mve.mull.int.predicated
* llvm.arm.mve.mull.poly.predicated
* llvm.arm.mve.vshll.imm.predicated

This patch fixes the issue.

Reviewers: simon_tatham, dmgreen, ostannard, MarkMurrayARM

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D74838
2020-02-19 16:24:54 +00:00
David Green 33aa5dfe9c [ARM] VMLAVA reduction patterns
Similar to VADDV and VADDLV that have been added recently, this adds
lowering and patterns for VMLAV, VMLAVA, VMLALV and VMLALVA. They
perform the same roles as the add's, just folding a mul into the same
instruction (and so taking two inputs). As such, they need to be lowered
in the same way as the types are often not legal.

Differential Revision: https://reviews.llvm.org/D74390
2020-02-19 12:39:58 +00:00
David Green fceb3e3b4a [ARM] MVE VADDLV lowering
Following on from the extra VADDV lowering, this extends things to
handle VADDLV which allows summing values into a pair of i32 registers,
together treated as a i64. This needs to be done in DAGCombine too as
the types are otherwise illegal, which is a fairly simple addition on
top of the existing code.

There is also a VADDLVA instruction handled here, that adds the incoming
values from the two general purpose registers. As opposed to the
non-long version where we could just add patterns for add(x, VADDV), the
long version needs to handle this early before the i64 has being split
into too many pieces.

Differential Revision: https://reviews.llvm.org/D74224
2020-02-19 11:07:20 +00:00
David Green 51c6e9445c [ARM] Extra MVE VADDV reduction patterns
We already make use of the VADDV vector reduction instruction for cases
where the input and the output start out at the same type. The MVE
instruction however will sum into an i32, so if we are summing a v16i8
into an i32, we can still use the same instructions. In terms of IR,
this looks like a sext of a legal type (v16i8) into a very illegal type
(v16i32) and a vecreduce.add of that into the result. This means we have
to catch the pattern early in a DAG combine, producing a target VADDVs/u
node, where the signedness is now important.

This is the first part, handling VADDV and VADDVA. There are also
VADDVL/VADDVLA instructions, which are interesting because they sum into
a 64bit value. And VMLAV and VMLALV, which are interesting because they
also do a multiply of two values. It may look a little odd in places as
a result.

On it's own this will probably not do very much, as the vectorizer will
not produce this IR yet.

Differential Revision: https://reviews.llvm.org/D74218
2020-02-19 09:45:35 +00:00
Mikhail Maltsev 63809d365e [ARM,MVE] Add vbrsrq intrinsics family
Summary:
This patch adds a new MVE intrinsics family, `vbrsrq`: vector bit
reverse and shift right. The intrinsics are compiled into the VBRSR
instruction. Two new LLVM IR intrinsics were also added: arm.mve.vbrsr
and arm.mve.vbrsr.predicated.

Reviewers: simon_tatham, dmgreen, ostannard, MarkMurrayARM

Reviewed By: simon_tatham

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D74721
2020-02-18 17:31:21 +00:00
Simon Tatham c32af4447f [ARM,MVE] Add the vmovnbq,vmovntq intrinsic family.
Summary:
These are in some sense the inverse of vmovl[bt]q: they take a vector
of n wide elements and truncate each to half its width. So they only
write half a vector's worth of output data, and therefore they also
take an 'inactive' parameter to provide the other half of the data in
the output vector. So vmovnb overwrites the even lanes of 'inactive'
with the narrowed values from the main input, and vmovnt overwrites
the odd lanes.

LLVM had existing codegen which generates these MVE instructions in
response to IR that takes two vectors of wide elements, or two vectors
of narrow ones. But in this case, we have one vector of each. So my
clang codegen strategy is to narrow the input vector of wide elements
by simply reinterpreting it as the output type, and then we have two
narrow vectors and can represent the operation as a vector shuffle
that interleaves lanes from both of them.

Even so, not all the cases I needed ended up being selected as a
single MVE instruction, so I've added a couple more patterns that spot
combinations of the 'MVEvmovn' and 'ARMvrev32' SDNodes which can be
generated as a VMOVN instruction with operands swapped.

This commit adds the unpredicated forms only.

Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D74337
2020-02-18 09:34:50 +00:00
Simon Tatham 5e97940cd2 [ARM,MVE] Add the vmovlbq,vmovltq intrinsic family.
Summary:
These intrinsics take a vector of 2n elements, and return a vector of
n wider elements obtained by sign- or zero-extending every other
element of the input vector. They're represented in IR as a
shufflevector that extracts the odd or even elements of the input,
followed by a sext or zext.

Existing LLVM codegen already matches this pattern and generates the
VMOVLB instruction (which widens the even-index input lanes). But no
existing isel rule was generating VMOVLT, so I've added some. However,
the new rules currently only work in little-endian MVE, because the
pattern they expect from isel lowering includes a bitconvert which
doesn't have the right semantics in big-endian.

The output of one existing codegen test is improved by those new
rules.

This commit adds the unpredicated forms only.

Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D74336
2020-02-18 09:34:50 +00:00
Simon Tatham 9dcc1667ab [ARM] Allow `ARMVectorRegCast` to match bitconverts too. (NFC)
Summary:
When we start putting instances of `ARMVectorRegCast` in complex isel
patterns, it will be awkward that they're often turned into the more
standard `bitconvert` in little-endian mode. We'd rather not have to
write separate isel patterns for the two endiannesses, matching
different but equivalent cast operations.

This change aims to fix that awkwardness in advance, by turning the
Tablegen record `ARMVectorRegCast` from a simple `SDNode` instance
into a `PatFrags` that can match either kind of cast – with a
predicate that prevents it matching a bitconvert in the big-endian
case, where bitconvert isn't semantically identical.

No existing code generation should be affected by this change, but it
will enable the patterns introduced by D74336 to work in both
endiannesses.

Reviewers: dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74716
2020-02-18 09:34:50 +00:00
Simon Tatham 68b49f7ef4 [ARM,MVE] Add intrinsics vclzq and vclsq.
Summary:
vclzq maps nicely to the existing target-independent @llvm.ctlz IR
intrinsic. But vclsq ('count leading sign bits') has no corresponding
target-independent intrinsic, so I've made up @llvm.arm.mve.vcls.

This commit adds the unpredicated forms only.

Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard

Reviewed By: miyuki

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D74335
2020-02-18 09:34:50 +00:00
Simon Tatham c8b3196e54 [ARM,MVE] Add intrinsics for FP rounding operations.
Summary:
This adds the unpredicated forms of six different MVE intrinsics which
all round a vector of floating-point numbers to integer values,
leaving them still in FP format, differing only in rounding mode and
exception settings.

Five of them map to existing target-independent intrinsics in LLVM IR,
such as @llvm.trunc and @llvm.rint. The sixth, mapping to the `vrintn`
instruction, is done by inventing a target-specific intrinsic.

(`vrintn` behaves the same as `vrintx` in terms of the output value:
the side effects on the FPSCR flags are the only difference between
the two. But ACLE specifies separate user-callable intrinsics for the
two, so the side effects matter enough to make sure we generate the
right one of the two instructions in each case.)

Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard

Reviewed By: miyuki

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D74333
2020-02-18 09:34:50 +00:00
Mikhail Maltsev 489f62e801 [ARM,MVE] Add vector-scalar intrinsics
Summary:
This patch adds vector-scalar variants to the following families of
MVE intrinsics:
* vaddq
* vsubq
* vmulq
* vqaddq
* vqsubq
* vhaddq
* vhsubq
* vqdmulhq
* vqrdmulhq

The vector-scalar variants perform a splat operation on the scalar
operand and then perform the same operations as their vector-vector
counterparts. Code generation is done accordingly (using LLVM IR 'insert'
and 'shuffle' operations which are later converted into an ARMvdup
SDNode).

Reviewers: simon_tatham, dmgreen, MarkMurrayARM, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D74620
2020-02-17 17:47:05 +00:00
David Green 9d4c597541 [ARM] Fix ReconstructShuffle for bigendian
Simon pointed out that this function is doing a bitcast, which can be
incorrect for big endian. That makes the lowering of VMOVN in MVE
wrong, but the function is shared between Neon and MVE so both can
be incorrect.

This attempts to fix things by using the newly added VECTOR_REG_CAST
instead of the BITCAST. As it may now be used on Neon, I've added the
relevant patterns for it there too. I've also added a quick dag combine
for it to remove them where possible.

Differential Revision: https://reviews.llvm.org/D74485
2020-02-13 09:56:46 +00:00
Mikhail Maltsev 2694cc3dca [ARM][MVE] Add fixed point vector conversion intrinsics
Summary:
This patch implements the following Arm ACLE MVE intrinsics:
* vcvtq_n_*
* vcvtq_m_n_*
* vcvtq_x_n_*

and two corresponding LLVM IR intrinsics:
* int_arm_mve_vcvt_fix (vcvtq_n_*)
* int_arm_mve_vcvt_fix_predicated (vcvtq_m_n_*, vcvtq_x_n_*)

Reviewers: simon_tatham, ostannard, MarkMurrayARM, dmgreen

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D74134
2020-02-06 16:49:45 +00:00
Simon Tatham 961530fdc9 [ARM,MVE] Fix vreinterpretq in big-endian mode.
Summary:
In big-endian MVE, the simple vector load/store instructions (i.e.
both contiguous and non-widening) don't all store the bytes of a
register to memory in the same order: it matters whether you did a
VSTRB.8, VSTRH.16 or VSTRW.32. Put another way, the in-register
formats of different vector types relate to each other in a different
way from the in-memory formats.

So, if you want to 'bitcast' or 'reinterpret' one vector type as
another, you have to carefully specify which you mean: did you want to
reinterpret the //register// format of one type as that of the other,
or the //memory// format?

The ACLE `vreinterpretq` intrinsics are specified to reinterpret the
register format. But I had implemented them as LLVM IR bitcast, which
is specified for all types as a reinterpretation of the memory format.
So a `vreinterpretq` intrinsic, applied to values already in registers,
would code-generate incorrectly if compiled big-endian: instead of
emitting no code, it would emit a `vrev`.

To fix this, I've introduced a new IR intrinsic to perform a
register-format reinterpretation: `@llvm.arm.mve.vreinterpretq`. It's
implemented by a trivial isel pattern that expects the input in an
MQPR register, and just returns it unchanged.

In the clang codegen, I only emit this new intrinsic where it's
actually needed: I prefer a bitcast wherever it will have the right
effect, because LLVM understands bitcasts better. So we still generate
bitcasts in little-endian mode, and even in big-endian when you're
casting between two vector types with the same lane size.

For testing, I've moved all the codegen tests of vreinterpretq out
into their own file, so that they can have a different set of RUN
lines to check both big- and little-endian.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73786
2020-02-03 11:20:06 +00:00
Simon Tatham cf7e98e6f7 [ARM,MVE] Add intrinsics for vdupq.
Summary:
The unpredicated case of this is trivial: the clang codegen just makes
a vector splat of the input, and LLVM isel is already prepared to
handle that. For the predicated version, I've generated a `select`
between the same vector splat and the `inactive` input parameter, and
added new Tablegen isel rules to match that pattern into a predicated
`MVE_VDUP` instruction.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73356
2020-02-03 11:20:06 +00:00
Simon Tatham 4321c6af28 [ARM,MVE] Support immediate vbicq,vorrq,vmvnq intrinsics.
Summary:
Immediate vmvnq is code-generated as a simple vector constant in IR,
and left to the backend to recognize that it can be created with an
MVE VMVN instruction. The predicated version is represented as a
select between the input and the same constant, and I've added a
Tablegen isel rule to turn that into a predicated VMVN. (That should
be better than the previous VMVN + VPSEL: it's the same number of
instructions but now it can fold into an adjacent VPT block.)

The unpredicated forms of VBIC and VORR are done by enabling the same
isel lowering as for NEON, recognizing appropriate immediates and
rewriting them as ARMISD::VBICIMM / ARMISD::VORRIMM SDNodes, which I
then instruction-select into the right MVE instructions (now that I've
also reworked those instructions to use the same MC operand encoding).
In order to do that, I had to promote the Tablegen SDNode instance
`NEONvorrImm` to a general `ARMvorrImm` available in MVE as well, and
similarly for `NEONvbicImm`.

The predicated forms of VBIC and VORR are represented as a vector
select between the original input vector and the output of the
unpredicated operation. The main convenience of this is that it still
lets me use the existing isel lowering for VBICIMM/VORRIMM, and not
have to write another copy of the operand encoding translation code.

This intrinsic family is the first to use the `imm_simd` system I put
into the MveEmitter tablegen backend. So, naturally, it showed up a
bug or two (emitting bogus range checks and the like). Fixed those,
and added a full set of tests for the permissible immediates in the
existing Sema test.

Also adjusted the isel pattern for `vmovlb.u8`, which stopped matching
because lowering started turning its input into a VBICIMM. Now it
recognizes the VBICIMM instead.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72934
2020-01-23 11:53:52 +00:00
Simon Tatham 772e493193 [ARM,MVE] Revise immediate VBIC/VORR to look more like NEON.
Summary:
In NEON, the immediate forms of VBIC and VORR are each represented as
a single MC instruction, which takes its immediate operand already
encoded in a NEON-friendly format: 8 data bits, plus some control bits
indicating how to expand them into a full vector.

In MVE, we represented immediate VBIC and VORR as four separate MC
instructions each, for an 8-bit immediate shifted left by 0, 8, 16 or
24 bits. For each one, the value of the immediate operand is in the
'natural' form, i.e. the numerical value that would actually be BICed
or ORRed into each vector lane (and also the same value shown in
assembly). For example, MVE_VBICIZ16v4i32 takes an operand such as
0xab0000, which NEON would represent as 0xab | (control bits << 8).

The MVE approach is superficially nice (it makes assembly input and
output easy, and it's also nice if you're manually constructing
immediate VBICs). But it turns out that it's better for isel if we
make the NEON and MVE instructions work the same, because the
ARMISD::VBICIMM and VORRIMM node types already encode their immediate
into the NEON format, so it's easier if we can just use it.

Also, this commit reduces the total amount of code rather than
increasing it, which is surely an indication that it really is simpler
to do it this way!

Reviewers: dmgreen, ostannard, miyuki, MarkMurrayARM

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73205
2020-01-23 11:53:52 +00:00
David Green 58991ba773 [ARM] Mark MVE loads/store as not having side effects
The hasSideEffect parameter is usually automatically inferred from
instruction patterns. For some of our MVE instructions, we do not have
patterns though, such as for the pre/post inc loads and stores. This
instead specifies the flag manually on the base MVE_VLDRSTR_base
tablegen class, making sure we get this correct.

This can help with scheduling multiple loads more optimally. Here I've
added a unittest as a more direct form of testing.

Differential Revision: https://reviews.llvm.org/D73117
2020-01-22 17:56:55 +00:00
Mark Murray b10a0eb04a [ARM][MVE][Intrinsics] Take abs() of VMINNMAQ, VMAXNMAQ intrinsics' first arguments.
Summary: Fix VMINNMAQ, VMAXNMAQ intrinsics; BOTH arguments have the absolute values taken.

Reviewers: dmgreen, simon_tatham

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72830
2020-01-20 14:33:26 +00:00
Simon Tatham f3e73e88fd [ARM,MVE] Fix confusing MC names for MVE VMINA/VMAXA insns.
Summary:
A recent commit accidentally defined names like `MVE_VMAXAs8` as
instances of the multiclass `MVE_VMINA`, and vice versa. This has no
effect on the test suite, because nothing directly refers to those
instruction names (the isel patterns are generated in Tablegen using
`!cast<Instruction>(NAME)` inside a lower-level multiclass). But it
means that `llvm-mc -show-inst` was listing VMAXA as VMINA, and it
would also affect any further draft code gen patches that use those
instruction ids.

Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73034
2020-01-20 13:25:52 +00:00
David Green ff2e67a4f7 [ARM] MVE VLDn postinc
This adds Post inc variants of the VLD2/4 and VST2/4 instructions in
MVE. It uses the same mechanism/nodes as Neon, transforming the
intrinsic+add pair into a ARMISD::VLD2_UPD, which gets selected to a
post-inc instruction. The code to do that is mostly taken from the
existing Neon code, but simplified as less variants are needed.

It also fills in some getTgtMemIntrinsic for the arm.mve.vld2/4
instrinsics, which allow the nodes to have MMO's, calculated as the full
length to the memory being loaded/stored.

Differential Revision: https://reviews.llvm.org/D71194
2020-01-20 06:57:07 +00:00
Mark Murray da9d57d2c2 [ARM][MVE][Intrinsics] Add VMINAQ, VMINNMAQ, VMAXAQ, VMAXNMAQ intrinsics.
Summary: Add VMINAQ, VMINNMAQ, VMAXAQ, VMAXNMAQ intrinsics and unit tests.

Reviewers: simon_tatham, miyuki, dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72761
2020-01-15 17:20:15 +00:00
Simon Tatham 71d5454b37 [ARM,MVE] Use the new Tablegen `defvar` and `if` statements.
Summary:
This cleans up a lot of ugly `foreach` bodges that I've been using to
work around the lack of those two language features. Now they both
exist, I can make then all into something more legible!

In particular, in the common pattern in `ARMInstrMVE.td` where a
multiclass defines an `Instruction` instance plus one or more `Pat` that
select it, I've used a `defvar` to wrap `!cast<Instruction>(NAME)` so
that the patterns themselves become a little more legible.

Replacing a `foreach` with a `defvar` removes a level of block
structure, so several pieces of code have their indentation changed by
this patch. Best viewed with whitespace ignored.

NFC: the output of `llvm-tblgen -print-records` on the two affected
Tablegen sources is exactly identical before and after this change, so
there should be no effect at all on any of the other generated files.

Reviewers: MarkMurrayARM, miyuki

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, dmgreen, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72690
2020-01-14 12:08:03 +00:00