Commit Graph

344 Commits

Author SHA1 Message Date
Craig Topper 87afefcd22 [RISCV] Fix mistake in comment. NFC 2021-04-19 11:15:32 -07:00
Craig Topper 7ed01a420a [RISCV] Pad v4i1/v2i1/v1i1 stores with 0s to make a full byte.
As noted in the FIXME there's a sort of agreement that the any
extra bits stored will be 0.

The generated code is pretty terrible. I was really hoping we
could use a tail undisturbed trick, but tail undisturbed no
longer applies to masked destinations in the current draft
spec.

Fingers crossed that it isn't common to do this. I doubt IR
from clang or the vectorizer would ever create this kind of store.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100618
2021-04-19 11:05:18 -07:00
Fraser Cormack c9a93c3e01 [RISCV] Lower vector shuffles to vrgather operations
This patch extends the lowering of RVV fixed-length vector shuffles to
avoid the default stack expansion and instead lower to vrgather
instructions.

For "permute"-style shuffles where one vector is swizzled, we can lower
to one vrgather. For shuffles involving two vector operands, we lower to
one unmasked vrgather (or splat, where appropriate) followed by a masked
vrgather which blends in the second half.

On occasion, when it's not possible to create a legal BUILD_VECTOR for
the indices, we use vrgatherei16 instructions with 16-bit index types.

For 8-bit element vectors where we may have indices over 255, we have a
fairly blunt fallback to the stack expansion to avoid custom-splitting
of the vector types.

To enable the selection of masked vrgather instructions, this patch
extends the various RISCVISD::VRGATHER nodes to take a passthru operand.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100549
2021-04-19 11:13:13 +01:00
Craig Topper 1afdfc6169 [RISCV] Rename RISCVISD::GREVI(W)/GORCI(W) to RISCVISD::GREV(W)/GORC(W). Don't require second operand to be a constant.
Prep work for adding intrinsics for these instructions in the
future.
2021-04-13 11:04:28 -07:00
Craig Topper 7c9bbbf735 [RISCV] Rename RISCVISD::SHFLI to RISCVISD::SHFL and don't require the second operand to be an immediate.
Prep work for adding intrinsics in the future.

Left an assert that the input is constant in ReplaceNodeResults,
as the intrinsic shouldn't go through that path.
2021-04-12 23:46:50 -07:00
Craig Topper cb4c793e46 [RISCV] Update computeKnownBitsForTargetNode to treat READ_VLENB as being 16 byte aligned.
According to the 0.10 spec, VLEN is at least 128 bits and is a
power of 2.
2021-04-11 17:54:23 -07:00
Craig Topper bc0e052730 [RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffff) when zext.h is supported.
Similar to what we do for zext.w.

Disable the (srl (and X, 0xffff), C) custom isel when zext.h is
available.
2021-04-11 10:03:35 -07:00
Fraser Cormack a5693445ca [RISCV] Support OR/XOR/AND reductions on vector masks
This patch adds RVV codegen support for OR/XOR/AND reductions for both
scalable- and fixed-length vector types. There are a few possible
codegen strategies for each -- vmfirst.m, vmsbf.m, and vmsif.m could be
used to some extent -- but the vpopc.m instruction was chosen since it
produces the scalar result in one instruction, after which scalar
instructions can finish off the computation.

The reductions are lowered identically for both scalable- and
fixed-length vectors, although some alternate strategies may be more
optimal on fixed-length vectors since it's cheaper to get the length of
those types.

Other reduction types were not deemed to be relevant for mask vectors.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100030
2021-04-08 09:46:38 +01:00
Serge Pavlov 65b1103798 [RISCV] DAG nodes and pseudo instructions for CSR access
New custom DAG nodes were added to represent operations on CSR. These
nodes are lowered to corresponding pseudo instruction. Using the pseudo
instructions allows to specify different scheduling information for
operations on different system registers. It also make possible to
specify dependencies of instructions on specific system registers.

Differential Revision: https://reviews.llvm.org/D98936
2021-04-08 10:36:36 +07:00
Craig Topper 56ea2e2fdd [RISCV] Add a special case to lowerSELECT for select of 2 constants with a SETLT condition.
If the constants have a difference of 1 we can convert one to
the other by adding or subtracting the condition.

We have a DAG combine for this, but it only runs before type
legalization. If the select is introduced later during type
legalization or op legalization we will miss it.

We don't need a specific condition, but some conditions are
harder to materialize than others on RISCV. I know that SETLT
will be a single instruction and it is what is used by the
motivating pattern from signed saturating add/sub.

Differential Revision: https://reviews.llvm.org/D99021
2021-04-07 13:47:17 -07:00
Craig Topper f087d7544a [RISCV] Support vslide1up/down intrinsics for SEW=64 on RV32.
This can't use our normal strategy of splatting the scalar and using
a .vv operation instead of .vx.

Instead this patch bitcasts the vector to the equivalent SEW=32
vector and inserts the scalar parts using two vslide1up/down. We
do that unmasked and apply the mask separately at the end with
a vmerge.

For vslide1up there maybe some other options here like getting
i64 into element 0 and using vslideup.vi with this vector as
vd and the original source as vs1. Masking would still need to
be done afterwards.

That idea doesn't work for vslide1down. We need to slidedown and
then insert a single scalar at vl-1 which we could do with a
vslideup, but that assumes vl > 0 which I don't think we can assume.

The i32 double slide1down implemented here is the best I could come
up with and I just made vslide1up consistent.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D99910
2021-04-07 10:44:53 -07:00
Craig Topper 01a23dccb1 [RISCV] Add an assertion to the ReplaceNodeResults handling of bitcasts to make sure the VT is always a scalar integer. 2021-04-06 16:48:40 -07:00
Craig Topper 2641c1f15e [RISCV] Don't custom type legalize fixed vector to scalar integer bitcasts if the fixed vector type isn't legal.
We encountered a hang in our internal code base. I'm having trouble
creating a test case because the test that hit it was testing some
code that is not upstream.
2021-04-06 15:00:33 -07:00
Craig Topper af2837675a [RISCV] Split RISCVISD::VMV_S_XF_VL into separate integer and FP.
It's a bit silly, but it allows us to write stricter type
constraints for isel. There's still some extra type checks in
the generated table due to some type interference limitations
around HWMode.
2021-04-05 12:57:35 -07:00
Fraser Cormack af3a839c70 [RISCV] Add support for bitcasts between scalars and fixed-length vectors
This patch supports bitcasts from scalar types to fixed-length vectors
and vice versa. It custom-lowers and custom-legalizes them to
EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT operations, using a single-element
vectors to hold the scalar where appropriate.

Previously, some of these would fail to select, others would be expanded
through stack loads and stores. Effort was made to ensure the codegen
avoids the stack for both legal and illegal scalar types.

Some of the codegen could be improved, but on first glance it looks like
a general optimization of EXTRACT_VECTOR_ELT when extracting an i64
element on RV32.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99667
2021-04-05 17:21:55 +01:00
Fraser Cormack 3f0df4d7b0 [RISCV] Expand scalable-vector truncstores and extloads
Caught in internal testing, these operations are assumed legal by
default, even for scalable vector types. Expand them back into separate
truncations and stores, or loads and extensions.

Also add explicit fixed-length vector tests for these operations, even
though they should have been correct already.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99654
2021-04-05 17:03:45 +01:00
Craig Topper 4708a05da0 [RISCV] Use gorciw for i32 orc.b intrinsic when Zbp is enabled.
The W version of orc.b does not exist in Zbp so we need to use
gorci encoding. If we have Zbp, we can use gorciw which can avoid a
sext.w in some cases.
2021-04-04 17:14:28 -07:00
Craig Topper 98d5db3e3a [RISCV] Lower orc.b intrinsic to RISCVISD::GORCI.
This will allow us to share any future known bits, demaned bits,
or sign bits improvements.
2021-04-04 12:31:41 -07:00
Craig Topper a2ea003fcb [RISCV] Don't convert fshr/fshl to target specific FSL/FSR node if shift amount is a constant.
As long as it's a constant we can directly pattern match it
without any problems. It's only when it isn't a constant that
we need to add an AND.

In theory this should allow more target independent optimizations
to remain active.
2021-04-03 23:13:30 -07:00
Levy Hsu 944adbf285 Recommit "[RISCV] Add IR intrinsic for Zbb extension"
Forgot to amend the Author.

Original commit message:

Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
orc.b

Differential Revision: https://reviews.llvm.org/D99320
2021-04-02 11:50:19 -07:00
Craig Topper 1f0b309f24 Revert "[RISCV] Add IR intrinsic for Zbb extension"
This reverts commit 1808194590.

I forgot to change the author.
2021-04-02 11:47:02 -07:00
Craig Topper 1808194590 [RISCV] Add IR intrinsic for Zbb extension
Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
orc.b
2021-04-02 11:23:57 -07:00
Craig Topper dbbc95e3e5 [RISCV] Use softPromoteHalf legalization for fp16 without Zfh rather than PromoteFloat.
The default legalization strategy is PromoteFloat which keeps
half in single precision format through multiple floating point
operations. Conversion to/from float is done at loads, stores,
bitcasts, and other places that care about the exact size being 16
bits.

This patches switches to the alternative method softPromoteHalf.
This aims to keep the type in 16-bit format between every operation.
So we promote to float and immediately round for any arithmetic
operation. This should be closer to the IR semantics since we
are rounding after each operation and not accumulating extra
precision across multiple operations. X86 is the only other
target that enables this today. See https://reviews.llvm.org/D73749

I had to update getRegisterTypeForCallingConv to force f16 to
use f32 when the F extension is enabled. This way we can still
pass it in the lower bits of an FPR for ilp32f and lp64f ABIs.
The softPromoteHalf would otherwise always give i16 as the
argument type.

Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D99148
2021-04-01 12:41:57 -07:00
Craig Topper d157e3f387 [RISCV] Fix handling of nxvXi64 vmsgt(u).vx intrinsics on RV32.
We need to splat the scalar separately and use .vv, but there is
no vmsgt(u).vv. So add isel patterns to select vmslt(u).vv with
swapped operands.

We also need to get VT to use for the splat from an operand rather
than the result since the result VT is nxvXi1.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D99704
2021-04-01 10:38:05 -07:00
Craig Topper b7c2e577cc [RISCV] Add custom type legalization to form MULHSU when possible.
There's no target independent ISD opcode for MULHSU, so custom
legalize 2*XLen multiplies ourselves. We have to be a little
careful to prefer MULHU or MULHSU.

I thought about doing this in isel by pattern matching the
(add (mul X, (srai Y, XLen-1)), (mulhu X, Y)) pattern. I decided
against this because the add might become part of a chain of adds.
I don't trust DAG combine not to reassociate with other adds making
it difficult to find both pieces again.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D99479
2021-04-01 10:15:55 -07:00
Craig Topper 2a8b7cab6a [RISCV] Add RISCVISD opcodes for CLZW and CTZW.
Our CLZW isel pattern is quite easily broken by surrounding code
preventing it from matching sometimes. This usually results in
failing to remove the and X, 0xffffffff inserted by type
legalization. The add with -32 that type legalization also inserts
will often gets combined into other add/sub nodes. That doesn't
usually result in extra code when we don't use clzw.

CTTZ seems to be less fragile, but I wanted to keep it consistent
with CTLZ.

Reviewed By: asb, HsiangKai

Differential Revision: https://reviews.llvm.org/D99317
2021-03-31 09:40:07 -07:00
Fraser Cormack 10fc6e4358 [RISCV] Add support for the stepvector intrinsic
This adds almost everything required for supporting the new stepvector
intrinsic on RVV. It is lowered to the existing VID_VL SDNode.

The only exception is a limitation that RV32 cannot yet lower the
intrinsic on i64 vectors. This is because the step operand is
(currently) required to be at least as large as the vector element type.
I will look into patching that out and loosening the requirement to only
an integer pointer type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99594
2021-03-31 11:41:17 +01:00
Craig Topper a33fcafaf0 [RISCV] Pass 'half' in the lower 16 bits of an f32 value when F extension is enabled, but Zfh is not.
Without Zfh the half type isn't legal, but it could still be
used as an argument/return in IR. Clang will not generate this today.

Previously we promoted the half value to float for arguments and
returns if the F extension is enabled but Zfh isn't. Then depending on
which ABI is enabled we would pass it in either an FPR or a GPR in
float format.

If the F extension isn't enabled, it would get passed in the lower
16 bits of a GPR in half format.

With this patch the value will always in half format and will be
in the lower bits of a GPR or FPR. This should be consistent
with where the bits are located when Zfh is enabled.

I've based this implementation off of how this is done on ARM.

I've manually nan-boxed the value to 32 bits using integer ops.
It looks like flw, fsw, fmv.s, fmv.w.x, fmf.x.w won't
canonicalize nans so should leave the value alone. I think those
are the instructions that could get used on this value.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D98670
2021-03-30 09:47:54 -07:00
Craig Topper f069000b43 [RISCV] Remove floating point condition code legalization from lowerFixedLengthVectorSetccToRVV.
After D98939, this is done by LegalizeVectorOps making this code dead.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D99519
2021-03-30 09:11:56 -07:00
Craig Topper c40cea6f08 [RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffffffff).
We look for this pattern frequently in isel patterns so its a
good idea to try to preserve it.

This also let's us remove our special isel handling for srliw
and use a direct pattern match of (srl (and X, 0xffffffff), C)
since no bits will be removed from the and mask.

Differential Revision: https://reviews.llvm.org/D99042
2021-03-25 09:03:25 -07:00
Fraser Cormack 99211352c1 [RISCV] Optimize select-like vector shuffles
This patch adds a small optimization for vector shuffle lowering,
detecting shuffles which can be re-expressed as vector selects.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99270
2021-03-25 11:39:57 +00:00
Fraser Cormack 321a71a772 [RISCV] Optimize BUILD_VECTOR sequences that reveal hidden splats
This patch adds further optimization techniques to RVV BUILD_VECTOR
lowering. It teaches the compiler to find splats of larger vector
element types "hidden" in smaller ones. For example, a v4i8 build_vector
(0x1, 0x2, 0x1, 0x2) could be splat as v2i16 0x0201. This is generally
more optimal than the dominant-element BUILD_VECTORs and so takes
priority.

This optimization is currently limited to all-constant-or-undef
BUILD_VECTORs as those were found to be the most common. There's no
reason this couldn't be extended to other BUILD_VECTORs, but the
additional bit-manipulation instructions may require more sophisticated
heuristics.

There are some cases where the materialization of the larger constant
takes more scalar instructions than it does to build the vector with
vector instructions. We could add heuristics to try and catch this.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99195
2021-03-25 10:35:31 +00:00
Craig Topper 0f99c6c56e [RISCV] Remove duplicate DebugLoc variables from cases in ReplaceNodeResults. NFC
We already created a DebugLoc at the top of the function. We can
just use that one.
2021-03-24 20:23:03 -07:00
Fraser Cormack feff66a082 [RISCV] Further optimize BUILD_VECTORs with repeated elements
This patch builds upon the initial BUILD_VECTOR work introduced in
D98700. It further optimizes the lowering of BUILD_VECTOR by using
VSELECT operations to effectively insert repeated elements into the
vector with relatively few instructions. This allows us to optimize more
BUILD_VECTORs without significantly increasing the size of the generated
code.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98969
2021-03-23 14:14:48 +00:00
Fraser Cormack 5bfbd9d938 [RISCV] Optimize all-constant mask BUILD_VECTORs
This patch adds an optimization for mask-vector BUILD_VECTOR nodes whose
elements are all constants or undef. It lowers such operations by
building up the vector via a series of integer operations, in which
multiple mask elements are inserted into a vector at a time via
i8/i16/i32/i64 element types. The final result is then bitcast from that
integer vector.

We restrict this optimization in certain circumstances when optimizing
for size. If we are required to use more than one integer insert
operation, then it will likely increase code size compared with using a
load from a constant pool.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98860
2021-03-23 10:11:19 +00:00
Craig Topper 294efcd6f7 [RISCV] Add support for fixed vector masked gather/scatter.
I've split the gather/scatter custom handler to avoid complicating
it with even more differences between gather/scatter.

Tests are the scalable vector tests with the vscale removed and
dropped the tests that used vector.insert. We're probably not
as thorough on the splitting cases since we use 128 for VLEN here
but scalable vector use a known min size of 64.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98991
2021-03-22 10:17:30 -07:00
Craig Topper 5d315691c4 [RISCV] Add missing bitcasts to the results of lowerINSERT_SUBVECTOR and lowerEXTRACT_SUBVECTOR when handling mask vectors.
Found by adding asserts to LegalizeDAG to catch incorrect result
types being returned.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98964
2021-03-19 10:54:33 -07:00
Craig Topper 85f3f6b3cc [RISCV] Lower scalable vector masked loads to intrinsics to match fixed vectors and reduce isel patterns.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98840
2021-03-19 10:39:35 -07:00
Fraser Cormack d399b82e2a [RISCV] Maintain fixed-length info when optimizing BUILD_VECTORs
I'm not sure how I failed to notice this before, but when optimizing
dominant-element BUILD_VECTORs we would lower via the scalable container type,
which lost us the information about the fixed length of the vector types. By
lowering via the fixed-length type we can preserve that information and
eliminate redundant vsetvli instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98938
2021-03-19 17:21:06 +00:00
Fraser Cormack 550292ecb1 [RISCV] Fix missing scalable->fixed-length vector conversion
Returning the scalable-vector container type would present problems when
the fixed-length INSERT_VECTOR_ELT was used by later operations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98776
2021-03-19 16:49:47 +00:00
Craig Topper c9861f722e [RISCV] Correct the output chain in lowerFixedLengthVectorMaskedLoadToRVV
We returned the input chain instead of the output chain from the
new load. This bypasses the load in the chain. I haven't found a
good way to test this yet. IR order prevents my initial attempts
at causing reordering.
2021-03-18 16:34:35 -07:00
Fraser Cormack 3495031a39 [RISCV] Support scalable-vector masked scatter operations
This patch adds support for masked scatter intrinsics on scalable vector
types. It is mostly an extension of the earlier masked gather support
introduced in D96263, since the addressing mode legalization is the
same.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96486
2021-03-18 10:17:50 +00:00
Fraser Cormack 0331399dc9 [RISCV] Support scalable-vector masked gather operations
This patch supports the masked gather intrinsics in RVV.

The RVV indexed load/store instructions only support the "unsigned unscaled"
addressing mode; indices are implicitly zero-extended or truncated to XLEN and
are treated as byte offsets. This ISA supports the intrinsics directly, but not
the majority of various forms of the MGATHER SDNode that LLVM combines to. Any
signed or scaled indexing is extended to the XLEN value type and scaled
accordingly. This is done during DAG combining as widening the index types to
XLEN may produce illegal vectors that require splitting, e.g.
nxv16i8->nxv16i64.

Support for scalable-vector CONCAT_VECTORS was added to avoid spilling via the
stack when lowering split legalized index operands.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96263
2021-03-18 09:26:18 +00:00
Fraser Cormack c2b4600ec8 [RISCV] Support bitcasts of fixed-length mask vectors
Without this patch, bitcasts of fixed-length mask vectors would go
through the stack.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98779
2021-03-18 08:52:42 +00:00
Craig Topper 696ddef569 [RISCV] Support masked load/store for fixed vectors.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98561
2021-03-17 10:26:15 -07:00
Fraser Cormack 70251759a2 [RISCV] Optimize "dominant element" BUILD_VECTORs
This patch adds an optimization path for BUILD_VECTOR nodes where the
majority of the elements are identical. These can be splatted, with the
remaining elements patched up with INSERT_VECTOR_ELTs. The threshold can
be tweaked as required - it is currently conservative. Undef elements
are disregarded when judging the dominance of a particular element. This
allows them to be covered by the splat value.

In addition, vectors of 2 elements are always optimized to a splat (for
the upper element) and an insert at element zero.

This optimization is disabled when optimizing for size.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98700
2021-03-17 10:09:04 +00:00
Craig Topper 229eeb187d [RISCV] Look through copies when trying to find an implicit def in addVSetVL.
The InstrEmitter can sometimes insert a copy after an IMPLICIT_DEF
before connecting it to the vector instruction. This occurs when
constrainRegClass reduces to a class with less than 4 registers.
I believe LMUL8 on masked instructions triggers this since the
result can only use the v8, v16, or v24 register group as the mask
is using v0.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98567
2021-03-16 07:59:09 -07:00
Craig Topper a33ce06cf5 [RISCV] Improve i32 UADDSAT/USUBSAT on RV64.
The default promotion uses zero extends that become shifts. We
cam use sign extend instead which is better for RISCV.

I've used two different implementations based on whether we
have minu/maxu instructions.

Differential Revision: https://reviews.llvm.org/D98683
2021-03-16 07:44:06 -07:00
Craig Topper 41759c3d92 [RISCV] Add RISCVISD::BR_CC similar to RISCVISD::SELECT_CC.
This allows me to introduce similar combines for branches as
we have recently added for SELECT_CC. Some of them are less
useful for standalone setccs and only help branch instructions.
By having a BR_CC node its easier to only affect branches.

I'm using CondCodeSDNode to make isel patterns easier to
write so we can refer to the codes by name. SELECT_CC uses a
constant instead.

I've translated the condition code just like SELECT_CC so
we need less patterns for the swapped conditions. This
includes special cases for X < 1 and X > -1 that get translated
to blez and bgez by using a 0 constant.

computeKnownBitsForTargetNode support for SELECT_CC is added
to allow MaskedValueIsZero to work for cases where the true
and false values of the SELECT_CC are setccs and the
result of the SELECT_CC is used by a BR_CC. This was needed
to avoid regressions in some of the overflow tests.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D98159
2021-03-15 11:54:01 -07:00
Craig Topper 3dc5b533e0 [RISCV] Improve legalization of i32 UADDO/USUBO on RV64.
The default legalization uses zero extends that require pair of shifts
on RISCV. Instead we can take advantage of the fact that unsigned
compares work equally well on sign extended inputs. This allows
us to use addw/subw and sext.w.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D98233
2021-03-15 09:30:23 -07:00