Commit Graph

1015 Commits

Author SHA1 Message Date
Fraser Cormack 8c73a31c11 [RISCV] Allow passing fixed-length vectors via the stack
The vector calling convention dictates that when the vector argument
registers are exhaused, GPRs are used to pass the address via the stack.
When the GPRs themselves are exhausted, at best we would previously
crash with an assertion, and at worst we'd generate incorrect code.

This patch addresses this issue by passing fixed-length vectors via the
stack with their full fixed-length size and aligned to their element
type size. Since the calling convention lowering can't yet handle
scalable vector types, this patch adds a fatal error to make it clear
that we are lacking in this regard.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102422
2021-05-27 14:14:07 +01:00
Fraser Cormack 772b58a641 [SelectionDAG][RISCV] Don't unroll 0/1-type bool VSELECTs
This patch extends the cases in which the legalizer is able to express
VSELECT in terms of XOR/AND/OR. When dealing with a VSELECT between
boolean vector types, the mask itself is an all-ones or all-ones value
of the operand type, so a 0/1 boolean type behaves identically to a 0/-1
type.

This greatly helps RISC-V which relies on expansion for these nodes. It
also allows scalable-vector bool VSELECTs to use the default expansion,
where before it would crash in SelectionDAG::UnrollVectorOp.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103147
2021-05-27 10:08:57 +01:00
Craig Topper fdf10e6197 [RISCV] Use X0 as destination of inserted vsetvli when possible.
We aren't going to connect the result to anything so we might
as well avoid allocating a register.

Reviewed By: frasercrmck, HsiangKai

Differential Revision: https://reviews.llvm.org/D102031
2021-05-26 13:08:51 -07:00
Craig Topper 9065118b64 [RISCV] Optimize SEW=64 shifts by splat on RV32.
SEW=64 shifts only uses the log2(64) bits of shift amount. If we're
splatting a 64 bit value in 2 parts, we can avoid splatting the
upper bits and just let the low bits be sign extended. They won't
be read anyway.

For the purposes of SelectionDAG semantics of the generic ISD opcodes,
if hi was non-zero or bit 31 of the low is 1, the shift was already
undefined so it should be ok to replace high with sign extend of low.

In order do be able to find the split i64 value before it becomes
a stack operation, I added a new ISD opcode that will be expanded
to the stack spill in PreprocessISelDAG. This new node is conceptually
similar to BuildPairF64, but I expanded earlier so that we could
go through regular isel to get the right VLSE opcode for the LMUL.
BuildPairF64 is expanded in a CustomInserter.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D102521
2021-05-26 10:23:32 -07:00
Jessica Clarke d63d662d3c [RISCV] Remove --riscv-no-aliases from RVV tests
This serves no useful purpose other than to clutter things up. Diff
summary as the real diff is extremely unwieldy:

   24844 -; CHECK-NEXT:    jalr zero, 0(ra)
   24844 +; CHECK-NEXT:    ret
       8 -; CHECK-NEXT:    vl4re8.v v28, (a0)
       8 +; CHECK-NEXT:    vl4r.v v28, (a0)
      64 -; CHECK-NEXT:    vl8re8.v v24, (a0)
      64 +; CHECK-NEXT:    vl8r.v v24, (a0)
     392 -; RUN:   --riscv-no-aliases < %s | FileCheck %s
     392 +; RUN:   < %s | FileCheck %s
       1 -; RUN:   -verify-machineinstrs --riscv-no-aliases < %s \
       1 +; RUN:   -verify-machineinstrs < %s \

As discussed in D103004.
2021-05-26 17:59:38 +01:00
Craig Topper b2c7ac874f [RISCV] Don't propagate VL/VTYPE across inline assembly in the Insert VSETVLI pass.
It's conceivable someone could put a vsetvli in inline assembly
so its safer to consider them as barriers. The alternative would
be to trust that the user marks VL and VTYPE registers as clobbers
of the inline assembly if they do that, but hat seems error prone.

I'm assuming inline assembly in vector code is going to be rare.

Reviewed By: frasercrmck, HsiangKai

Differential Revision: https://reviews.llvm.org/D103126
2021-05-26 09:56:20 -07:00
Craig Topper 1b47a3de48 [RISCV] Enable cross basic block aware vsetvli insertion
This patch extends D102737 to allow VL/VTYPE changes to be taken
into account before adding an explicit vsetvli.

We do this by using a data flow analysis to propagate VL/VTYPE
information from predecessors until we've determined a value for
every value in the function.

We use this information to determine if a vsetvli needs to be
inserted before the first vector instruction the block.

Differential Revision: https://reviews.llvm.org/D102739
2021-05-26 09:25:42 -07:00
Fraser Cormack 7e27e4273d [RISCV] Pre-commit fixed-length mask vselect tests
These are default-expanded but later unrolled due to RISC-V's vector
boolean content policy. A patch to improve this codegen will follow
shortly.
2021-05-26 10:44:45 +01:00
Ben Shi bf77317049 [RISCV] Optimize xor/or with immediate in the zbs extension
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102893
2021-05-25 14:14:09 +08:00
Craig Topper b510e4cf1b [RISCV] Add a vsetvli insert pass that can be extended to be aware of incoming VL/VTYPE from other basic blocks.
This is a replacement for D101938 for inserting vsetvli
instructions where needed. This new version changes how
we track the information in such a way that we can extend
it to be aware of VL/VTYPE changes in other blocks. Given
how much it changes the previous patch, I've decided to
abandon the previous patch and post this from scratch.

For now the pass consists of a single phase that assumes
the incoming state from other basic blocks is unknown. A
follow up patch will extend this with a phase to collect
information about how VL/VTYPE change in each block and
a second phase to propagate this information to the entire
function. This will be used by a third phase to do the
vsetvli insertion.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D102737
2021-05-24 11:47:27 -07:00
serge-sans-paille 4ab3041acb Revert "[NFC] remove explicit default value for strboolattr attribute in tests"
This reverts commit bda6e5bee0.

See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance
2021-05-24 19:43:40 +02:00
serge-sans-paille bda6e5bee0 [NFC] remove explicit default value for strboolattr attribute in tests
Since d6de1e1a71, no attributes is quivalent to
setting attribute to false.

This is a preliminary commit for https://reviews.llvm.org/D99080
2021-05-24 19:31:04 +02:00
luxufan d70e9195a3 [RISCV] Optimize getVLENFactoredAmount function.
If the local variable `NumOfVReg` isPowerOf2_32(NumOfVReg - 1) or isPowerOf2_32(NumOfVReg + 1), the ADDI and MUL instructions can be replaced with SLLI and ADD(or SUB) instructions.

Based on original patch by StephenFan.

Reviewed By: frasercrmck, StephenFan

Differential Revision: https://reviews.llvm.org/D100577
2021-05-24 10:04:37 -07:00
Fraser Cormack 7a211ed110 [RISCV] Prevent store combining from infinitely looping
RVV code generation does not successfully custom-lower BUILD_VECTOR in all
cases. When it resorts to default expansion it may, on occasion, be expanded to
scalar stores through the stack. Unfortunately these stores may then be picked
up by the post-legalization DAGCombiner which merges them again. The merged
store uses a BUILD_VECTOR which is then expanded, and so on.

This patch addresses the issue by overriding the `mergeStoresAfterLegalization`
hook. A lack of granularity in this method (being passed the scalar type) means
we opt out in almost all cases when RVV fixed-length vector support is enabled.
The only exception to this rule are mask vectors, which are always either
custom-lowered or are expanded to a load from a constant pool.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102913
2021-05-24 10:19:32 +01:00
Jessica Clarke e10958c807 [SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics
Unlike normal loads these don't have an extension field, but we know
from TargetLowering whether these are sign-extending or zero-extending,
and so can optimise away unnecessary extensions.

This was noticed on RISC-V, where sign extensions in the calling
convention would result in unnecessary explicit extension instructions,
but this also fixes some Mips inefficiencies. PowerPC sees churn in the
tests as all the zero extensions are only for promoting 32-bit to
64-bit, but these zero extensions are still not optimised away as they
should be, likely due to i32 being a legal type.

This also simplifies the WebAssembly code somewhat, which currently
works around the lack of target-independent combines with some ugly
patterns that break once they're optimised away.

Re-landed with correct handling in ComputeNumSignBits for Tmp == VTBits,
where zero-extending atomics were incorrectly returning 0 rather than
the (slightly confusing) required return value of 1.

Re-landed again after D102819 fixed PowerPC to correctly zero-extend all
of its atomics as it claimed to do, since the combination of that bug
and this optimisation caused buildbot regressions.

Reviewed By: RKSimon, atanasyan

Differential Revision: https://reviews.llvm.org/D101342
2021-05-20 20:34:23 +01:00
Fraser Cormack c74ab891fc [RISCV] Ensure small mask BUILD_VECTORs aren't expanded
The default expansion for BUILD_VECTORs -- save for going through
shuffles -- is to go through the stack. This method only works when the
type is at least byte-sized, so for v2i1 and v4i1 we would crash.

This patch ensures that small mask-type BUILD_VECTORs are always handled
without crashing. We lower to a SETCC of the equivalent i8 type.

This also exposes some pre-existing issues where the lowering when
optimizing for size results in larger code than without. Those will be
tackled in future patches.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102767
2021-05-20 19:12:29 +01:00
Fraser Cormack 26bd2250c1 [RISCV] Ensure shuffle splat operands are type-legal
The use of `SelectionDAG::getSplatValue` isn't guaranteed to return a
type-legal splat value as it may implicitly extract a vector element
from another shuffle. It is not permitted to introduce an illegal type
when lowering shuffles.

This patch addresses the crash by adding a boolean flag to
`getSplatValue`, defaulting to false, which when set will ensure a
type-legal return value. If it is unable to do that it will fail to
return a splat value.

I've been through the existing uses of `getSplatValue` in other targets
and was unable to find a need or test cases showing a need to update
their uses. In some cases, the call is made during `LegalizeVectorOps`
which may still produce illegal scalar types. In other situations, the
illegally-typed splat value may be quickly patched up to a legal type
(such as any-extending the returned `extract_vector_elt` up to a legal
type) before `LegalizeDAG` notices.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102687
2021-05-20 18:00:03 +01:00
Fraser Cormack ca2c245ba4 [RISCV] Support INSERT_VECTOR_ELT into i1 vectors
Like the element extraction of these vectors, we choose to promote up to
an i8 vector type and perform the insertion there.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102697
2021-05-19 09:41:50 +01:00
Fraser Cormack 175bdf127d [RISCV] Fix operand order in fixed-length VM(OR|AND)NOT patterns
Where the RVV specification writes `vs2, vs1`, our TableGen patterns use
`rs1, rs2`. These differences can easily cause confusion. The VMANDNOT
instruction performs `LHS && !RHS`, and similarly for VMORNOT.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102606
2021-05-18 09:21:25 +01:00
Ben Shi 3cf7983cbe [RISCV][test] Add new tests of or/xor in the zbs extension
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102625
2021-05-18 07:10:17 +08:00
Fraser Cormack 85e31eddf2 [DAGCombiner] Relax an assertion to an early return
The select-of-constants transform was asserting that its constant vector
inputs did not implicitly truncate their input without that as an
explicit precondition to the function. This patch relaxes that assertion
into an early return to skip the optimization.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D102393
2021-05-17 09:15:55 +01:00
Ben Shi 7746e818a5 [RISCV] Optimize or/xor with immediate in the zbs extension
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102398
2021-05-17 10:59:52 +08:00
Ben Shi 1dfd7d5041 [RISCV][test] Add new tests of or/xor in the zbs extension
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D102396
2021-05-17 09:47:23 +08:00
Hsiangkai Wang b41e1306b8 [RISCV] Add the DebugLoc parameter to getVLENFactoredAmount().
The MachineBasicBlock::iterator is continuously changing during
generating the frame handling instructions. We should use the DebugLoc
from the caller, instead of getting it from the changing iterator.

If the prologue instructions located in a basic block without any other
instructions after these prologue instructions, the iterator will be
updated to the boundary of the basic block and it is invalid to use the
iterator to access DebugLoc. This patch also fixes the crash when
accessing DebugLoc using the iterator.

Differential Revision: https://reviews.llvm.org/D102386
2021-05-14 21:31:06 +08:00
Fraser Cormack 797e580db9 [RISCV][NFC] Simplify test run lines
Several tests had -verify-machineinstrs twice, and several tests were
explicitly specifying the default FileCheck prefix of CHECK.
2021-05-13 12:41:00 +01:00
Fraser Cormack c5ec00e62b [TargetLowering] Improve legalization of scalable vector types
This patch extends the vector type-conversion and legalization capabilities of
scalable vector types.

Firstly, `vscale x 1` types now behave more like the corresponding `vscale x
2+` types. This enables the integer promotion legalization of extended scalable
types, such as the promotion of `<vscale x 1 x i5>` to `<vscale x 1 x i8>`.

These `vscale x 1` types are also now better handled by
`getVectorTypeBreakdown`, where what looks like older handling for 1-element
fixed-length vector types was spuriously updated to include scalable types.

Widening of scalable types is now better supported, by using `INSERT_SUBVECTOR`
to insert the smaller scalable vector "value" type into the wider scalable
vector "part" type. This allows AArch64 to pass and return `vscale x 1` types
by value by widening.

There are still cases where we are unable to legalize `vscale x 1` types, such
as where expansion would require splitting the vector in two.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D102073
2021-05-12 16:33:07 +01:00
Stefan Pintilie 8d37411e48 Revert "[SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics"
This reverts commit 6c80361b84.
Breaks PowerPC Big Endian buildbots.
2021-05-12 09:46:18 -05:00
Craig Topper d092dd56ae [RISCV] Regenerate stepvector.ll. NFC
It looks like the RV32 and RV64 prefixes were removed from the
RUN lines while another patch was in review that added check
lines that used them.
2021-05-11 13:04:57 -07:00
Fangrui Song ec27c5f170 [RISCV] Prefer to lower MC_GlobalAddress operands to .Lfoo$local
Similar to X86 D73230 and AArch64 D101872

With this change, we can set dso_local in clang's -fpic -fno-semantic-interposition mode,
for default visibility external linkage non-ifunc-non-COMDAT definitions.

For such dso_local definitions, variable access/taking the address of a
function/calling a function will go through a local alias to avoid GOT/PLT.

Reviewed By: jrtc27, luismarques

Differential Revision: https://reviews.llvm.org/D101875
2021-05-11 11:29:45 -07:00
Craig Topper ce6e4f27dd [RISCV] Use fractional LMULs for fixed length types smaller than riscv-v-vector-bits-min.
My thought process is that if v2i64 is an LMUL=1 type then v2i32
should be an LMUL=1/2 type. We limit the fractional LMUL so that
SEW=64 clips to LMUL=1, SEW=32 clips to LMUL=1/2, etc. This
ensures there's always a fractional LMUL available to truncate a type.
This does reduce the number of vsetvlis in some cases.

Some tests increase vsetvlis because the best container type for a
mask type is dependent on the LMUL+SEW that the mask was produced
from, but you can't tell that from the type. I think this is
something we need to solve this in the machine IR when optimizing
vsetvlis.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D101215
2021-05-11 09:42:48 -07:00
Craig Topper dc00cbb505 [RISCV] Match trunc_vector_vl+sra_vl/srl_vl with splat shift amount to vnsra/vnsrl.
Limited to splats because we would need to truncate the shift
amount vector otherwise.

I tried to do this with new ISD nodes and a DAG combine to
avoid such a large pattern, but we don't form the splat until
LegalizeDAG and need DAG combine to remove a scalable->fixed->scalable
cast before it becomes visible to the shift node. By the time that
happens we've already visited the truncate node and won't revisit it.

I think I have an idea how to improve i64 on RV32 I'll save for a
follow up.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D102019
2021-05-11 09:29:31 -07:00
Hsiangkai Wang d8ec2b183e [RISCV] Fix the calculation of the offset of Zvlsseg spilling.
For Zvlsseg spilling, we need to convert the pseudo instructions
into multiple vector load/store instructions with appropriate offsets.
For example, for PseudoVSPILL3_M2, we need to convert it to

VS2R %v2, %base
ADDI %base, %base, (vlenb x 2)
VS2R %v4, %base
ADDI %base, %base, (vlenb x 2)
VS2R %v6, %base

We need to keep the size of the offset in the pseudo spilling instructions.
In this case, it is (vlenb x 2).

In the original implementation, we use the size of frame objects divide the
number of vectors in zvlsseg types. The size of frame objects is not
necessary exactly the same as the spilling data. It may be larger than
it. So, we change it to (VLENB x LMUL) in this patch. The calculation is
more direct and easy to understand.

Differential Revision: https://reviews.llvm.org/D101869
2021-05-11 10:13:18 +08:00
Craig Topper 80b9510806 [RISCV] Correct VL for fixed length masked scatter.
We were incorrectly calling getVectorNumElements on a scalable
vector type. This shouldn't be allowed. This gives a warning on
EVT, but not MVT.
2021-05-10 09:50:08 -07:00
Fraser Cormack 6db0cedd23 [LegalizeVectorOps][RISCV] Add scalable-vector SELECT expansion
This patch extends VectorLegalizer::ExpandSELECT to permit expansion
also for scalable vector types. The only real change is conditionally
checking for BUILD_VECTOR or SPLAT_VECTOR legality depending on the
vector type.

We can use this to fix "cannot select" errors for scalable vector
selects on the RISCV target. Note that in future patches RISCV will
possibly custom-lower vector SELECTs to VSELECTs for branchless codegen.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102063
2021-05-10 08:22:35 +01:00
Jessica Clarke 6c80361b84 [SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics
Unlike normal loads these don't have an extension field, but we know
from TargetLowering whether these are sign-extending or zero-extending,
and so can optimise away unnecessary extensions.

This was noticed on RISC-V, where sign extensions in the calling
convention would result in unnecessary explicit extension instructions,
but this also fixes some Mips inefficiencies. PowerPC sees churn in the
tests as all the zero extensions are only for promoting 32-bit to
64-bit, but these zero extensions are still not optimised away as they
should be, likely due to i32 being a legal type.

This also simplifies the WebAssembly code somewhat, which currently
works around the lack of target-independent combines with some ugly
patterns that break once they're optimised away.

Re-landed with correct handling in ComputeNumSignBits for Tmp == VTBits,
where zero-extending atomics were incorrectly returning 0 rather than
the (slightly confusing) required return value of 1.

Reviewed By: RKSimon, atanasyan

Differential Revision: https://reviews.llvm.org/D101342
2021-05-06 04:01:20 +01:00
Jessica Clarke 897d7bceb9 Revert "[SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics"
This seems to have broken sanitizers, giving lots of

  Assertion `NumBits <= MAX_INT_BITS && "bitwidth too large"' failed.

failures across multiple targets (currently X86 and PowerPC). Reverting
until I have a chance to reproduce and debug.

This reverts commit 6e876f9ded.
2021-05-05 17:02:05 +01:00
Jessica Clarke 6e876f9ded [SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics
Unlike normal loads these don't have an extension field, but we know
from TargetLowering whether these are sign-extending or zero-extending,
and so can optimise away unnecessary extensions.

This was noticed on RISC-V, where sign extensions in the calling
convention would result in unnecessary explicit extension instructions,
but this also fixes some Mips inefficiencies. PowerPC sees churn in the
tests as all the zero extensions are only for promoting 32-bit to
64-bit, but these zero extensions are still not optimised away as they
should be, likely due to i32 being a legal type.

This also simplifies the WebAssembly code somewhat, which currently
works around the lack of target-independent combines with some ugly
patterns that break once they're optimised away.

Reviewed By: RKSimon, atanasyan

Differential Revision: https://reviews.llvm.org/D101342
2021-05-05 16:34:45 +01:00
Fraser Cormack 61a46375a2 [RISCV][VP][NFC] Add tests for VP_SREM and VP_UREM
As agreed in D101826, these are follow-up tests for the RISC-V VP
support.
2021-05-05 13:13:34 +01:00
Fraser Cormack 437468f319 [RISCV][VP][NFC] Add tests for VP_MUL and VP_[US]DIV
As agreed in D101826, these are follow-up tests for the RISC-V VP
support.
2021-05-05 13:08:57 +01:00
Fraser Cormack 491a3d1359 [RISCV][VP][NFC] Add tests for VP_SHL and VP_LSHR
As agreed in D101826, these are follow-up tests for the RISC-V VP
support. Tests for VP_ASHR were landed as part of D101826.
2021-05-05 13:01:04 +01:00
Fraser Cormack 3fbcf07a99 [RISCV][VP][NFC] Add tests for VP_AND, VP_XOR, VP_OR
As agreed in D101826, these are follow-up tests for the RISC-V VP
support.
2021-05-05 12:58:08 +01:00
Fraser Cormack 6f17613bfb [RISCV][VP] Lower VP ISD nodes to RVV instructions
This patch supports all of the current set of VP integer binary
intrinsics by lowering them to to RVV instructions. It does so by using
the existing RISCVISD *_VL custom nodes as an intermediate layer. Both
scalable and fixed-length vectors are supported by using this method.

One notable change to the existing vector codegen strategy is that
scalable all-ones and all-zeros mask SPLAT_VECTORs are now lowered to
RISCVISD VMSET_VL and VMCLR_VL nodes to match their fixed-length
BUILD_VECTOR counterparts. This allows them to reuse the existing
"all-ones" VL patterns.

To reduce the size of the phabricator diff, some tests are intentionally
left out and will be added later if the patch is accepted.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101826
2021-05-05 12:32:24 +01:00
Fraser Cormack cd6a52fede [RISCV] Cap legal fixed-length vectors to 256-element types
Previously, RISC-V would make legal all fixed-length vectors types whose
size are less than or equal to some function of the minimum value of
VLEN and the maximum-permissible LMUL grouping.

Due to vector legalization issues, this patch instead caps the legal
fixed-length vector types to those with 256 elements. This value was
chosen because it is the longest vector length which has corresponding
MVTs across all supported element types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101839
2021-05-05 09:51:08 +01:00
Fraser Cormack 6523ff6d47 [ValueTypes] Add MVTs for v256i16 and v256f16
This patch adds the two MVTs to fix a legalizer crash when using vector
shuffles of <256 x i16> and <128 x i16> on RISC-V. The legalizer can't
promote the operand of `v256i32 = any_extend_vector_inreg v128i16`.

Reviewed By: craig.topper, RKSimon

Differential Revision: https://reviews.llvm.org/D101769
2021-05-04 18:06:13 +01:00
Jessica Clarke fb92cf9208 [RISCV] Pre-commit tests for D101342
These tests show inefficient sign extension for AMOs on RISC-V. The
normal CodeGen tests use anyext return values, but if marked signext
then we end up generating unnecessary sign extension instructions. This
can be seen when compiling C that returns an i32 (signed or unsigned),
where the calling convention results in a signext return value.
2021-05-04 11:12:43 +01:00
Fraser Cormack 46fa214a6f [RISCV] Lower splats of non-constant i1s as SETCCs
This patch adds support for splatting i1 types to fixed-length or
scalable vector types. It does so by lowering the operation to a SETCC
of the equivalent i8 type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101465
2021-05-04 09:14:05 +01:00
Fraser Cormack d23e4f6872 [RISCV] Add support for fmin/fmax vector reductions
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101518
2021-05-03 10:33:51 +01:00
Craig Topper ba63cdb8f2 [RISCV] Store SEW in RISCV vector pseudo instructions in log2 form.
This shrinks the immediate that isel table needs to emit for these
instructions. Hoping this allows me to change OPC_EmitInteger to
use a better variable length encoding for representing negative
numbers. Similar to what was done a few months ago for OPC_CheckInteger.

The alternative encoding uses less bytes for negative numbers, but
increases the number of bytes need to encode 64 which was a very
common number in the RISCV table due to SEW=64. By using Log2 this
becomes 6 and is no longer a problem.
2021-05-02 12:09:20 -07:00
Fraser Cormack 1d85b24762 [RISCV][NFC] Merge RV32/RV64 test checks with a common prefix 2021-04-30 09:43:48 +01:00
Fraser Cormack 791766e6d2 [RISCV] Support STEP_VECTOR with a step greater than one
DAGCombiner was recently taught how to combine STEP_VECTOR nodes,
meaning the step value is no longer guaranteed to be one by the time it
reaches the backend for lowering.

This patch supports such cases on RISC-V by lowering to other step
values to a multiply following the vid.v instruction. It includes a
small optimization for common cases where the multiply can be expressed
as a shift left.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D100856
2021-04-30 09:36:18 +01:00
luxufan 5603ed60ad [RISCV] Fix StackOffset calculation when using sp to access the fixed stack object in the case of rvv vector objects existed
When rvv vector objects existed, using sp to access the fixed stack object will pass the rvv vector objects field. So the StackOffset needs add a scalable offset of the size of rvv vector objects field

Differential Revision: https://reviews.llvm.org/D100286
2021-04-30 11:02:38 +08:00
luxufan 325b454ed8 [RISCV] Precommit a test case that test accessing a fixed object when has rvv vector object existed
Differential Revision: https://reviews.llvm.org/D100284
2021-04-30 10:35:03 +08:00
Craig Topper dcdda2bdf2 [RISCV] Teach DAG combine to fold (and (select_cc lhs, rhs, cc, -1, c), x) -> (select_cc lhs, rhs, cc, x, (and, x, c))
Similar for or/xor with 0 in place of -1.

This is the canonical form produced by InstCombine for something like `c ? x & y : x;` Since we have to use control flow to expand select we'll usually end up with a mv in basic block. By folding this we may be able to pull the and/or/xor into the block instead and avoid a mv instruction.

The code here is based on code from ARM that uses this to create predicated instructions. I'm doing it on SELECT_CC so it happens late, but we could do it on select earlier which is what ARM does. I'm not sure if we lose any combine opportunities if we do it earlier.

I left out add and sub because this can separate sext.w from the add/sub. It also made a conditional i64 addition/subtraction on RV32 worse. I guess both of those would be fixed by doing this earlier on select.

The select-binop-identity.ll test has not been commited yet, but I made the diff show the changes to it.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D101485
2021-04-29 09:43:51 -07:00
Craig Topper 60216adef1 [RISCV] Add test cases for D101485. NFC 2021-04-29 09:43:51 -07:00
Craig Topper 0c330afdfa [RISCV] Enable SPLAT_VECTOR for fixed vXi64 types on RV32.
This replaces D98479.

This allows type legalization to form SPLAT_VECTOR_PARTS so we don't
lose the splattedness when the scalar type is split.

I'm handling SPLAT_VECTOR_PARTS for fixed vectors separately so
we can continue using non-VL nodes for scalable vectors.

I limited to RV32+vXi64 because DAGCombiner::visitBUILD_VECTOR likes
to form SPLAT_VECTOR before seeing if it can replace the BUILD_VECTOR
with other operations. Especially interesting is a splat BUILD_VECTOR of
the extract_vector_elt which can become a splat shuffle, but won't if
we form SPLAT_VECTOR first. We either need to reorder visitBUILD_VECTOR
or add visitSPLAT_VECTOR.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100803
2021-04-29 08:20:09 -07:00
Craig Topper 25391cec3a [RISCV] Teach computeKnownBits that vsetvli returns number less than 2^31.
This seems like a reasonable upper bound on VL. WG discussions for
the V spec would probably allow us to use 2^16 as an upper bound
on VLEN, but this is good enough for now.

This allows us to remove sext and zext if user happens to assign
the size_t result into an int and then uses it as a VL intrinsic
argument which is size_t.

Reviewed By: frasercrmck, rogfer01, arcbbb

Differential Revision: https://reviews.llvm.org/D101472
2021-04-29 08:07:59 -07:00
Fraser Cormack f6c54a61da [RISCV][NFC] Combine identical RV32 and RV64 test checks 2021-04-29 11:38:10 +01:00
Fraser Cormack 43ad058a01 [RISCV] Fix stack slot for argument types (Bug 49500)
This is an complementary/alternative fix for D99068. It takes a slightly
different approach by explicitly summing up all of the required split
part type sizes and ensuring we allocate enough space for them. It also
takes the maximum alignment of each part.

Compared with D99068 there are fewer changes to the stack objects in
existing tests. However, @luismarques has shown in that patch that there
are opportunities to reduce our stack usage in the future.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D99087
2021-04-29 09:10:48 +01:00
Craig Topper ce09dd54e6 [RISCV] Select 5 bit immediate for VSETIVLI during isel rather than peepholing in the custom inserter.
This adds a special operand type that is allowed to be either
an immediate or register. By giving it a unique operand type the
machine verifier will ignore it.

This perturbs a lot of tests but mostly it is just slightly different
instruction orders. Something bad did happen to some min/max reduction
tests. We're spilling vector registers when we weren't before.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D101246
2021-04-27 14:38:16 -07:00
Craig Topper 262a72f50f [RISCV] Use stack slot to handle SPLAT_VECTOR_PARTS on RV32.
Reduces the amount of vector ALU operations and reduces vector
register pressure.
2021-04-26 15:43:02 -07:00
Craig Topper e2cd92cb9b [RISCV] Match splatted load to scalar load + splat. Form strided load during isel.
This modifies my previous patch to push the strided load formation
to isel. This gives us opportunity to fold the splat into a .vx
operation first. Using a scalar register and a .vx operation reduces
vector register pressure which can be important for larger LMULs.

If we can't fold the splat into a .vx operation, then it can make
sense to use a strided load to free up the vector arithmetic
ALU to do actual arithmetic rather than tying it up with vmv.v.x.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D101138
2021-04-26 13:32:03 -07:00
Ben Shi 60ed86d350 [RISCV] Optimize addition with immediate
Reviewed by: craig.topper

Differential Revision: https://reviews.llvm.org/D101244
2021-04-26 13:26:17 +08:00
Craig Topper 8f5cd49405 [RISCV] Teach DAG combine what bits Zbp instructions demanded from their inputs.
This teaches DAG combine that shift amount operands for grev, gorc
shfl, unshfl only read a few bits.

This also teaches DAG combine that grevw, gorcw, shflw, unshflw,
bcompressw, bdecompressw only consume the lower 32 bits of their
inputs.

In the future we can teach SimplifyDemandedBits to also propagate
demanded bits of the output to the inputs in some cases.
2021-04-25 21:54:06 -07:00
Levy Hsu 8cf54c7ff5 [RISCV] [1/2] Add IR intrinsic for Zbe extension
RV32/64:
bcompress
bdecompress

RV64 ONLY:
bcompressw
bdecompressw

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101143
2021-04-25 19:14:34 -07:00
Craig Topper 3064a63b2b [RISCV] Remove GetVRegNoV0 from the output register class of masked compare pseudo instructions.
Theses instructions are allowed to write v0 when they are masked.
We'll still never use v0 because of the earlyclobber constraint so
this doesn't really help anything. It just makes the definitions
correct.

While I was there remove an unused multiclass I noticed.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D101118
2021-04-23 09:33:29 -07:00
Fraser Cormack 83b8f8da82 [RISCV] Custom lower vector F(MIN|MAX)NUM to vf(min|max)
This patch adds support for both scalable- and fixed-length vector code
lowering of the llvm.minnum and llvm.maxnum intrinsics to the equivalent
RVV instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101035
2021-04-23 12:22:15 +01:00
Levy Hsu b49337bbb9 [RISCV] [1/2] Add IR intrinsic for Zbp extension
RV32/64:
    grev
    grevi
    gorc
    gorci
    shfl
    shfli
    unshfl
    unshfli

RV64 ONLY:
    grevw
    greviw
    gorcw
    gorciw
    shflw
    shfli     (For non-existing shfliw)
    unshfli   (For non-existing unshfliw)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100830
2021-04-22 16:34:51 -07:00
Craig Topper 5185b52988 [RISCV] Fix crash with fptosi.sat/fptoui.sat intrinsics on RV64. Add test cases.
Add PromoteIntOp_FP_TO_XINT_SAT to type legalize the bit width
operand from i32 to i64 for RV64.

Add test cases for the saturating intrinsics for half/float/double
and i32/i64. CodeGen is definitely not optimal. We can probably
make use of the native behavior of fcvt instructions in many cases.

Fixes PR50083
2021-04-22 15:18:15 -07:00
Craig Topper e01c419ecd [RISCV] Add IR intrinsics for vmsge(u).vv/vx/vi.
These instructions don't really exist, but we have ways we can
emulate them.

.vv will swap operands and use vmsle().vv. .vi will adjust the
immediate and use .vmsgt(u).vi when possible. For .vx we need to
use some of the multiple instruction sequences from the V extension
spec.

For unmasked vmsge(u).vx we use:
  vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd

For cases where mask and maskedoff are the same value then we have
vmsge{u}.vx v0, va, x, v0.t which is the vd==v0 case that
requires a temporary so we use:
  vmslt{u}.vx vt, va, x; vmandnot.mm vd, vd, vt

For other masked cases we use this sequence:
  vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0
We trust that register allocation will prevent vd in vmslt{u}.vx
from being v0 since v0 is still needed by the vmxor.

Differential Revision: https://reviews.llvm.org/D100925
2021-04-22 10:44:38 -07:00
Craig Topper d77d56acfd [RISCV] Add missing tests for vector type for second operand of vmsgt and vmsgtu IR intrinsics.
Refactor to use new multiclass instead of individual patterns.

We already supported this due to SEW=64 on RV32, but we didn't have
test cases for all the types we supported.

Part of D100925
2021-04-22 10:44:38 -07:00
Craig Topper 9524a0553d [RISCV] Support vector type for second operand of vmfge and vmfgt IR intrinsics.
We don't have instructions for these, but can swap the operands
to use vmle/vmflt. This makes the IR interface more consistent and
simplifies the frontend implementation.

Part of D100925
2021-04-22 10:44:38 -07:00
Craig Topper 70254ccb69 [RISCV] Turn splat shuffles of vector loads into strided load with stride of x0.
Implementations are allowed to optimize an x0 stride to perform
less memory accesses. This is the case in SiFive cores.

No idea if this is the case in other implementations. We might
need a tuning flag for this.

Reviewed By: frasercrmck, arcbbb

Differential Revision: https://reviews.llvm.org/D100815
2021-04-22 10:02:57 -07:00
Craig Topper 77f14c96e5 [RISCV] Use stack temporary to splat two GPRs into SEW=64 vector on RV32.
Rather than doing splatting each separately and doing bit manipulation
to merge them in the vector domain, copy the data to the stack
and splat it using a strided load with x0 stride. At least on
some implementations this vector load is optimized to not do
a load for each element.

This is equivalent to how we move i64 to f64 on RV32.

I've only implemented this for the intrinsic fallbacks in this
patch. I think we do similar splatting/shifting/oring in other
places. If this is approved, I'll refactor the others to share
the code.

Differential Revision: https://reviews.llvm.org/D101002
2021-04-22 09:50:07 -07:00
Jun Ma 978eb3f168 [DAGCombiner] Allow operand of step_vector to be negative.
It is proper to relax non-negative limitation of step_vector.
Also this patch adds more combines for step_vector:
(sub X, step_vector(C)) -> (add X,  step_vector(-C))

Differential Revision: https://reviews.llvm.org/D100812
2021-04-22 20:58:03 +08:00
Serge Pavlov 740962e5d0 [RISCV] Custom lowering of SET_ROUNDING
Differential Revision: https://reviews.llvm.org/D91242
2021-04-22 15:04:55 +07:00
Serge Pavlov 6e63dfdae2 [RISCV] Custom lowering of FLT_ROUNDS_
Differential Revision: https://reviews.llvm.org/D90854
2021-04-22 11:39:15 +07:00
Craig Topper f6d8cf7798 [RISCV] Teach lowerSPLAT_VECTOR_PARTS to detect cases where Hi is sign extended from Lo.
This recognizes the case when Hi is (sra Lo, 31). We can use
SPLAT_VECTOR_I64 rather than splatting the high bits and
combining them in the vector register.
2021-04-21 20:24:23 -07:00
Fraser Cormack c141bd3cf9 [DAGCombiner] Support all-ones/all-zeros SPLAT_VECTOR in more combines
This patch adds incrementally-better support for SPLAT_VECTOR in a
handful of vector combines by changing a few more
isBuildVectorAllOnes/isBuildVectorAllZeros to the equivalent
isConstantSplatVectorAllOnes/Zeros calls.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D100851
2021-04-21 11:05:37 +01:00
Fraser Cormack 3f02d26943 [RISCV] Further fixes for RVV stack offset computation
This patch fixes a case missed out by D100574, in which RVV scalable
stack offset computations may require three live registers in the case
where the offset's fixed component is 12 bits or larger and has a
scalable component.

Instead of adding an additional emergency spill slot, this patch further
optimizes the scalable stack offset computation sequences to reduce
register usage.

By emitting the sequence to compute the scalable component before the
fixed component, we can free up one scratch register to be reallocated
by the sequence for the fixed component. Doing this saves one register
and thus one additional emergency spill slot.

Compare:

    $x5 = LUI 1
    $x1 = ADDIW killed $x5, -1896
    $x1 = ADD $x2, killed $x1
    $x5 = PseudoReadVLENB
    $x6 = ADDI $x0, 50
    $x5 = MUL killed $x5, killed $x6
    $x1 = ADD killed $x1, killed $x5

versus:

    $x5 = PseudoReadVLENB
    $x1 = ADDI $x0, 50
    $x5 = MUL killed $x5, killed $x1
    $x1 = LUI 1
    $x1 = ADDIW killed $x1, -1896
    $x1 = ADD $x2, killed $x1
    $x1 = ADD killed $x1, killed $x5

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D100847
2021-04-21 10:51:07 +01:00
Craig Topper 78abad569c [RISCV] Add missing SEW=64 tests to vmslt-rv32.ll. NFC 2021-04-20 18:31:36 -07:00
Fraser Cormack 60622b82a7 [RISCV][NFC] Add tests for scalable-vector DAGCombiner improvements
These will all be improved by future patches.
2021-04-20 14:26:26 +01:00
Fraser Cormack b4a358a7ba [RISCV] Fix missing emergency slots for scalable stack offsets
This patch adds an additional emergency spill slot to RVV code. This is
required as RVV stack offsets may require an additional register to compute.

This patch includes an optimization by @HsiangKai <kai.wang@sifive.com>
to reduce the number of registers required for the computation of stack
offsets from 3 to 2. Otherwise we'd need two additional emergency spill
slots.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D100574
2021-04-20 09:59:41 +01:00
Jun Ma 1ef5699d1a [DAGCombiner] Support fold zero scalar vector.
This patch changes ISD::isBuildVectorAllZeros to
ISD::isConstantSplatVectorAllZeros which handles zero sclar vector.

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D100813
2021-04-20 16:28:43 +08:00
Fraser Cormack 457da7f298 [SelectionDAG] Relax constraints on STEP_VECTOR step operand
This patch relaxes the requirement that the STEP_VECTOR step constant
must be of a type at least as large as the vector element type. This
does not permit its use on targets which have legal vector element types
larger than the largest legal scalar type, such as i64 vectors on RV32.

As such, the requirement has been loosened so that the step operand must
be any scalar type so long as the constant immediate is non-negative and
the value fits inside the vector element type.

This limits combining optimizations in certain circumstances but in
practice it's unlikely to be a hindrance.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D100660
2021-04-20 08:41:42 +01:00
Ben Shi b7249bf3b5 [RISCV][test] Add a new test of addition
Reviewed by: craig.topper

Differential Revision: https://reviews.llvm.org/D100767
2021-04-20 12:11:56 +08:00
Craig Topper 7ed01a420a [RISCV] Pad v4i1/v2i1/v1i1 stores with 0s to make a full byte.
As noted in the FIXME there's a sort of agreement that the any
extra bits stored will be 0.

The generated code is pretty terrible. I was really hoping we
could use a tail undisturbed trick, but tail undisturbed no
longer applies to masked destinations in the current draft
spec.

Fingers crossed that it isn't common to do this. I doubt IR
from clang or the vectorizer would ever create this kind of store.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100618
2021-04-19 11:05:18 -07:00
Fraser Cormack c9a93c3e01 [RISCV] Lower vector shuffles to vrgather operations
This patch extends the lowering of RVV fixed-length vector shuffles to
avoid the default stack expansion and instead lower to vrgather
instructions.

For "permute"-style shuffles where one vector is swizzled, we can lower
to one vrgather. For shuffles involving two vector operands, we lower to
one unmasked vrgather (or splat, where appropriate) followed by a masked
vrgather which blends in the second half.

On occasion, when it's not possible to create a legal BUILD_VECTOR for
the indices, we use vrgatherei16 instructions with 16-bit index types.

For 8-bit element vectors where we may have indices over 255, we have a
fairly blunt fallback to the stack expansion to avoid custom-splitting
of the vector types.

To enable the selection of masked vrgather instructions, this patch
extends the various RISCVISD::VRGATHER nodes to take a passthru operand.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100549
2021-04-19 11:13:13 +01:00
David Sherwood 83f5fa519e [CodeGen] Improve code generation for clamping of constant indices with scalable vectors
When trying to clamp a constant index into a scalable vector we can
test if the index is less than the minimum number of elements in the
vector. If so, we can simply return the index because we know it is
guaranteed to fit inside the vector.

Differential Revision: https://reviews.llvm.org/D100639
2021-04-19 08:34:17 +01:00
Fraser Cormack ec0f7c6923 [RISCV] Rerun stack test through update_llc_test_checks.py
Adjusts formatting of comments only. Just to reduce diffs in future
patches.
2021-04-16 11:08:58 +01:00
Jim Lin 2893570e86 [RISCV] Don't emit save-restore call if function is a interrupt handler
It has to save all caller-saved registers before a call in the handler.
So don't emit a call that save/restore registers.

Reviewed By: simoncook, luismarques, asb

Differential Revision: https://reviews.llvm.org/D100532
2021-04-16 12:54:47 +08:00
Fraser Cormack eae0ac3a1f [RISCV] Pre-commit vector shuffle test cases
This codegen will be improved by future patches.
2021-04-15 10:31:13 +01:00
ShihPo Hung d5e962f1f2 [RISCV] Implement COPY for Zvlsseg registers
When copying Zvlsseg register tuples, we split the COPY to NF whole register moves
as below:

  $v10m2_v12m2 = COPY $v4m2_v6m2 # NF = 2
=>
  $v10m2 = PseudoVMV2R_V $v4m2
  $v12m2 = PseudoVMV2R_V $v6m2

This patch copies forwardCopyWillClobberTuple from AArch64 to check
register overlapping.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100280
2021-04-13 18:55:51 -07:00
Fraser Cormack d737c47137 [RISCV] Support vector SET[U]LT and SET[U]GE with splatted immediates
This patch adds more optimized codegen for the above SETCC forms,
by matching the '.vi' vector forms when the immediate is a 5-bit signed
immediate plus 1. The immediate can be decremented and the corresponding
SET[U]LE or SET[U]GT forms can be matched.

This work was left as a TODO from D94168.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100096
2021-04-12 18:36:45 +01:00
Craig Topper ff902080a9 [RISCV] Use SLLI/SRLI instead of SLLIW/SRLIW for (srl (and X, 0xffff), C) custom isel on RV64.
We don't need the sign extending behavior here and SLLI/SRLI
are able to compress to C.SLLI/C.SRLI.
2021-04-11 13:59:51 -07:00
Craig Topper 3ae71226ef [RISCV] Drop earlyclobber constraint from vwadd(u).wx, vwsub(u).wx, vfwadd.wf and vfwsub.wf.
The first source has the same EEW as the destination and the other
source is a scalar so the overlap constraints don't apply to
the unmasked version.

For the masked version we have a constraint that the destination
can't be V0 so that covers the only overlap issue there.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D100217
2021-04-11 10:19:45 -07:00
Craig Topper bc0e052730 [RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffff) when zext.h is supported.
Similar to what we do for zext.w.

Disable the (srl (and X, 0xffff), C) custom isel when zext.h is
available.
2021-04-11 10:03:35 -07:00
Craig Topper 48d69edade [RISCV] Add i8 and i16 srli and srai tests to Zbb/Zbp test files. NFC
These require the input to be zero or sign extended. If we have
sext.b, sext.h or zext.h instructions we can use them. Otherwise
we need to use a pair of shifts to accomplish the zero/sign extend
and the final shift.

We don't currently use zext.h when it is available.
2021-04-11 10:00:38 -07:00
Fraser Cormack a5693445ca [RISCV] Support OR/XOR/AND reductions on vector masks
This patch adds RVV codegen support for OR/XOR/AND reductions for both
scalable- and fixed-length vector types. There are a few possible
codegen strategies for each -- vmfirst.m, vmsbf.m, and vmsif.m could be
used to some extent -- but the vpopc.m instruction was chosen since it
produces the scalar result in one instruction, after which scalar
instructions can finish off the computation.

The reductions are lowered identically for both scalable- and
fixed-length vectors, although some alternate strategies may be more
optimal on fixed-length vectors since it's cheaper to get the length of
those types.

Other reduction types were not deemed to be relevant for mask vectors.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100030
2021-04-08 09:46:38 +01:00
Hsiangkai Wang ba72bdef32 [RISCV] Add scalable offset under very large stack size.
If the stack size is larger than 12 bits, we have to use a scratch
register to store the stack size. Before we introduce the scalable stack
offset, we could simplify

%0 = ADDI %stack.0, 0

=>

%scratch = ... # sequence of instructions to move the offset into
%%scratch
%0 = ADD %fp, %scratch

However, if the offset contains scalable part, we need to consider it.

%0 = ADDI %stack.0, 0

=>

%scratch = ... # sequence of instructions to move the offset into
%%scratch
%scratch = ADD %fp, %scratch
%scalable_offset = ... # sequence of instructions for vscaled-offset.
%0 = ADD/SUB %scratch, %scalable_offset

Differential Revision: https://reviews.llvm.org/D100035
2021-04-08 14:46:05 +08:00
Hsiangkai Wang b8cd668115 [NFC][RISCV] Add test for scalable offset under large stack size.
This test case shows that we access wrong stack slots when the frame
object has scalable offset under large stack size.

Differential Revision: https://reviews.llvm.org/D100084
2021-04-08 14:46:05 +08:00
Craig Topper 56ea2e2fdd [RISCV] Add a special case to lowerSELECT for select of 2 constants with a SETLT condition.
If the constants have a difference of 1 we can convert one to
the other by adding or subtracting the condition.

We have a DAG combine for this, but it only runs before type
legalization. If the select is introduced later during type
legalization or op legalization we will miss it.

We don't need a specific condition, but some conditions are
harder to materialize than others on RISCV. I know that SETLT
will be a single instruction and it is what is used by the
motivating pattern from signed saturating add/sub.

Differential Revision: https://reviews.llvm.org/D99021
2021-04-07 13:47:17 -07:00
Craig Topper f087d7544a [RISCV] Support vslide1up/down intrinsics for SEW=64 on RV32.
This can't use our normal strategy of splatting the scalar and using
a .vv operation instead of .vx.

Instead this patch bitcasts the vector to the equivalent SEW=32
vector and inserts the scalar parts using two vslide1up/down. We
do that unmasked and apply the mask separately at the end with
a vmerge.

For vslide1up there maybe some other options here like getting
i64 into element 0 and using vslideup.vi with this vector as
vd and the original source as vs1. Masking would still need to
be done afterwards.

That idea doesn't work for vslide1down. We need to slidedown and
then insert a single scalar at vl-1 which we could do with a
vslideup, but that assumes vl > 0 which I don't think we can assume.

The i32 double slide1down implemented here is the best I could come
up with and I just made vslide1up consistent.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D99910
2021-04-07 10:44:53 -07:00
Craig Topper 67953311e2 [SelectionDAG] Teach SelectionDAG::FoldConstantArithmetic to handle SPLAT_VECTOR
This allows FoldConstantArithmetic to handle SPLAT_VECTOR in
addition to BUILD_VECTOR. This allows it to support scalable
vectors. I'm also allowing fixed length SPLAT_VECTOR which is
used by some targets, but I'm not familiar enough to write tests
for those targets.

I had to block this function from running on CONCAT_VECTORS to
avoid calling getNode for a CONCAT_VECTORS of 2 scalars.
This can happen because the 2 operand getNode calls this
function for any opcode. Previously we were protected because
CONCAT_VECTORs of BUILD_VECTOR is folded to a larger BUILD_VECTOR
before that call. But it's not always possible to fold a CONCAT_VECTORS
of SPLAT_VECTORs, and we don't even try.

This fixes PR49781 where DAG combine thought constant folding
should be possible, but FoldConstantArithmetic couldn't do it.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D99682
2021-04-07 10:03:33 -07:00
Craig Topper cb1028a0b9 [RISCV] When custom iseling masked stores, copy the mask into V0 instead of virtual register.
I missed a few intrinsics in 3dd4aa7d09
when I did this for masked loads and masked segment loads/stores.

Found while trying to share more code between these custom isel
functions.
2021-04-05 21:28:32 -07:00
Craig Topper 391514436d [RISCV] Add more RV32 vslide1up intrinsic test cases. NFC
For some reason we only had 1 test case. This synchronizes the
test with vslide1down so we have the same number of tests for both.
2021-04-05 17:03:52 -07:00
Fraser Cormack af3a839c70 [RISCV] Add support for bitcasts between scalars and fixed-length vectors
This patch supports bitcasts from scalar types to fixed-length vectors
and vice versa. It custom-lowers and custom-legalizes them to
EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT operations, using a single-element
vectors to hold the scalar where appropriate.

Previously, some of these would fail to select, others would be expanded
through stack loads and stores. Effort was made to ensure the codegen
avoids the stack for both legal and illegal scalar types.

Some of the codegen could be improved, but on first glance it looks like
a general optimization of EXTRACT_VECTOR_ELT when extracting an i64
element on RV32.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99667
2021-04-05 17:21:55 +01:00
Fraser Cormack 3f0df4d7b0 [RISCV] Expand scalable-vector truncstores and extloads
Caught in internal testing, these operations are assumed legal by
default, even for scalable vector types. Expand them back into separate
truncations and stores, or loads and extensions.

Also add explicit fixed-length vector tests for these operations, even
though they should have been correct already.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99654
2021-04-05 17:03:45 +01:00
Fraser Cormack 0d0514dd9b [RISCV] Add a test showing incorrect codegen
This patch adds a test which shows how the compiler incorrectly sets the
size and alignment of a stack object used to indirectly pass vector
types to functions.

In the particular example, the test passes a <4 x i8> vector type to a
function and creates a stack object of size and alignment equal to 4
bytes. However, the code generated to set up that parameter has been
scalarized and stores each element as individual XLEN-sized values. Thus
on RV32 this stores 16 bytes and on RV64 32 bytes, both of which clobber
the stack. Similarly, the alignment is set up as the alignment
of the vector type, which is not necessarily the natural alignment of XLEN.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D95025
2021-04-05 11:51:03 +01:00
Craig Topper 4708a05da0 [RISCV] Use gorciw for i32 orc.b intrinsic when Zbp is enabled.
The W version of orc.b does not exist in Zbp so we need to use
gorci encoding. If we have Zbp, we can use gorciw which can avoid a
sext.w in some cases.
2021-04-04 17:14:28 -07:00
Craig Topper a0e611cf72 [RISCV] Add signext attribute to i32 orc.b test for RV64 to match other Zbb tests.
Shows the sext.w at the end that would show up in C code. I'm thinking
orc.b would preserve sign bits from it's input, but I'm not sure.
2021-04-02 16:49:53 -07:00
Levy Hsu f78d932cf2 [RISCV] Add IR intrinsics for Zbc extension
Head files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
clmul
clmulh
clmulr

Differential Revision: https://reviews.llvm.org/D99711
2021-04-02 12:09:13 -07:00
Levy Hsu 944adbf285 Recommit "[RISCV] Add IR intrinsic for Zbb extension"
Forgot to amend the Author.

Original commit message:

Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
orc.b

Differential Revision: https://reviews.llvm.org/D99320
2021-04-02 11:50:19 -07:00
Craig Topper 1f0b309f24 Revert "[RISCV] Add IR intrinsic for Zbb extension"
This reverts commit 1808194590.

I forgot to change the author.
2021-04-02 11:47:02 -07:00
Craig Topper 1808194590 [RISCV] Add IR intrinsic for Zbb extension
Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
orc.b
2021-04-02 11:23:57 -07:00
Levy Hsu b001d574d7 [RISCV] Add IR intrinsic for Zbr extension
Implementation for RISC-V Zbr extension intrinsic.

Header files are included in separate patch in case the name needs to be changed

RV32 / 64:
        crc32b
        crc32h
        crc32w
        crc32cb
        crc32ch
        crc32cw

RV64 Only:
        crc32d
        crc32cd

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99009
2021-04-02 10:58:45 -07:00
Craig Topper d7ffa82a8e [RISCV] Improve 64-bit integer constant materialization for more cases.
For positive constants we try shifting left to remove leading zeros
and fill the bottom bits with 1s. We then materialize that constant
shift it right.

This patch adds a new strategy to try filling the bottom bits with
zeros instead. This catches some additional cases.
2021-04-02 10:18:08 -07:00
Fraser Cormack 411673e769 [RISCV] Test llvm.experimental.vector.insert intrinsics on RV32
RV32 is able to use the llvm.experimental.vector.insert intrinsics too.
This patch ensures they're tested.

Reviewed By: khchen, asb

Differential Revision: https://reviews.llvm.org/D99655
2021-04-02 11:49:54 +01:00
Fraser Cormack 3b48d849d4 [RISCV] Optimize more redundant VSETVLIs
D99717 introduced some test cases which showed that the output of one
vsetvli into another would not be picked up by the RISCVCleanupVSETVLI
pass. This patch teaches the optimization about such a pattern. The
pattern is quite common when using the RVV vsetvli intrinsic to pass the
VL onto other intrinsics.

The second test case introduced by D99717 is left unoptimized by this
patch. It is a rarer case and will require us to rewire any uses of the
redundant vset[i]vli's output to the previous one's.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99730
2021-04-02 10:04:07 +01:00
Fraser Cormack a4ac847c8e [RISCV] Add some tests showing vsetvli cleanup opportunities
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99717
2021-04-02 09:43:04 +01:00
Craig Topper 438b6dd3e5 [RISCV] Add missing nxvXf64 intrinsics tests cases for floating-point compare for RV32. 2021-04-01 20:57:13 -07:00
Craig Topper 5a9a8c7cd4 [RISCV] Add more nxvi64 vector intrinsic tests for RV32. NFC
This confirms we handle most instrutions gracefully. We do
currently fail for vslide1up and vslide1down though.
2021-04-01 20:34:28 -07:00
Craig Topper 0187c3a45c [RISCV] Add nxvXi64 test cases to the RV32 Zvamo intrinsic test files. NFC 2021-04-01 17:08:20 -07:00
Craig Topper 766d27dc85 [RISCV] Add isel patterns to handle vrsub intrinsic with 2 vector operands.
This occurs when we type legalize an i64 scalar input on RV32. We
need to manually splat, which requires a vector input. Rather
than special case this in lowering just pattern match it.
2021-04-01 14:10:21 -07:00
Craig Topper dbbc95e3e5 [RISCV] Use softPromoteHalf legalization for fp16 without Zfh rather than PromoteFloat.
The default legalization strategy is PromoteFloat which keeps
half in single precision format through multiple floating point
operations. Conversion to/from float is done at loads, stores,
bitcasts, and other places that care about the exact size being 16
bits.

This patches switches to the alternative method softPromoteHalf.
This aims to keep the type in 16-bit format between every operation.
So we promote to float and immediately round for any arithmetic
operation. This should be closer to the IR semantics since we
are rounding after each operation and not accumulating extra
precision across multiple operations. X86 is the only other
target that enables this today. See https://reviews.llvm.org/D73749

I had to update getRegisterTypeForCallingConv to force f16 to
use f32 when the F extension is enabled. This way we can still
pass it in the lower bits of an FPR for ilp32f and lp64f ABIs.
The softPromoteHalf would otherwise always give i16 as the
argument type.

Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D99148
2021-04-01 12:41:57 -07:00
Craig Topper d157e3f387 [RISCV] Fix handling of nxvXi64 vmsgt(u).vx intrinsics on RV32.
We need to splat the scalar separately and use .vv, but there is
no vmsgt(u).vv. So add isel patterns to select vmslt(u).vv with
swapped operands.

We also need to get VT to use for the splat from an operand rather
than the result since the result VT is nxvXi1.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D99704
2021-04-01 10:38:05 -07:00
Craig Topper b7c2e577cc [RISCV] Add custom type legalization to form MULHSU when possible.
There's no target independent ISD opcode for MULHSU, so custom
legalize 2*XLen multiplies ourselves. We have to be a little
careful to prefer MULHU or MULHSU.

I thought about doing this in isel by pattern matching the
(add (mul X, (srai Y, XLen-1)), (mulhu X, Y)) pattern. I decided
against this because the add might become part of a chain of adds.
I don't trust DAG combine not to reassociate with other adds making
it difficult to find both pieces again.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D99479
2021-04-01 10:15:55 -07:00
Craig Topper dadcd940f0 [RISCV] Add MULHU and MULHS tests with a constant operand. 2021-04-01 10:15:55 -07:00
Craig Topper d61b40ed27 [RISCV] Improve 64-bit integer materialization for some cases.
This adds a new integer materialization strategy mainly targeted
at 64-bit constants like 0xffffffff where there are 32 or more trailing
ones with leading zeros. We can materialize these by using an addi -1
and srli to restore the leading zeros. This matches what gcc does.

I haven't limited to just these cases though. The implementation
here takes the constant, shifts out all the leading zeros and
shifts ones into the LSBs, creates the new sequence, adds an srli,
and checks if this is shorter than our original strategy.

I've separated the recursive portion into a standalone function
so I could append the new strategy outside of the recursion. Since
external users are no longer using the recursive function, I've
cleaned up the external interface to return the sequence instead of
taking a vector by reference.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D98821
2021-04-01 09:12:52 -07:00
Simonas Kazlauskas 777a58e05b Support {S,U}REMEqFold before legalization
This allows these optimisations to apply to e.g. `urem i16` directly
before `urem` is promoted to i32 on architectures where i16 operations
are not intrinsically legal (such as on Aarch64). The legalization then
later can happen more directly and generated code gets a chance to avoid
wasting time on computing results in types wider than necessary, in the end.

Seems like mostly an improvement in terms of results at least as far as x86_64 and aarch64 are concerned, with a few regressions here and there. It also helps in preventing regressions in changes like {D87976}.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D88785
2021-04-01 01:35:41 +03:00
Craig Topper 2a8b7cab6a [RISCV] Add RISCVISD opcodes for CLZW and CTZW.
Our CLZW isel pattern is quite easily broken by surrounding code
preventing it from matching sometimes. This usually results in
failing to remove the and X, 0xffffffff inserted by type
legalization. The add with -32 that type legalization also inserts
will often gets combined into other add/sub nodes. That doesn't
usually result in extra code when we don't use clzw.

CTTZ seems to be less fragile, but I wanted to keep it consistent
with CTLZ.

Reviewed By: asb, HsiangKai

Differential Revision: https://reviews.llvm.org/D99317
2021-03-31 09:40:07 -07:00
Craig Topper 04f10ab367 [RISCV] Add isel patterns to select vsub_vx intrinsic to vadd.vi if it uses a small enough immediate
Also modify the simm5_plus1 check because Imm-1 is UB if Imm happens
to be INT64_MIN. I don't think the compiler would optimize based on that in this
usage, but it could fail UBSan or -ftrapv.

Reviewed By: HsiangKai, frasercrmck

Differential Revision: https://reviews.llvm.org/D99637
2021-03-31 09:26:41 -07:00
Fraser Cormack 10fc6e4358 [RISCV] Add support for the stepvector intrinsic
This adds almost everything required for supporting the new stepvector
intrinsic on RVV. It is lowered to the existing VID_VL SDNode.

The only exception is a limitation that RV32 cannot yet lower the
intrinsic on i64 vectors. This is because the step operand is
(currently) required to be at least as large as the vector element type.
I will look into patching that out and loosening the requirement to only
an integer pointer type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99594
2021-03-31 11:41:17 +01:00
Craig Topper a33fcafaf0 [RISCV] Pass 'half' in the lower 16 bits of an f32 value when F extension is enabled, but Zfh is not.
Without Zfh the half type isn't legal, but it could still be
used as an argument/return in IR. Clang will not generate this today.

Previously we promoted the half value to float for arguments and
returns if the F extension is enabled but Zfh isn't. Then depending on
which ABI is enabled we would pass it in either an FPR or a GPR in
float format.

If the F extension isn't enabled, it would get passed in the lower
16 bits of a GPR in half format.

With this patch the value will always in half format and will be
in the lower bits of a GPR or FPR. This should be consistent
with where the bits are located when Zfh is enabled.

I've based this implementation off of how this is done on ARM.

I've manually nan-boxed the value to 32 bits using integer ops.
It looks like flw, fsw, fmv.s, fmv.w.x, fmf.x.w won't
canonicalize nans so should leave the value alone. I think those
are the instructions that could get used on this value.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D98670
2021-03-30 09:47:54 -07:00
Craig Topper 3dd4aa7d09 [RISCV] When custom iseling masked loads/stores, copy the mask into V0 instead of virtual register.
This matches what we do in our isel patterns. In our internal
testing we've found this is needed to make the fast register
allocator happy at -O0. Otherwise it may assign V0 to an earlier
operand and find itself with no registers left when it reaches
the mask operand. By using V0 explicitly, the fast register allocator
will see it when it checks for phys register usages before it
starts allocating vregs. I'll try to update this with a test case.

Unfortunately, this does appear to prevent some instruction reordering
by the pre-RA scheduler which leads to the increased spills seen in
some tests. I suspect that problem could already occur for other
instructions that already used V0 directly.

There's a lot of repeated code here that could do with some
wrapper functions. Not sure if that should be at the level of the
new code that deals with V0. That would require multiple output
parameters to pass the glue, chain and register back. Maybe it
should be at a higher level over the entire set of push_backs.

Reviewed By: frasercrmck, HsiangKai

Differential Revision: https://reviews.llvm.org/D99367
2021-03-29 10:20:43 -07:00
Roger Ferrer Ibanez ef76a333fa [RISCV] Fix offset computation for RVV
In D97111 we changed the RVV frame layout when using sp or bp to address
the stack slots so we could address the emergency stack slot. The idea
is to put the RVV objects as far as possible (in offset terms) from the
frame reference register (sp / fp / bp).

When using fp this happens naturally because the RVV objects are already
the top of the stack and due to the constraints of RVV (VLENB being a
power of two >= 128) the stack remains aligned. The rest of this summary
does not apply to this case.

When using sp / bp we need to skip the non-RVV stack slots. The size of
the the non-RVV objects is computed subtracting the callee saved
register size (whose computation is added in D97111 itself) to the total
size of the stack (which does not account for RVV stack slots). However,
when doing so we round to 16 bytes when computing that size and we end
emitting a smaller offset that may belong to a scalar stack slot (see
D98801). So this change removes that rounding.

Also, because we want the RVV objects be between the non-RVV stack slots
and the callee-saved register slots, we need to make sure the RVV
objects are properly aligned to 8 bytes. Adding a padding of 8 would
render the stack unaligned. So when allocating space for RVV (only when
we don't use fp) we need to have extra padding that preserves the stack
alignment. This way we can round to 8 bytes the offset that skips the
non-RVV objects and we do not misalign the whole stack in the way. In
some circumstances this means that the RVV objects may have padding
before (=lower offsets from sp/bp) and after (before the CSR stack
slots).

Differential Revision: https://reviews.llvm.org/D98802
2021-03-29 17:03:49 +00:00
Roger Ferrer Ibanez 3abd0bacc2 [NFC][RISCV] Add test showing wrong stack slot for GPR and RVV spilled registers
This testcase shows that we attempt to assign the same offset sp + 16 to
two different stack objects.

The fix will come in a later change.

Differential Revision: https://reviews.llvm.org/D98801
2021-03-29 17:03:18 +00:00
Roger Ferrer Ibanez 96d14ff505 [NFC][RISCV] Pass file through update_llc_tests to fix whitespace issues
While addressing RVV frame layout issues I found this file had
whitespace differences that made diffs noisier than they should be.

Differential Revision: https://reviews.llvm.org/D98800
2021-03-29 17:02:47 +00:00
Bradley Smith 9745dce8c3 [SelectionDAG][AArch64][SVE] Perform SETCC condition legalization in LegalizeVectorOps
This is currently performed in SelectionDAGLegalize, here we make it also
happen in LegalizeVectorOps, allowing a target to lower the SETCC condition
codes first in LegalizeVectorOps and then lower to a custom node afterwards,
without having to duplicate all of the SETCC condition legalization in the
target specific lowering.

As a result of this, fixed length floating point SETCC nodes can now be
properly lowered for SVE.

Differential Revision: https://reviews.llvm.org/D98939
2021-03-29 15:32:25 +01:00
Craig Topper 5a79909a14 [RISCV] Add a RV64 mulhsu test case. NFC 2021-03-28 15:54:44 -07:00
Craig Topper 7b35932b51 [RISCV] Add test case for mulhsu.
We don't yet use mulhsu, but we should.
2021-03-28 11:03:39 -07:00
Craig Topper 5692fc38e0 [RISCV] Add a pattern for (sext_inreg (mul (and X, 0xffffffff), (and Y, 0xffffffff)), i32) to suppress MULW formation
We have a special pattern for
(mul (and X, 0xffffffff), (and Y, 0xffffffff)), to optimize the
ANDs to shift. But if a sext_inreg coms first, we'll form a MULW
and limit the effectiveness of the special match. So this patch
adds a larger pattern to suppress the MULW formation by emitting
a sext.w and then the same output we use for the
(mul (and X, 0xffffffff), (and Y, 0xffffffff)). This should all
get CSEd.

This is the issue I was trying to fix with D99029, but that affected
many more tests.
2021-03-27 15:37:18 -07:00
Zakk Chen 9049cf77e3 [RISCV] Add constraint for RVV indexed loads.
Add the constraint when destination EEW not equals the source EEW for
correctness.

The RVV spec has three register overlap rules and I implement the first
stricter constraint because the others are difficult to enforce.

Reviewed By: frasercrmck, craig.topper

Differential Revision: https://reviews.llvm.org/D98920
2021-03-26 07:23:24 -07:00
Craig Topper 8f62a80328 [RISCV] Optimize (and (shl GPR:, uimm5:), 0xffffffff) to use 2 shifts instead of 3.
The and would normally become SLLI+SRLI, giving us 2 SLLI+SRLI. We
can detect this and combine the 2 SLLIs into 1.
2021-03-25 23:31:01 -07:00
Craig Topper 9b3c0f9a54 [RISCV] Add Zbb+Zbt command lines to the signed saturing add/sub tests.
This will enable cmov to be used for select. I improve the codegen
of select_cc in D99021, but that patch doesn't work for cmov.
2021-03-25 17:25:36 -07:00
Craig Topper c40cea6f08 [RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffffffff).
We look for this pattern frequently in isel patterns so its a
good idea to try to preserve it.

This also let's us remove our special isel handling for srliw
and use a direct pattern match of (srl (and X, 0xffffffff), C)
since no bits will be removed from the and mask.

Differential Revision: https://reviews.llvm.org/D99042
2021-03-25 09:03:25 -07:00
Fraser Cormack 99211352c1 [RISCV] Optimize select-like vector shuffles
This patch adds a small optimization for vector shuffle lowering,
detecting shuffles which can be re-expressed as vector selects.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99270
2021-03-25 11:39:57 +00:00
Fraser Cormack 1e56e8717f [RISCV] Pre-commit shuffle test cases for D99270 2021-03-25 10:41:40 +00:00
Fraser Cormack 321a71a772 [RISCV] Optimize BUILD_VECTOR sequences that reveal hidden splats
This patch adds further optimization techniques to RVV BUILD_VECTOR
lowering. It teaches the compiler to find splats of larger vector
element types "hidden" in smaller ones. For example, a v4i8 build_vector
(0x1, 0x2, 0x1, 0x2) could be splat as v2i16 0x0201. This is generally
more optimal than the dominant-element BUILD_VECTORs and so takes
priority.

This optimization is currently limited to all-constant-or-undef
BUILD_VECTORs as those were found to be the most common. There's no
reason this couldn't be extended to other BUILD_VECTORs, but the
additional bit-manipulation instructions may require more sophisticated
heuristics.

There are some cases where the materialization of the larger constant
takes more scalar instructions than it does to build the vector with
vector instructions. We could add heuristics to try and catch this.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99195
2021-03-25 10:35:31 +00:00
Craig Topper 32f6a15dfd [RISCV] Add more tests that can be improved by D99042. 2021-03-25 00:02:42 -07:00
Craig Topper c8cf8bc7ec [RISCV] Add some 32-bit ctlz and cttz idiom tests to rv64zbb.ll. NFC
This implements various idioms using ctlz/cttz like Log2, Log2_Ceil,
findFirstSetBit, etc.

Some of these demonstrate that we fail to use clzw because the
idiom breaks the isel patterns we use. The isel pattern we use
is (add (cttz (and X, 0xffffffff)), -32). Some of the idioms
cause the constant on the add to be different.
2021-03-24 21:52:48 -07:00
Fraser Cormack feff66a082 [RISCV] Further optimize BUILD_VECTORs with repeated elements
This patch builds upon the initial BUILD_VECTOR work introduced in
D98700. It further optimizes the lowering of BUILD_VECTOR by using
VSELECT operations to effectively insert repeated elements into the
vector with relatively few instructions. This allows us to optimize more
BUILD_VECTORs without significantly increasing the size of the generated
code.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98969
2021-03-23 14:14:48 +00:00
Fraser Cormack 5bfbd9d938 [RISCV] Optimize all-constant mask BUILD_VECTORs
This patch adds an optimization for mask-vector BUILD_VECTOR nodes whose
elements are all constants or undef. It lowers such operations by
building up the vector via a series of integer operations, in which
multiple mask elements are inserted into a vector at a time via
i8/i16/i32/i64 element types. The final result is then bitcast from that
integer vector.

We restrict this optimization in certain circumstances when optimizing
for size. If we are required to use more than one integer insert
operation, then it will likely increase code size compared with using a
load from a constant pool.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98860
2021-03-23 10:11:19 +00:00
Craig Topper 728cd5dde7 [RISCV] Rename Zb* extension tests to use lower case 'Z' in file names.
As discussed in D99009
2021-03-22 19:17:04 -07:00
Craig Topper 294efcd6f7 [RISCV] Add support for fixed vector masked gather/scatter.
I've split the gather/scatter custom handler to avoid complicating
it with even more differences between gather/scatter.

Tests are the scalable vector tests with the vscale removed and
dropped the tests that used vector.insert. We're probably not
as thorough on the splitting cases since we use 128 for VLEN here
but scalable vector use a known min size of 64.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98991
2021-03-22 10:17:30 -07:00
Luís Marques 20f845d7c9 [RISCV][NFC] Add test of stack slot sizes of large split arguments
Illustrates bug 49500 <https://bugs.llvm.org/show_bug.cgi?id=49500>.
2021-03-22 13:41:11 +00:00
luxufan 02ffbac844 [RISCV] remove redundant instruction when eliminate frame index
The reason for generating mv a0, a0 instruction is when the stack object offset is large then int<12>. To deal this situation, in the elimintateFrameIndex function, it will
create a virtual register, which needs the register scavenger to scavenge it. If the machine instruction that contains the stack object and the opcode is ADDI(the addi
was generated by frameindexNode), and then this instruction's destination register was the same as the register that was generated by the register scavenger, then the
mv a0, a0 was generated. So to eliminnate this instruction, in the eliminateFrameIndex function, if the instrution opcode is ADDI, then the virtual register can't be created.

Differential Revision: https://reviews.llvm.org/D92479
2021-03-21 18:54:00 +08:00
Craig Topper 27bc30c39d [RISCV] Add test case to show a case where (mul (and X, 0xffffffff), (and Y, 0xffffffff)) optimization does not improve code.
If the mul add two users, one of which was a sext.w, the mul
would also be selected to a MULW before our pattern runs. This
causes the ANDs to now be used by the already selected MULW and
the mul we still need to select. They are unneeded on the MULW
since MULW only reads the lower bits. So they get selected to
SLLI+SRLI for the MULW use. The use for the
(mul (and X, 0xffffffff), (and Y, 0xffffffff)) manages to reuse
the SLLI.

The end result is increased register pressure and no improvement
to how soon we can start the MULW.
2021-03-20 17:54:28 -07:00
Craig Topper 07ed62b7d5 [RISCV] Disable (mul (and X, 0xffffffff), (and Y, 0xffffffff)) optimization when Zba is enabled.
This optimization is trying to save SRLI instructions needed to
implement the ANDs. If we have zext.w we won't save anything.
Because we don't check that the multiply is the only user of the
AND we might even increase instruction count.
2021-03-20 15:31:45 -07:00
Craig Topper 0874281d60 [RISCV] Add Zba command lines to xaluo.ll. NFC
Some of the patterns end up with 32 to 64 bit zero extends on RV64
which can be handled by zext.w.
2021-03-20 15:31:45 -07:00
Craig Topper b0d8823a8a [RISCV] Add isel pattern to optimize (mul (and X, 0xffffffff), (and Y, 0xffffffff)) on RV64
This patterns computes the full 64 bit product of a 32x32 unsigned
multiply. This requires a two pairs of SLLI+SRLI to zero the
upper 32 bits of the inputs.

We can do better than this by using two SLLI to move the lower
bits to the upper bits then use MULHU to compute the product. This
is the high half of a full 64x64 product. Since we put 32 0s in the lower
bits of the inputs we know the 128-bit product will have zeros in the
lower 64 bits. So the upper 64 bits, which MULHU computes, will contain
the original 64 bit product we were after.

The same trick would work for (mul (sext_inreg X, i32), (sext_inreg Y, i32))
using MULHS, but sext_inreg is sext.w which is already one instruction so we
wouldn't save anything.

Differential Revision: https://reviews.llvm.org/D99026
2021-03-20 14:55:46 -07:00
Fraser Cormack d399b82e2a [RISCV] Maintain fixed-length info when optimizing BUILD_VECTORs
I'm not sure how I failed to notice this before, but when optimizing
dominant-element BUILD_VECTORs we would lower via the scalable container type,
which lost us the information about the fixed length of the vector types. By
lowering via the fixed-length type we can preserve that information and
eliminate redundant vsetvli instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98938
2021-03-19 17:21:06 +00:00
Fraser Cormack 3bffa2c2aa [RISCV] Add missing CHECKs to vector test
Since the "LMUL-MAX=2" output for some test functions differed between
RV32 and RV64, the update_llc_test_checks script failed to emit a
unified LMULMAX2 check for them. I'm not sure why it didn't warn about
this.

This patch also takes the opportunity to add unified RV32/RV64 checks to
help shorten the test file when the output for LMULMAX1 and LMULMAX2 is
identical but differs between the two ISAs.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98944
2021-03-19 16:52:16 +00:00
Fraser Cormack 550292ecb1 [RISCV] Fix missing scalable->fixed-length vector conversion
Returning the scalable-vector container type would present problems when
the fixed-length INSERT_VECTOR_ELT was used by later operations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98776
2021-03-19 16:49:47 +00:00
Hsiangkai Wang aa8d33a6d6 [RISCV] Spilling for Zvlsseg registers.
For Zvlsseg, we create several tuple register classes. When spilling for
these tuple register classes, we need to iterate NF times to load/store
these tuple registers.

Differential Revision: https://reviews.llvm.org/D98629
2021-03-19 07:46:16 +08:00
Craig Topper 182b831aeb [DAGCombiner][RISCV] Teach visitMGATHER/MSCATTER to remove gather/scatters with all zeros masks that use SPLAT_VECTOR.
Previously only all zeros BUILD_VECTOR was recognized.
2021-03-18 15:34:14 -07:00
Fraser Cormack 3495031a39 [RISCV] Support scalable-vector masked scatter operations
This patch adds support for masked scatter intrinsics on scalable vector
types. It is mostly an extension of the earlier masked gather support
introduced in D96263, since the addressing mode legalization is the
same.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96486
2021-03-18 10:17:50 +00:00
Fraser Cormack 0331399dc9 [RISCV] Support scalable-vector masked gather operations
This patch supports the masked gather intrinsics in RVV.

The RVV indexed load/store instructions only support the "unsigned unscaled"
addressing mode; indices are implicitly zero-extended or truncated to XLEN and
are treated as byte offsets. This ISA supports the intrinsics directly, but not
the majority of various forms of the MGATHER SDNode that LLVM combines to. Any
signed or scaled indexing is extended to the XLEN value type and scaled
accordingly. This is done during DAG combining as widening the index types to
XLEN may produce illegal vectors that require splitting, e.g.
nxv16i8->nxv16i64.

Support for scalable-vector CONCAT_VECTORS was added to avoid spilling via the
stack when lowering split legalized index operands.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96263
2021-03-18 09:26:18 +00:00
Fraser Cormack c2b4600ec8 [RISCV] Support bitcasts of fixed-length mask vectors
Without this patch, bitcasts of fixed-length mask vectors would go
through the stack.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98779
2021-03-18 08:52:42 +00:00
ShihPo Hung fca5d63aa8 [RISCV] Fix isel pattern of masked vmslt[u]
This patch changes the operand order of masked vmslt[u]
from (mask, rs1, scalar, maskedoff, vl)
to (maskedoff, rs1, scalar, mask, vl).

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98839
2021-03-17 20:18:11 -07:00
Zakk Chen 9998b00c2e [RISCV] Update RVV shift intrinsic tests to use XLEN bit as shift amount.
Fix the unexpected of using op1's element type as shift amount type.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98501
2021-03-17 10:47:49 -07:00
Craig Topper 696ddef569 [RISCV] Support masked load/store for fixed vectors.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98561
2021-03-17 10:26:15 -07:00
Fraser Cormack 70251759a2 [RISCV] Optimize "dominant element" BUILD_VECTORs
This patch adds an optimization path for BUILD_VECTOR nodes where the
majority of the elements are identical. These can be splatted, with the
remaining elements patched up with INSERT_VECTOR_ELTs. The threshold can
be tweaked as required - it is currently conservative. Undef elements
are disregarded when judging the dominance of a particular element. This
allows them to be covered by the splat value.

In addition, vectors of 2 elements are always optimized to a splat (for
the upper element) and an insert at element zero.

This optimization is disabled when optimizing for size.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98700
2021-03-17 10:09:04 +00:00
Fangrui Song 6ab8927931 [RISCV] Support clang -fpatchable-function-entry && GNU function attribute 'patchable_function_entry'
Similar to D72215 (AArch64) and D72220 (x86).

```
% clang -target riscv32 -march=rv64g -c -fpatchable-function-entry=2 a.c && llvm-objdump -dr a.o
...
0000000000000000 <main>:
       0: 13 00 00 00   nop
       4: 13 00 00 00   nop

% clang -target riscv32 -march=rv64gc -c -fpatchable-function-entry=2 a.c && llvm-objdump -dr a.o
...
00000002 <main>:
       2: 01 00         nop
       4: 01 00         nop
```

Recently the mainline kernel started to use -fpatchable-function-entry=8 for riscv (https://git.kernel.org/linus/afc76b8b80112189b6f11e67e19cf58301944814).

Differential Revision: https://reviews.llvm.org/D98610
2021-03-16 10:02:35 -07:00
Craig Topper 229eeb187d [RISCV] Look through copies when trying to find an implicit def in addVSetVL.
The InstrEmitter can sometimes insert a copy after an IMPLICIT_DEF
before connecting it to the vector instruction. This occurs when
constrainRegClass reduces to a class with less than 4 registers.
I believe LMUL8 on masked instructions triggers this since the
result can only use the v8, v16, or v24 register group as the mask
is using v0.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98567
2021-03-16 07:59:09 -07:00
Craig Topper a33ce06cf5 [RISCV] Improve i32 UADDSAT/USUBSAT on RV64.
The default promotion uses zero extends that become shifts. We
cam use sign extend instead which is better for RISCV.

I've used two different implementations based on whether we
have minu/maxu instructions.

Differential Revision: https://reviews.llvm.org/D98683
2021-03-16 07:44:06 -07:00
Craig Topper 41759c3d92 [RISCV] Add RISCVISD::BR_CC similar to RISCVISD::SELECT_CC.
This allows me to introduce similar combines for branches as
we have recently added for SELECT_CC. Some of them are less
useful for standalone setccs and only help branch instructions.
By having a BR_CC node its easier to only affect branches.

I'm using CondCodeSDNode to make isel patterns easier to
write so we can refer to the codes by name. SELECT_CC uses a
constant instead.

I've translated the condition code just like SELECT_CC so
we need less patterns for the swapped conditions. This
includes special cases for X < 1 and X > -1 that get translated
to blez and bgez by using a 0 constant.

computeKnownBitsForTargetNode support for SELECT_CC is added
to allow MaskedValueIsZero to work for cases where the true
and false values of the SELECT_CC are setccs and the
result of the SELECT_CC is used by a BR_CC. This was needed
to avoid regressions in some of the overflow tests.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D98159
2021-03-15 11:54:01 -07:00
Philipp Tomsich 018e96f71f [RISCV] Add isel-patterns to optimize (a < 1) into blez (a <= 0)
The following code-sequence showed up in a testcase (isolated from
SPEC2017) for if-conversion and vectorization when searching for the
maximum in an array:
        addi    a2, zero, 1
        blt     a1, a2, .LBB0_5
which can be expressed as `bge zero,a1,.LBB0_5`/`blez a1,/LBB0_5`.

More generally, we want to express (a < 1) as (a <= 0).

This adds the required isel-pattern and updates the testcases.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98449
2021-03-15 11:32:43 -07:00
Fraser Cormack 0035decae7 [CodeGen] Fix issues with scalable-vector INSERT/EXTRACT_SUBVECTORs
This patch addresses a few issues when dealing with scalable-vector
INSERT_SUBVECTOR and EXTRACT_SUBVECTOR nodes.

When legalizing in DAGTypeLegalizer::SplitVecRes_INSERT_SUBVECTOR, we
store the low and high halves to the stack separately. The offset for
the high half was calculated incorrectly.

Additionally, we can optimize this process when we can detect that the
subvector is contained entirely within the low/high split vector type.
While this optimization is valid on scalable vectors, when performing
the 'high' optimization, the subvector must also be a scalable vector.
Note that the 'low' optimization is still conservative: it may be
possible to insert v2i32 into the low half of a split nxv1i32/nxv1i32,
but we can't guarantee it. It is always possible to insert v2i32 into
nxv2i32 or v2i32 into nxv4i32+2 as we know vscale is at least 1.

Lastly, in SelectionDAG::isSplatValue, we early-exit on the extracted subvector value
type being a scalable vector, forgetting that we can also extract a
fixed-length vector from a scalable one.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98495
2021-03-15 17:04:21 +00:00
Craig Topper 3dc5b533e0 [RISCV] Improve legalization of i32 UADDO/USUBO on RV64.
The default legalization uses zero extends that require pair of shifts
on RISCV. Instead we can take advantage of the fact that unsigned
compares work equally well on sign extended inputs. This allows
us to use addw/subw and sext.w.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D98233
2021-03-15 09:30:23 -07:00
Fraser Cormack 0c5b789c73 [RISCV] Support fixed-length vectors in the calling convention
This patch adds fixed-length vector support to the calling convention
when RVV is used to lower fixed-length vectors. The scheme follows the
regular vector calling convention for the argument/return registers, but
uses scalable vector container types as the LocVTs, and converts to/from
the fixed-length vector value types as required.

Fixed-length vector types may be split when the combination of minimum
VLEN and the maximum allowable LMUL is not large enough to fully contain
the vector. In this case the behaviour differs between fixed-length
vectors passed as parameters and as return values:
1. For return values, vectors must be passed entirely via registers or
via the stack.
2. For parameters, unlike scalar values, split vectors continue to be
passed by value, and are split across multiple registers until there are
no remaining registers. Thus vector parameters may be found partly in
registers and partly on the stack.

As with scalable vectors, the first fixed-length mask vector is passed
via v0. Split mask fixed-length vectors are passed first via v0 and then
via the next available vector register: v8,v9,etc.

The handling of vector return values uses all available argument
registers v8-v23 which does not adhere to the calling convention we're
supposedly implementing, but since this issue affects both fixed-length
and scalable-vector values, it was left as-is.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97954
2021-03-15 10:43:51 +00:00
Hsiangkai Wang a81dff1e58 [RISCV] Support inline asm for vector instructions.
Types of fractional LMUL and LMUL=1 are all using VR register class. When
using inline asm, it will use the first type in the register class as the
type for the register. It is not necessary the same as the value type. We
need to use INSERT_SUBVECTOR/EXTRACT_SUBVECToR/BITCAST to make it legal
to put the value in the corresponding register class.

Differential Revision: https://reviews.llvm.org/D97480
2021-03-15 11:02:18 +08:00
luxufan a9b9c64fd4 change rvv frame layout
This patch change the rvv frame layout that proposed in D94465. In patch D94465, In the eliminateFrameIndex function,
to eliminate the rvv frame index, create temp virtual register is needed. This virtual register should be scavenged by class
RegsiterScavenger. If the machine function has other unused registers, there is no problem. But if there isn't unused registers,
we need a emergency spill slot. Because of the emergency spill slot belongs to the scalar local variables field, to access emergency
spill slot, we need a temp virtual register again. This makes the compiler report the "Incomplete scavenging after 2nd pass" error.
So I change the rvv frame layout as follows:

```
|--------------------------------------|
|   arguments passed on the stack      |
|--------------------------------------|<--- fp
|   callee saved registers             |
|--------------------------------------|
|   rvv vector objects(local variables |
|   and outgoing arguments             |
|--------------------------------------|
|   realignment field                  |
|--------------------------------------|
|   scalar local variable(also contains|
|   emergency spill slot)              |
|--------------------------------------|<--- bp
|   variable-sized local variables     |
|--------------------------------------|<--- sp
```

Differential Revision: https://reviews.llvm.org/D97111
2021-03-13 16:05:55 +08:00
luxufan 5ddbd1fdbb [RISCV] Remove redundancy -mattr=+d in test file
Differential Revision: https://reviews.llvm.org/D97177
2021-03-13 15:17:51 +08:00
Craig Topper 2ea7014089 [DAGCombiner] Use isConstantSplatVectorAllZeros/Ones instead of isBuildVectorAllZeros/Ones in visitMSTORE and visitMLOAD.
This allows us to optimize when the mask is a splat_vector in
addition to build_vector.
2021-03-12 12:14:56 -08:00
Craig Topper 02da5e21ce [RISCV] Add test cases for masked load/store with all ones/zeros mask. NFC
These should be removed for all zeros mask or optimized to
unmasked for all ones.
2021-03-12 12:14:56 -08:00
Craig Topper 51151828ac [RISCV] Teach normaliseSetCC to canonicalize X > -1 to X >= 0 and X < 1 to 0 >= X.
This allows the use of BGE with X0 instead of puting -1/1 in a
register.

Reviewed By: jrtc27

Differential Revision: https://reviews.llvm.org/D98542
2021-03-12 11:50:10 -08:00
Craig Topper d701e37b42 [RISCV] Add test cases for failure to optimize select_cc with X < 1 or X > -1. NFC
We can use BGE with X0 to implement these, but we currently put
1 or -1 into a register.
2021-03-12 11:19:04 -08:00
Craig Topper 45d3ed0304 [RISCV] Add support for scalable vector masked load/store.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98460
2021-03-12 10:32:33 -08:00
Simonas Kazlauskas a2eca31da2 Test cases for rem-seteq fold with illegal types
This also briefly tests a larger set of architectures than the more
exhaustive functionality tests for AArch64 and x86.

As requested in D88785

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D98339
2021-03-12 16:28:04 +02:00
Fraser Cormack 641f5700f9 [RISCV] Optimize INSERT_VECTOR_ELT sequences
This patch optimizes the codegen for INSERT_VECTOR_ELT in various ways.
Primarily, it removes the use of vslidedown during lowering, and the
vector element is inserted entirely using vslideup with a custom VL and
slide index.

Additionally, lowering of i64-element vectors on RV32 has been optimized
in several ways. When the 64-bit value to insert is the same as the
sign-extension of the lower 32-bits, the codegen can follow the regular
path. When this is not possible, a new sequence of two i32 vslide1up
instructions is used to get the vector element into a vector. This
sequence was suggested by @craig.topper. From there, the value is slid
into the final position for more consistent lowering across RV32 and
RV64.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98250
2021-03-12 09:13:38 +00:00
Craig Topper 1d26bbcf9b [RISCV] Return false from isShuffleMaskLegal except for splats.
We don't support any other shuffles currently.

This changes the bswap/bitreverse tests that check for this in
their expansion code. Previously we expanded a byte swapping
shuffle through memory. Now we're scalarizing and doing bit
operations on scalars to swap bytes.

In the future we can probably use vrgather.vx to do a byte swap
shuffle.
2021-03-11 20:02:49 -08:00
Craig Topper 2ac7a3cff1 [RISCV] Add test cases for fixed vector bitreverse, bswap, ctlz, cttz, and ctpop.
Codegen needs to be improved, but I wanted to check for crashes.
2021-03-11 15:56:32 -08:00
Craig Topper c82f442954 [RISCV] Support fixed vector copysign.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98394
2021-03-11 09:57:24 -08:00
Craig Topper 0dff8a9627 [RISCV] Handle vmv.x.s intrinsic for i64 vectors on RV32.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98372
2021-03-11 09:39:50 -08:00
Craig Topper 9c841cb8e8 [RISCV] Support extract_vector_elt for fixed and scalable masked registers.
This uses a really simple approach of converting to an i8 vector
and extracting. This is probably not the best approach especially
if you know the index is constant.

Other ideas:
-Store to stack temporary using vse1, load as scalar and shift.
-Sort of bitcast the vector to a vector of i8, slide down the
 appropriate 8 bit element, copy to scalar, shift down the
 correct bit within the 8 bits we extracted. Not exactly sure
 how to describe such a bitcast from i1 vector to i8 vector
 within the type system for elements less than 8.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98310
2021-03-11 09:26:44 -08:00
Craig Topper e9426dfbae [ValueTypes][RISCV] Add MVT for v1f16.
RISCV makes all fixed vector MVTs with size less than or equal
to a command line option legal.

This didn't include v1f16 because it was missing but did include v1f32 and v1f64.

One test is affected where we did test this type, but it is a horizontal
reduction so it is non-sensical. Perhaps we should canonicalize that
away somewhere.

I'm not sure if we should be making v1 types legal, but this will at
least make RISCV consistent across all types.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98365
2021-03-11 09:23:18 -08:00
Craig Topper 47c7a6cfed [RISCV] Merge fixed-vectors-int-splat-rv32.ll and fixed-vectors-int-splat-rv64.ll.
The vXi64 test cases no longer crash on rv32.
2021-03-10 20:15:26 -08:00
Craig Topper 85ae96d8b2 [RISCV] Add v2i64 _vi_ and _iv_ test cases to fixed-vectors-int.ll since we no longer crash.
I think we were missing some build_vector or other support and
skipped these test cases. They work now but don't generate
optimal code.
2021-03-10 19:19:47 -08:00
Craig Topper 0c73a506e8 [RISCV] Starting fixing issues that prevent us from testing vXi64 intrinsics on RV32.
Currently we crash in type legalization any time an intrinsic
uses a scalar i64 on RV32.

This patch adds support for type legalizing this to prevent
crashing. I don't promise that it uses the best possible codegen
just that it is functional.

This first version handles 3 cases. vmv.v.x intrinsic, vmv.s.x
intrinsic and intrinsics that take a scalar input, splat it and
then do some operation.

For vmv.v.x we'll either rely on hardware sign extension for
constants or we'll convert it to multiple splats and bit
manipulation.

For vmv.s.x we use a really unoptimal sequence inspired by what
we do for an INSERT_VECTOR_ELT.

For the third case we'll either try to use the .vi form for
constants or convert to a complicated splat and bitmanip and use
the .vv form of the operation.

I've renamed the ExtendOperand field to SplatOperand now use it
specifically for the third case. The first two cases are handled
by custom lowering specifically for those intrinsics.

I haven't updated all tests yet, but I tried to cover a subset
that includes single-width, widening, and narrowing.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D97895
2021-03-10 09:45:38 -08:00
Craig Topper 1e39118638 [RISCV] Manually split vector operands to VECREDUCE when handling vXi64 vectors on RV32.
The type legalizer will visit the result before the operands. To
avoid creating an illegal target specific node or falling back to
scalarization, we need to manually split vector operands.

This still doesn't handle the case of non-power of 2 operands
which need to be widened. I'm not sure the type legalizer is
ready for it. I think we would need to insert an
INSERT_SUBVECTOR with the power of 2 type we want, with an undef
first operand, and the non-power of 2 orignal operand as the vector
to insert. Then fill in the neutral elements into the elements the
padded elements. Alternatively we INSERT_SUBVECTOR into a neutral vector.
From there we carry on splitting if needed to get to a legal type
then do the target specific code.

The problem with this is the type legalizer doesn't know how to
widen an insert_subvector yet. We would need to add that including
the handling for a non-undef first vector.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98292
2021-03-10 09:27:38 -08:00