Commit Graph

909 Commits

Author SHA1 Message Date
Craig Topper d53707508a [RISCV] Remove RISCVISD::VLE_VL/VSE_VL. Use intrinsics instead.
Similar to what we do for other loads/stores, use the intrinsic
version that we already have custom isel for.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D121166
2022-03-09 22:44:28 -08:00
Craig Topper 845bfcede1 [RISCV] Rename 'SplatOperand' to 'ScalarOperand'. NFC
vslide1up/down have this flag set, but the value isn't a splat.
Rename for clarity.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D121037
2022-03-07 11:28:32 -08:00
Craig Topper bd5f124716 [RISCV] Add SimplifyDemandedBits support for FSR/FSL/FSRW/FSLW. 2022-03-05 21:26:51 -08:00
Craig Topper 232f57319d [RISCV] Move vslide1up/down intrinsics into lowerVectorIntrinsicSplats. NFC
Rename to lowerVectorIntrinsicScalars.

This allows us to share the code that checks if the scalar needs
to be type legalized.
2022-03-04 18:21:53 -08:00
Craig Topper 3d4e83f17d [RISCV] With Zbb, fold (sext_inreg (abs X)) -> (max X, (negw X))
With Zbb, abs is expanded to (max X, neg) by default. If X has 33 or
more sign bits, we can expand it a little early using negw instead of
neg to save a sext_inreg. If X started as a 32 bit value, type
legalization would have inserted a sext before the abs so X having
33 sign bits should always be true.
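
A rough before/after sketch of the intended effect, assuming a 32-bit value
that is already sign extended in a0 (registers and exact sequence are
illustrative only):

```
# before: generic abs expansion plus the sign extension
neg    a1, a0
max    a0, a0, a1
sext.w a0, a0

# after: fold the sign extension into the negate
negw   a1, a0
max    a0, a0, a1
```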

Note: I've used ISD::FREEZE here since we increase the number of uses.
Our default expansion for ABS doesn't do that, but I think that's a bug.

We can't do this with custom type legalization because ISD::FREEZE
doesn't propagate sign bits, so a later DAG combine won't be able to
see and optimize it.

Alive2: https://alive2.llvm.org/ce/z/Gx3RNe

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D120597
2022-03-03 15:42:29 -08:00
Craig Topper 6cb42cd666 [RISCV] More correctly ignore Zfinx register classes in getRegForInlineAsmConstraint.
Until Zfinx is supported in CodeGen we need to convert all Zfinx
register classes to GPR.

Remove the zfinx-types.ll test which didn't test anything meaningful
since -mattr=zfinx isn't implemented completely in llc.

Follow up to D93298.
2022-03-02 11:22:46 -08:00
Craig Topper a1f8349d77 [RISCV] Don't combine ROTR ((GREV x, 24), 16)->(GREV x, 8) on RV64.
This miscompile was introduced in D119527.

This was a special pattern for rotate+bswap on RV32. It doesn't
work for RV64 since the rotate needs to be half the bitwidth. The
equivalent pattern for RV64 is ROTR ((GREV x, 56), 32) so match
that instead.

This could be generalized further as noted in the new FIXME.

Reviewed By: Chenbing.Zheng

Differential Revision: https://reviews.llvm.org/D120686
2022-03-02 09:47:06 -08:00
Shao-Ce SUN 0e38b29543 [RISCV] add the MC layer support of Zfinx extension
This patch added the MC layer support of Zfinx extension.

Authored-by: StephenFan
Co-Authored-by: Shao-Ce Sun

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D93298
2022-03-02 14:25:19 +08:00
Craig Topper b9d6e8c441 [RISCV] Lower VECTOR_SPLICE to RVV instructions.
This lowers VECTOR_SPLICE of scalable vectors to a slidedown followed by a slideup.
Fixed vectors are encouraged to use shufflevector instruction. The equivalent patch
for fixed vectors is D119039.

I've used a tail agnostic slidedown and limited the VL to only the
elements that will not be overwritten by the slideup. The slideup
uses VLMax for its VL. It unfortunately uses tail undisturbed policy
but it isn't required as there is no tail. We just need the merge
operand to carry the bits for the lower portion of the result.

Care was taken to ensure that either the slideup or slidedown will
be able to use a .vi instruction when the immediate is small. Which
one uses the immediate depends on the sign of the immediate.
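
A hedged sketch of the resulting sequence for a splice at a small positive
offset; register assignments, VL handling and vtype are illustrative only:

```
vslidedown.vi v8, v8, 2     # drop the first 2 elements of the first operand
vslideup.vx   v8, v9, a0    # a0 = VLMAX - 2; slide the second operand into the tail
```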

Reviewed By: frasercrmck, ABataev

Differential Revision: https://reviews.llvm.org/D119303
2022-03-01 10:10:13 -08:00
Craig Topper e83db8c001 [RISCV] Only enable combineROTR_ROTL_RORW_ROLW with Zbp.
I think the immediate values we check for on the GREV nodes already
protect this, but better to be explicit.
2022-02-28 12:47:36 -08:00
Craig Topper b083157b7b [RISCV] Don't call combineROTR_ROTL_RORW_ROLW for SLLW/SRLW/SRAW nodes. NFC
I think the function does the correct thing internally, but it's
confusing to read.
2022-02-28 11:05:10 -08:00
Craig Topper f46890711f [RISCV] Custom type legalize i32 ISD::ABS on RV64 without Zbb.
Default type legalization will create sext_inreg+abs, but we may
not be able to remove the sext_inreg.

Instead this patch expands abs during type legalization to
Y = sraiw X, 31; subw((xor X, Y), Y), which doesn't require the input
to be sign extended.
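
A minimal sketch of the expansion for an i32 value held in a0 (registers are
illustrative only):

```
sraiw a1, a0, 31    # a1 = sign mask computed from the low 32 bits of X
xor   a0, a0, a1    # flip the bits when X is negative
subw  a0, a0, a1    # subtract the sign mask, yielding |X| sign extended to 64 bits
```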

This gives a big improvement for some neg-abs tests where the
abs is used more than the neg. Previously the abs was expanded
a different way before and after type legalization. Now they are
expanded in a similar way enabling more CSE.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D120636
2022-02-28 09:30:27 -08:00
Chenbing Zheng b20e80aa59 [RISCV] DAG Combine vcpop and vfirst with VL=0 to li imm
vcpop and vfirst are still useful when VL=0.
vcpop is equivalent to li 0 and vfirst is equivalent to li -1,
since no mask elements are active.
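
In effect, with VL=0 each instruction's result is a known constant, so the
combine can emit a plain immediate load (a sketch, with a0 as an illustrative
destination):

```
li a0, 0     # result of vcpop.m when VL=0: no active elements
li a0, -1    # result of vfirst.m when VL=0: no active element is found
```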

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D120302
2022-02-25 14:44:25 +08:00
Craig Topper a975ca97c3 [RISCV] Fold (sext_inreg (fmv_x_anyexth X), i16) -> (fmv_x_signexth X).
Add a new ISD opcode to represent the sign extending behavior of
vmv.x.h. Keep the previous anyext opcode to allow the existing
(fmv_x_anyexth (fmv_h_x X)) combine to keep working without needing
to generate a sign extend.

For fmv.x.w we are able to match the sext_inreg in an isel pattern,
but a 16-bit sext_inreg is lowered to a shift pair before isel. This
seemed like a larger match than we should do in isel.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D118974
2022-02-24 09:19:01 -08:00
Shao-Ce SUN 78b5f0fb05 [NFC][RISCV] Reuse ISD::NodeType in float extension
Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D120412
2022-02-24 19:57:55 +08:00
Craig Topper 5b7ac107b1 [RISCV] Use SelectionDAG::getFreeze to simplify some code. NFC 2022-02-23 21:13:01 -08:00
Craig Topper c7d6448d03 [DAGCombiner][TargetLowering] Pass SDValue by value to isMulAddWithConstProfitable.
Internally to DAGCombiner the SDValues were passed by non-const
reference despite not being modified. They were then passed by
const reference to TLI.

This patch passes them by value which is consistent with the vast
majority of code.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D120420
2022-02-23 12:40:45 -08:00
Alex Bradbury c5bcfb983e [RISCV] Avoid infinite loop between DAGCombiner::visitMUL and RISCVISelLowering::transformAddImmMulImm
See https://github.com/llvm/llvm-project/issues/53831 for a full discussion.

The basic issue is that DAGCombiner::visitMUL and
RISCVISelLowering::transformAddImmMulImm get stuck in a loop, as the
current checks in transformAddImmMulImm aren't sufficient to avoid all
cases where DAGCombiner::isMulAddWithConstProfitable might trigger a
transformation. This patch makes transformAddImmMulImm bail out if C0
(the constant used for multiplication) has more than one use.

Differential Revision: https://reviews.llvm.org/D120332
2022-02-23 11:05:46 +00:00
Zakk Chen f7dfc5d1af [RISCV] Optimize tail agnostic vmv.s.x which doesn't need to select a tail value.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D120250
2022-02-21 14:53:37 -08:00
Craig Topper 90d240553d [RISCV] Teach shouldSinkOperands to sink splat operands of vp.fma intrinsics.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D120167
2022-02-21 11:52:59 -08:00
Craig Topper bbee9e77f3 [RISCV] Match shufflevector corresponding to slideup.
This generalizes isElementRotate to work when there's only a single
slide needed. I've removed matchShuffleAsSlideDown which is now
redundant.

Reviewed By: frasercrmck, khchen

Differential Revision: https://reviews.llvm.org/D119759
2022-02-17 08:19:10 -08:00
Zakk Chen eeb7754f68 [RISCV] Add the passthru operand for vmv.vv/vmv.vx/vfmv.vf IR intrinsics.
Add the passthru operand for
VMV_V_X_VL, VFMV_V_F_VL and SPLAT_VECTOR_SPLIT_I64_VL also.

The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D119688
2022-02-17 06:38:14 -08:00
Craig Topper cfbbcc544c [RISCV] Improve lowering of SHL_PARTS/SRL_PARTS/SRA_PARTS.
Part of the shift lowering creates a (sub XLEN-1, ShAmt). When this
value is used we know that ShAmt is [0..XLEN-1]. Since XLEN is a power
of 2 we can replace the sub with an xor. This allows us to use XORI
instead of LI+SUB.
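
An illustrative sketch of the saving on RV64, assuming the shift amount is in
a2 (registers are hypothetical):

```
# before: materialize XLEN-1 and subtract
li   a3, 63
sub  a3, a3, a2

# after: since a2 is known to be in [0, 63], (63 - a2) == (a2 ^ 63)
xori a3, a2, 63
```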

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D119411
2022-02-16 09:22:11 -08:00
Zakk Chen b784719904 [RISCV] Add the passthru operand for RVV nomask binary intrinsics.
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.

Add passthru operand for VSLIDE1UP_VL and VSLIDE1DOWN_VL to support
i64 scalar in rv32.

The masked VSLIDE1 would only emit mask undisturbed policy regardless
of giving mask agnostic policy until InsertVSETVLI supports mask agnostic.

Reviewed by: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D117989
2022-02-15 18:36:18 -08:00
Craig Topper ab6e02dded [RISCV] Match vwmulsu_vx with scalar splat input.
This is a more generic version of D119110 that uses MaskedValueIsZero
to do the matching and SimplifyDemandedBits to remove any unneeded
AND instructions.

Tests were taken from D119110.

Reviewed By: Chenbing.Zheng

Differential Revision: https://reviews.llvm.org/D119622
2022-02-15 08:45:21 -08:00
Craig Topper 478c237e21 [RISCV] Fix incorrect extend type in vwmulsu combine.
While matching widening multiply, if we matched an extend from i8->i32,
i16->i64 or i8->i64, we need to reintroduce a narrower extend. If we're
matching a vwmulsu we need to use a sext for op0 and a zext for op1.

This bug exists in LLVM 14 and will need to be backported.

Differential Revision: https://reviews.llvm.org/D119618
2022-02-12 12:47:20 -08:00
Chenbing.Zheng 9e975e558b [RISCV][NFC] Move some combine patterns to DAG combine.
Move some combine patterns to DAG combine, addressing a FIXME
left in RISCVInstrInfoZb.td.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D119527
2022-02-12 02:52:21 +00:00
Craig Topper b0e77d5e48 [RISCV] Lower the shufflevector equivalent of vector.splice
We can lower a vector splice to a vslidedown and a vslideup.

The majority of the matching code here came from X86's code for matching
PALIGNR and VPALIGND/Q.

The slidedown and slideup lowering don't really require it to be concatenation,
but it happened to be an interesting pattern with existing analysis code I
could use.

This helps with cases where the scalar loop optimizer forwarded a load
result from a previous loop iteration. For example, this happens if the
loop uses x[i] and x[i+1] on the same iteration. The scalar optimizer
will forward x[i+1] load from the previous loop to satisfy x[i] on this
loop. When this get vectorized it results in one element of a vector
being forwarded from the previous loop to be concatenated with elements
loaded on this iteration.

Whether that's more efficient than doing a shifted load or reloading
the single scalar and using vslide1up is an interesting question.
But that's not something the backend can help with.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D119039
2022-02-10 09:39:35 -08:00
Craig Topper b861ddf365 [RISCV] Move the creation of VLMaxSentinel to isel. Use X0 during lowering.
The VLMaxSentinel is represented as TargetConstant, but that's included
in isa<ConstantSDNode>. To keep constant VLs and VLMax separate as long
as possible, use the X0 register during lowering and only convert to
VLMaxSentinel during isel.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D118845
2022-02-10 09:28:44 -08:00
Craig Topper 279b3b8179 [RISCV][VP] Lower VP_FMA to RVV instructions.
We already had an FMA_VL node, but we didn't have masked patterns.
I have not added the fneg variations. I'll do those after I add
llvm.vp.fneg.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D119196
2022-02-09 11:33:12 -08:00
Craig Topper 63e711549c [RISCV] Lower VP_FNEG to RVV instructions
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D119269
2022-02-09 10:56:39 -08:00
Fraser Cormack 62c4ac764b [RISCV] Optimize splats of extracted vector elements
This patch adds an optimization to splat-like operations where the
splatted value is extracted from an identically-sized vector. On RVV we
can splat that via vrgather.vx/vrgather.vi without dropping to scalar
beforehand.

We do have a similar VECTOR_SHUFFLE-specific optimization but that only
works on fixed-length vector types and for those with a constant splat
lane. This patch extends this optimization to make it work on
scalable-vector types and on unknown extract indices.

It is performed during fixed-vector BUILD_VECTOR lowering and during a
new DAGCombine on SPLAT_VECTOR for scalable vectors.

Reviewed By: craig.topper, khchen

Differential Revision: https://reviews.llvm.org/D118456
2022-02-08 10:35:25 +00:00
wangpc c53d99c37d [RISCV] Split f64 undef into two i32 undefs
So that no store instruction will be generated.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D118222
2022-02-08 13:42:15 +08:00
Craig Topper c1cef111a3 Revert "[RISCV] Fold (sext_inreg (fmv_x_anyexth X), i16) -> (fmv_x_signexth X)."
This reverts commit 673d68cd92.

This hadn't been reviewed yet.
2022-02-05 12:51:01 -08:00
Craig Topper 673d68cd92 [RISCV] Fold (sext_inreg (fmv_x_anyexth X), i16) -> (fmv_x_signexth X).
Add a new ISD opcode to represent the sign extending behavior of
vmv.x.h. Keep the previous anyext opcode to allow the existing
(fmv_x_anyexth (fmv_h_x X)) combine to keep working without needing
to generate a sign extend.

For fmv.x.w we are able to match the sext_inreg in an isel pattern,
but a 16-bit sext_inreg is lowered to a shift pair before isel. This
seemed like a larger match than we should do in isel.

Differential Revision: https://reviews.llvm.org/D118974
2022-02-05 12:42:12 -08:00
Craig Topper 234e54bdd8 [RISCV] Add more types of shuffles isShuffleMaskLegal.
Add the vslidedown and interleave patterns that I recently implemented.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D118952
2022-02-04 09:13:13 -08:00
Craig Topper c83905a308 [RISCV] Add inline expansion for vector fround.
This avoids a crash for scalable vectors and scalarization for
fixed vectors.

The algorithm is different enough that I don't think it makes sense
to merge with ceil/floor/trunc. Algorithm is adapted from gcc's X86
SSE2 output.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D117247
2022-02-04 09:12:09 -08:00
Craig Topper 2349fb0312 [RISCV] Remove RISCVISD::SPLAT_VECTOR_I64 in favor of RISCVISD::VMV_V_X_VL.
SPLAT_VECTOR_I64 has the same semantics as RISCVISD::VMV_V_X_VL, it
just assumed VLMax instead of carrying a VL operand.

Include order of RISCVInstrInfoVSDPatterns.td and RISCVInstrInfoVVLPatterns.td
has been swapped to avoid moving riscv_vmv_v_x_vl into
RISCVInstrInfoVSDPatterns.td and to allow moving other "_vl" SDNodes back to
RISCVInstrInfoVVLPatterns.td

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D118841
2022-02-03 08:30:25 -08:00
Craig Topper abc6716038 [RISCV] Remove unused variables. NFC 2022-02-02 19:23:16 -08:00
Craig Topper f1720abb54 [RISCV] Cleanup some places that assumed VLMaxSentinel and -1 constant mean the same thing. NFCI
VLMaxSentinel happens to be represented as a -1 TargetConstant. A user
provided -1 would be an ISD::Constant. We shouldn't assume that they
are the same thing. I'm still not entirely convinced that we should be
treating -1 from the user as VLMAX.

Also fix one place that failed to use XLenVT for the VLMaxSentinel,
using MVT::i64 in code that only executes on RV32.
2022-02-02 12:23:12 -08:00
Craig Topper b73d151a11 [RISCV] Add DAG combines to transform ADD_VL/SUB_VL into widening add/sub.
This adds or reuses ISD opcodes for vadd.wv, vaddu.wv, vadd.vv, vaddu.vv
and a similar set for sub.

I've included support for narrowing scalar splats that have known
sign/zero bits similar to what was done for MUL_VL.

The conversion to vwadd.vv proceeds in two phases. First we'll form
a vwadd.wv by narrowing one of the operands. Then we'll visit the
vwadd.wv to try to narrow the other operand. This turned out to be
simpler than catching all the cases in one step. The forming of
vwadd.wv can happen for either operand for add, but only the right
hand side for sub since sub isn't commutable.

An interesting quirk is that ADD_VL and VZEXT_VL/VSEXT_VL are formed
during vector op legalization, but VMV_V_X_VL isn't usually formed
until op legalization when BUILD_VECTORS are handled. This leads to
VWADD_W_VL forming in one DAG combine round, and then a later DAG combine
round sees the VMV_V_X_VL and needs to commute the operands to get the
splat in position. This alone necessitated a VWADD_W_VL combine function
which made forming vwadd.vv in two stages an easy choice.

I've left out trying hard to form vwadd.wx instructions for now. It would
only save an extend in the scalar domain which isn't as interesting.

Might need to review the test coverage a bit. Most of the vwadd.wv
instructions are coming from vXi64 tests on rv64. The tests were
copy pasted from the existing multiply tests.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D117954
2022-02-02 10:03:08 -08:00
Craig Topper 5a5037c602 [RISCV] Fix some 80 column violations in ComputeNumSignBitsForTargetNode. NFC 2022-02-01 21:43:11 -08:00
Craig Topper 2e45e8abb1 [RISCV] Add a fatal error if ISD::VSCALE is used with Zvl32b.
We convert VLEN to vscale by dividing by RVVBitsPerBlock which is
currently 64. This is only correct if VLEN is evenly divisible by
64. With only Zvl32b we can't assume that.

This patch adds a fatal_error to prevent generating code that may
be broken.

We probably need to look at how we size stack frame objects too.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D118583
2022-01-31 09:13:14 -08:00
Craig Topper 09606d6a63 [RISCV] Update the computeKnownBitsForTargetNode for RISCVISD::READ_VLENB to consider Zve/Zvl.
We had previously hardcoded this to assume that vector registers
are 128 bits. This was true when only V existed, but after Zve
extensions were added this became incorrect.

This patch adjusts it to support 128, 64, or 32 bit vectors depending
on Zvl. The 128-bit limit is artificial, but we don't have any test
coverage exercising larger values, so I was being conservative.

None of our lit tests depend on this code today due to the custom
lowering of ISD::VSCALE that inserts the appropriate left or right
shift to convert from VLENB to VSCALE. That code was added after
this code in computeKnownBitsForTargetNode.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D118582
2022-01-31 09:13:14 -08:00
Nikita Popov 0801940c17 [RISCV] Avoid pointer element type access for masked atomicrmw intrinsics
masked.atomicrmw.*.i32 intrinsics access an i32 (and then possibly
mask it), so hardcode MVT::i32 as the access type here, rather than
determining it from the pointer element type.

Differential Revision: https://reviews.llvm.org/D118336
2022-01-31 09:28:39 +01:00
Craig Topper 5fbc3cda9e [RISCV] Use existing variable instead of calling getOperand again. NFCI
This is a slight change because I'm using the ANY_EXTEND result
instead of the original operand, but getNode should constant fold.

While there, add a comment about why the code specifically checks
for a ConstantSDNode.
2022-01-30 18:42:19 -08:00
Craig Topper 744be8c502 [RISCV] Lower riscv_zip/unzip intrinsic to RISCVISD::SHFL/UNSHFL.
These are special versions of the more general shfli/unshfli
instructions. We can use the general ISD opcodes with the correct
immediates.
2022-01-30 13:27:41 -08:00
Craig Topper e1075186a6 [RISCV] Custom lower brev8 intrinsic to RISCVISD::GREV.
We can use the RISCVISD::GREV encoding that swaps the bits in
each byte.  This allows it to use the existing computeKnownBits
support for RISCVISD::GREV.
2022-01-30 12:41:09 -08:00
Craig Topper 524545317c [RISCV] Remove RISCVISD::BREV8 and use RISCVISD::GREV instead.
We already have an ISD opcode for the more general GREV/GREVI
instruction. We can just use it with the encoding that corresponds
to the behavior of brev8. This is similar to what we do for orc.b
where we use the GORC ISD opcode.
2022-01-29 22:45:43 -08:00
Craig Topper d8f929a567 [RISCV] Custom legalize BITREVERSE with Zbkb.
With Zbkb, a bitreverse can be split into a rev8 and a brev8.
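
A sketch of the split for an XLEN-wide bitreverse of a0 with Zbkb:

```
rev8  a0, a0    # reverse the bytes within the register
brev8 a0, a0    # reverse the bits within each byte
```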

Reviewed By: VincentWu

Differential Revision: https://reviews.llvm.org/D118430
2022-01-28 23:11:12 -08:00
jacquesguan 1276678982 [RISCV] Improve extract_vector_elt for fixed mask registers.
Currently the backend promotes a mask vector to an i8 vector and extracts an element from that. Instead, we can bitcast to a vector with wider elements, extract the containing element into a GPR, and then use a base-ISA (I) instruction to extract the desired bit.

Differential Revision: https://reviews.llvm.org/D117389
2022-01-29 11:07:53 +08:00
Craig Topper ea05ee9059 [RISCV] Preserve VL when truncating i64 gather/scatter indices on RV32.
We were creating a truncate with the default for the type, but for
VP intrinsics we have a VL that we should use.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D118406
2022-01-28 09:25:30 -08:00
Kito Cheng a9d5bb926d [RISCV] Use __extendhfsf2/__truncsfhf2 for fp16 <-> fp32
`__gnu_h2f_ieee` and `__gnu_f2h_ieee` were introduced by ARM and are set as
the default names for fp16 and fp32 conversion in LLVM.

However, RISC-V GCC uses the default naming scheme for these,
`__extendhfsf2` and `__truncsfhf2`, which causes a runtime ABI
incompatibility.

Although we don't have a formal runtime ABI spec for this naming
convention yet, it is worth fixing the incompatibility first.

I plan to create a runtime ABI spec under the psABI spec this year.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D118207
2022-01-29 00:01:00 +08:00
Chenbing.Zheng 6d6c44a3f3 [RISCV] Add support for matching vwmulsu from fixed vectors
According to riscv-v-spec-1.0, the widening signed(vs2)-by-unsigned integer multiply is:
vwmulsu.vv vd, vs2, vs1, vm # vector-vector
vwmulsu.vx vd, vs2, rs1, vm # vector-scalar

It is worth noting that only vs2 is the signed operand.
For vwmulsu.vv we can swap the two operands and not care which one is the
sign extension, but for vwmulsu.vx the sign-extended operand cannot be the
vector extended from the scalar (rs1).
I specifically added two functions ending with _swap in the test case.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118215
2022-01-28 02:33:30 +00:00
Fraser Cormack 84e85e025e [SelectionDAG][VP] Provide expansion for VP_MERGE
This patch adds support for expanding VP_MERGE through a sequence of
vector operations producing a full-length mask setting up the elements
past EVL/pivot to be false, combining this with the original mask, and
culminating in a full-length vector select.

This expansion should work for any data type, though the only use for
RVV is for boolean vectors, which themselves rely on an expansion for
the VSELECT.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118058
2022-01-27 09:00:41 +00:00
Wu Xinlong 615d71d9a3 [RISCV][CodeGen] Implement IR Intrinsic support for K extension
This revision implements IR intrinsic support for the RISCV Scalar Crypto extension according to version 1.0 of the specification (https://github.com/riscv/riscv-crypto/releases/tag/v1.0.0-scalar).
Co-author:@ksyx & @VincentWu & @lihongliang & @achieveartificialintelligence

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102310
2022-01-27 15:53:35 +08:00
Craig Topper f487a76430 [RISCV] Add hasStdExtZbp() to hasAndNotCompare. 2022-01-26 13:54:05 -08:00
Benjamin Kramer f15014ff54 Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17"
This reverts commit ef82063207.

- It conflicts with the existing llvm::size in STLExtras, which will now
  never be called.
- Calling it without llvm:: breaks C++17 compat
2022-01-26 16:55:53 +01:00
serge-sans-paille ef82063207 Rename llvm::array_lengthof into llvm::size to match std::size from C++17
As a consequence, move llvm::array_lengthof from STLExtras.h to
STLForwardCompat.h (which is included by STLExtras.h so no build
breakage expected).
2022-01-26 16:17:45 +01:00
Zakk Chen 510710d037 [RISCV][NFC] Add getVLOperand for RVV intrinsics.
Use the VLOperand information to get the VL.

Differential Revision: https://reviews.llvm.org/D118156
2022-01-25 17:37:58 -08:00
Zakk Chen 9273378b85 [RISCV] Add the passthru operand for RVV nomask load intrinsics.
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.

Co-Authored-by: Hsiangkai Wang <Hsiangkai@gmail.com>

Reviewers: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D117647
2022-01-25 17:31:36 -08:00
eopXD b089e4072a [RISCV] Don't allow i64 vector div by constant to use mulh with Zve64x
EEW=64 for mulh and its variants requires the V extension.

Authored by: Craig Topper <craig.topper@sifive.com> @craig.topper

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117947
2022-01-25 09:55:05 -08:00
Nikita Popov aa97bc116d [NFC] Remove uses of PointerType::getElementType()
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().

This is part of D117885, in preparation for deprecating the API.
2022-01-25 09:44:52 +01:00
Fraser Cormack d42678b453 [RISCV] Add side-effect-free vsetvli intrinsics
This patch introduces new intrinsics that enable the use of vsetvli in
contexts where only the returned vector length is of interest. The
pre-existing intrinsics are marked with side-effects, which prevents
even trivial optimizations on/across them.

These intrinsics are intended to be used in situations where the vector
length is fed in turn to RVV intrinsics or to vector-predication
intrinsics during loop vectorization, for example. Those codegen paths
ensure that instructions are generated with their own implicit vsetvli,
so the vector length and vtype can be relied upon to be correct.

No corresponding C builtins are planned at this stage, though that is a
possibility for the future if the need arises.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117910
2022-01-24 13:52:08 +00:00
Fraser Cormack af773a1818 [RISCV][VP] Lower VP_MERGE to RVV instructions
This patch adds lowering of the llvm.vp.merge.* intrinsic
(ISD::VP_MERGE) to RVV vmerge/vfmerge instructions. It introduces a
special pseudo form of vmerge which allows a tied merge operand,
allowing us to specify the tail elements as being equal to the "on
false" operand, using a tied-def constraint and a "tail undisturbed"
policy.

While this strategy allows us to often lower the intrinsic to just one
instruction, it may be less efficient in fixed-vector types as the
number of tail elements may extend far beyond the length of the fixed
vector. Another strategy could be to use a vmerge/vfmerge instruction
with an AVL equal to the length of the vector type, and manipulate the
condition operand such that mask elements greater than the operation's
EVL are false.

I've also observed inefficient codegen in which our 'VF' patterns don't
match raw floating-point SPLAT_VECTORs, which occur in scalable-vector
code.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117561
2022-01-24 11:05:05 +00:00
Fraser Cormack e7926e8d97 [RISCV] Match VF variants for masked VFRDIV/VFRSUB
This patch follows up on D117697 to help the simple binary operations
behave similarly in the presence of masks.

It also enables CGP sinking support for vp.fdiv and vp.fsub intrinsics,
now that VFRDIV and VFRSUB are consistently matched with a LHS splat for
masked and unmasked variants.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117783
2022-01-24 10:59:43 +00:00
Craig Topper d44b6be6ea [RISCV] Don't Custom legalize f16/f32/f64 bitcasts if those types aren't Legal. 2022-01-22 11:55:18 -08:00
Craig Topper 48132bb1e4 [RISCV] Simplify interface to combineMUL_VLToVWMUL. NFC
Instead of passing the both the SDNode* and 2 of the operands
in two different orders, just pass the SDNode * and a bool to
indicate which operand order to test.

While there rename to combineMUL_VLToVWMUL_VL.
2022-01-21 11:43:06 -08:00
Fraser Cormack 4d268dc94a [RISCV] Enable CGP to sink splat operands of VP intrinsics
This patch brings better splat-matching to our VP support, by sinking
splat operands of VP intrinsics back into the same block as the VP
operation. The list of VP intrinsics we are interested in matches that
of the regular instructions.

Some optimization is still lacking. For instance, our VL nodes aren't
recognized as commutative, so splats must be on the RHS. Because of
this, we limit our sinking of splats to just the RHS operand for now.
Improvement in this regard can come in another patch.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117703
2022-01-21 11:30:37 +00:00
Craig Topper fa8bb22466 [RISCV] Optimize vector_shuffles that are interleaving the lowest elements of two vectors.
RISCV only has a unary shuffle that requires placing indices in a
register. For interleaving two vectors this means we need at least
two vrgathers and a vmerge to do a shuffle of two vectors.

This patch teaches shuffle lowering to use a widening addu followed
by a widening vmaccu to implement the interleave. First we extract
the low half of both V1 and V2. Then we implement
(zext(V1) + zext(V2)) + (zext(V2) * zext(2^eltbits - 1)) which
simplifies to (zext(V1) + zext(V2) * 2^eltbits). This further
simplifies to (zext(V1) + zext(V2) << eltbits). Then we bitcast the
result back to the original type splitting the wide elements in half.
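
A hedged sketch of the core sequence for interleaving two e16 vectors in v8
and v10; register choices and vtype are illustrative only:

```
li         a0, -1             # low 16 bits give 2^16 - 1 as an unsigned scalar
vwaddu.vv  v12, v8, v10       # v12 = zext(v8) + zext(v10) with 32-bit elements
vwmaccu.vx v12, a0, v10       # v12 += zext(v10) * (2^16 - 1), i.e. zext(v8) + (zext(v10) << 16)
```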

We can only do this if we have a type with wider elements available.
Because we're using extends we also have to be careful with fractional
lmuls. Floating point types are supported by bitcasting to/from integer.

The tests test a varied combination of LMULs split across VLEN>=128 and
VLEN>=512 tests. There are a few tests with shuffle indices commuted as well
as tests for undef indices. There's one test for a vXi64/vXf64 vector which
we can't optimize, but verifies we don't crash.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D117743
2022-01-20 14:44:47 -08:00
Craig Topper 94e69fbb4f [RISCV] Add DAG combine to fold (fp_to_int_sat (ffloor X)) -> (select X == nan, 0, (fcvt X, rdn))
Similar for ceil, trunc, round, and roundeven. This allows us to use
static rounding modes to avoid a libcall.
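
A hedged sketch of what the floor case could look like after isel for an
f64 -> i64 saturating conversion (the actually selected sequence may differ):

```
feq.d    a1, fa0, fa0    # a1 = 1 if the input is not NaN, 0 otherwise
fcvt.l.d a0, fa0, rdn    # convert with the static round-down (floor) rounding mode
neg      a1, a1          # all-ones mask when the input is not NaN
and      a0, a0, a1      # force the result to 0 for NaN input
```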

This is similar to D116771, but for the saturating conversions.

This optimization is done for AArch64 as isel patterns.
RISCV doesn't have instructions for ceil/floor/trunc/round/roundeven
so the operations don't stick around until isel to enable a pattern
match. Thus I've implemented a DAG combine.

I'm only handling saturating to i64 or i32. This could be extended
to other sizes in the future.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D116864
2022-01-20 11:35:37 -08:00
Fraser Cormack 5a12024b95 [RISCV] Optimize lowering of floating-point -0.0
This idea has come up in several reviews -- D115978 and D105902 -- so I
can't take any credit for the idea. Instead of using a constant pool to
lower -0.0, we can emit a sequence of two instructions:

    fmv.[hwd].x freg, zero
    fsgnjn.[hsd] freg, freg, freg

This is only done when the floating-point type is legal.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117687
2022-01-20 11:46:28 +00:00
Chenbing.Zheng 0be3da1fab [RISCV] Add intrinsic for Zbt extension
RV32: fsl, fsr, fsri
RV64: fsl, fsr, fsri, fslw, fsrw, fsriw

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117468
2022-01-20 08:27:05 +00:00
Craig Topper 4060b81e76 [RISCV] Obey -riscv-v-fixed-length-vector-elen-max when lowering mask BUILD_VECTORs.
We may not be allowed to use vXiXLen vectors. Consult ELEN to
determine what is allowed. This will become even more important
when Zve32 is added.

Reviewed By: frasercrmck, arcbbb

Differential Revision: https://reviews.llvm.org/D117518
2022-01-19 10:47:37 -08:00
Craig Topper 5a6c622afd [RISCV] Remove special case for constant shift amount in FSHL/FSHR lowering to FSL/FSR.
Remove fshl/fshr with constant shift amount isel patterns. Replace
with fsr/fsl with constant isel patterns.

This hack was trying to preserve as much optimization opportunity
for fshl/fshr by constant as possible, but the conversion to
RISCVISD::FSR/FSL happens so late it probably isn't worth much.

The new isel patterns are needed by D117468 anyway.
2022-01-18 11:47:50 -08:00
Craig Topper aa7fc02feb Recommit "[RISCV] Make the operand order for RISCVISD::FSL(W)/FSR(W) match the instruction register numbering."
This reverts the revert commit e328385739.

Accidental demanded bits change has been removed. The demanded bits
code itself was remove in a pre-commit since it isn't tested.

Original commit message:
Previous we used the fshl/fshr operand ordering for simplicity. This
made things confusing when D117468 proposed adding intrinsics for
the instructions. We can't just use the generic funnel shifting
intrinsics because fsl/fsr have different functionality that should
be exposed to software.

Now we use rs1, rs3, rs2/shamt order which matches the instruction
printing order and the order used in this intrinsic header
https://github.com/riscv/riscv-bitmanip/blob/main-history/cproofs/rvintrin.h
2022-01-18 10:52:43 -08:00
Craig Topper b3a0ec7645 [RISCV] Remove DemandedBits handling for FSR/FSL until we have test cases for it.
Testing may be easier after D117468. Right now we get demanded bits
optimizations done on ISD::FSHL/FSHR before they become FSR/FSL. This
makes it hard to test.
2022-01-18 10:52:43 -08:00
Craig Topper e328385739 Revert "[RISCV] Make the operand order for RISCVISD::FSL(W)/FSR(W) match the instruction register numbering."
This reverts commit b634f8a663.

I broke the SimplifyDemandedBits code, but we don't have tests.
2022-01-18 10:36:03 -08:00
Craig Topper b634f8a663 [RISCV] Make the operand order for RISCVISD::FSL(W)/FSR(W) match the instruction register numbering.
Previous we used the fshl/fshr operand ordering for simplicity. This
made things confusing when D117468 proposed adding intrinsics for
the instructions. We can't just use the generic funnel shifting
intrinsics because fsl/fsr have different functionality that should
be exposed to software.

Now we use rs1, rs3, rs2/shamt order which matches the instruction
printing order and the order used in this intrinsic header
https://github.com/riscv/riscv-bitmanip/blob/main-history/cproofs/rvintrin.h
2022-01-18 09:47:28 -08:00
David Sherwood f4515ab858 Revert "[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants"
This reverts commit 197f3c0deb.

Reverting after miscompilation errors discovered with ffmpeg.
2022-01-18 08:40:20 +00:00
Han-Kuan Chen ec9cb3a79c [RISCV] Provide VLOperand in td.
Currently, users expect VL to be the last operand. However, since some
intrinsics have a tail policy in the last operand, this rule can no
longer be relied upon.

Reviewed By: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D117452
2022-01-17 20:25:47 -08:00
Han-Kuan Chen 3fc4b5896a [RISCV] Make SplatOperand start from 0.
Currently SplatOperand starts from 1 because operand 0 (or 1) is the
intrinsic ID in SelectionDAG.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117453
2022-01-17 20:14:59 -08:00
Craig Topper 116af698e2 [RISCV] When expanding CONCAT_VECTORS, don't create INSERT_SUBVECTORS for undef subvectors.
For fixed vectors, the undef will get expanded to an all zeros
build_vector. We don't want that so suppress creating the
insert_subvector.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D117379
2022-01-17 14:40:59 -08:00
Craig Topper 9c410838d2 [RISCV] Legalize fixed length (insert_subvector undef, X, 0) to a scalable insert.
We were considering this legal, but later the undef would become an all
zeros vector. This would cause us to need to re-legalize the insert later
into a vslideup with zero vector.

This patch catches the case and directly legalizes it to a scalable
insert.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D117377
2022-01-17 14:31:30 -08:00
David Sherwood 197f3c0deb [CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants
When we know the value we're extending is a negative constant then it
makes sense to use SIGN_EXTEND because this may improve code quality in
some cases, particularly when doing a constant splat of an unpacked vector
type. For example, for SVE when splatting the value -1 into all elements
of a vector of type <vscale x 2 x i32> the element type will get promoted
from i32 -> i64. In this case we want the splat value to sign-extend from
(i32 -1) -> (i64 -1), whereas currently it zero-extends from
(i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use
a single mov immediate instruction.

New tests added here:

  CodeGen/AArch64/sve-vector-splat.ll

I believe we see some code quality improvements in these existing
tests too:

  CodeGen/AArch64/reduce-and.ll
  CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll

The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only
occur because the test disables codegen prepare and branch folding.

Differential Revision: https://reviews.llvm.org/D114357
2022-01-17 11:08:57 +00:00
Craig Topper 4c1e1e05cb [RISCV] Add RISCVISD::BFPW to ComputeNumSignBitsForTargetNode. 2022-01-15 15:23:49 -08:00
Fraser Cormack 877d1b3d07 [SelectionDAG][VP] Add splitting/widening for VP_LOAD and VP_STORE
Original patch by @hussainjk.

This patch was split off from D109377 to keep vector legalization
(widening/splitting) separate from vector element legalization
(promoting).

While the original patch added a third overload of
SelectionDAG::getVPStore, this patch takes the liberty of collapsing
those all down to 1, as three overloads seems excessive for a
little-used node.

The original patch also used ModifyToType in places, but that method
still crashes on scalable vector types. Seeing as the other VP
legalization methods only work when all operands need identical
widening, this patch follows in that vein.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117235
2022-01-15 11:41:29 +00:00
Chenbing.Zheng fdd33a0c75 [RISCV][NFC] Add a function to customLegalizeToWOp by Intrinsic
These cases follow the same pattern, so they can be combined
into a single function.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117378
2022-01-15 08:28:08 +00:00
Craig Topper 2baa1dffd1 [RISCV] Add basic support for matching shuffles to vslidedown.vi.
Specifically the unary shuffle case where the elements being
shifted in are undef. This handles the shuffles produce by expanding
llvm.reduce.mul.

I did not reduce the VL which would increase the number of vsetvlis,
but may improve the execution speed. We'd also want to narrow the
multiplies so we could share vsetvlis between the vslidedown.vi and
the next multiply.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D117239
2022-01-14 09:04:54 -08:00
Craig Topper ac6b4896ea [RISCV] Honor the VT when converting float point register names to register class for inline assembly.
It appears the code here was written for the inline asm clobbering
a specific register, but it also gets used for named input and
output registers.

For the input and output case, we should honor the VT so we
don't insert conversion instructions around the inline assembly.

For the clobber case, we need to pick the largest register class.

Reviewed By: asb, jrtc27

Differential Revision: https://reviews.llvm.org/D117279
2022-01-14 09:04:00 -08:00
jacquesguan 88c0e0806b [RISCV] Improve i64 splat vector lowering in RV32.
We can use vmv.v.i/vmv.v.x with EEW=32 to lower an i64 splat vector if the i64 constant scalar can be split into two identical i32 scalars.
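
A minimal sketch for an i64 splat of 0x0000000500000005, whose two 32-bit
halves are identical (the vsetvli operands are illustrative only):

```
li      a0, 5
vsetvli t0, zero, e32, m2, ta, ma   # view the destination as e32 elements
vmv.v.x v8, a0                      # each adjacent pair of e32 elements forms one i64 element
```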

Differential Revision: https://reviews.llvm.org/D117079
2022-01-14 14:06:01 +08:00
David Sherwood ba471ba8d2 Revert "[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants"
This reverts commit 31009f0b5a.

It seems to be causing SVE VLA buildbot failures and has introduced a
genuine regression. Reverting for now.
2022-01-13 15:59:43 +00:00
David Sherwood 31009f0b5a [CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants
When we know the value we're extending is a negative constant then it
makes sense to use SIGN_EXTEND because this may improve code quality in
some cases, particularly when doing a constant splat of an unpacked vector
type. For example, for SVE when splatting the value -1 into all elements
of a vector of type <vscale x 2 x i32> the element type will get promoted
from i32 -> i64. In this case we want the splat value to sign-extend from
(i32 -1) -> (i64 -1), whereas currently it zero-extends from
(i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use
a single mov immediate instruction.

New tests added here:

  CodeGen/AArch64/sve-vector-splat.ll

I believe we see some code quality improvements in these existing
tests too:

  CodeGen/AArch64/dag-numsignbits.ll
  CodeGen/AArch64/reduce-and.ll
  CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll

The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only
occur because the test disables codegen prepare and branch folding.

Differential Revision: https://reviews.llvm.org/D114357
2022-01-13 09:43:07 +00:00
Lian Wang 16877c5d2c [RISCV] Add bfp and bfpw intrinsic in zbf extension
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116994
2022-01-13 02:53:00 +00:00
Craig Topper 63b17eb9ec [RISCV] Add strictfp support for compares.
This adds support for STRICT_FSETCC(quiet) and STRICT_FSETCCS(signaling).

FEQ matches well to STRICT_FSETCC oeq.
FLT/FLE matches well to STRICT_FSETCCS olt/ole.

Others require commuting operands or multiple instructions.

STRICT_FSETCC olt/ole/ogt/oge/ult/ule/ugt/uge uses FLT/FLE,
but we need to save/restore FFLAGS around them to avoid spurious
exceptions. I've implemented pseudo instructions with a
CustomInserter to insert the save/restore CSR instructions.
Unfortunately, this doesn't honor exceptions for signaling NANs
but I'm not sure if signaling nans are really supported by the
constrained intrinsics.
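
A hedged sketch of the save/restore sequence around one such comparison
(registers are illustrative only):

```
frflags a1            # save the accrued FP exception flags
flt.d   a0, fa0, fa1  # signaling comparison; may raise invalid for quiet NaNs
fsflags a1            # restore the saved flags, discarding the spurious exception
```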

STRICT_FSETCC one and ueq expand to a pair of FLT instructions
with a save/restore of fflags around each. This could be improved
in the future.

There may be some opportunities to generate better code for strict
comparisons mixed with nonans fast math flags. I've left FIXMEs in
the .td files for that.

Co-Authored-by: ShihPo Hung <shihpo.hung@sifive.com>

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D116694
2022-01-11 20:01:41 -08:00
Craig Topper be1cc64cc1 [RISCV] Add DAG combine to fold (fp_to_int (ffloor X)) -> (fcvt X, rdn)
Similar for ceil, trunc, round, and roundeven. This allows us to use
static rounding modes to avoid a libcall.

This optimization is done for AArch64 as isel patterns.
RISCV doesn't have instructions for ceil/floor/trunc/round/roundeven
so the operations don't stick around until isel to enable a pattern
match. Thus I've implemented a DAG combine.

We only handle XLen types except i32 on RV64. i32 will be type
legalized to a RISCVISD node. All other types will be type legalized
to XLen and maintain the FP_TO_SINT/UINT ISD opcode.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D116771
2022-01-11 09:05:57 -08:00
wangpc c6430fade3 [RISCV] Generate 32 bits jumptable entries when code model is small
The code can only address the whole RV32 address space or the lower 2 GiB
of the RV64 address space in the small code model, so 32-bit entries are enough.
This improves cache hit ratio and code size.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D116435
2022-01-11 18:20:37 +08:00
wangpc 98d51c2542 [RISCV] Override TargetLowering::BuildSDIVPow2 to generate SELECT
When `Zbt` is enabled, we can generate SELECT for division by power
of 2, so that there is no data dependency.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D114856
2022-01-11 15:54:35 +08:00
jacquesguan b607cd3928 [RISCV] Use vmv.s.x to build one element splat vector.
When we want to create a splat vector in which only the first element is initialized, we can use vmv.s.x or vfmv.s.f to build it.

Differential Revision: https://reviews.llvm.org/D116277
2022-01-11 10:21:18 +08:00
jacquesguan 6b8362eb8d [RISCV] Disable EEW=64 for index values when XLEN=32.
Disable EEW=64 for vector index load/store when XLEN=32.

Differential Revision: https://reviews.llvm.org/D106518
2022-01-10 10:51:27 +08:00
Kazu Hirata 435a5a3652 [llvm] Fix bugprone argument comments (NFC)
Identified with bugprone-argument-comment.
2022-01-08 11:56:38 -08:00
Craig Topper 75117fb340 [RISCV] Don't advertise i32->i64 zextload as free for RV64.
The zextload hook is only used to determine whether to insert a
zero_extend or any_extend for narrow types leaving a basic block.
Returning true from this hook tends to cause any load whose output
leaves the basic block to become an LWU instead of an LW.

Since we tend to prefer sexts for i32 compares on RV64, this can
cause extra sext.w instructions to be created in other basic blocks.

If we use LW instead of LWU this gives the MIR pass from D116397
a better chance of removing them.

Another option might be to teach getPreferredExtendForValue in
FunctionLoweringInfo.cpp about our preference for sign_extend of
i32 compares. That would cause SIGN_EXTEND to be chosen for any
value used by a compare instead of using the isZExtFree heuristic.
That will require code to convert from the llvm::Type* to EVT/MVT
as well as querying the type legalization actions to get the
promoted type in order to call TargetLowering::isSExtCheaperThanZExt.
That seemed like many extra steps when no other target wants it.
Though it would avoid us needing to lean on the MIR pass in some cases.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D116567
2022-01-06 08:13:42 -08:00
Craig Topper 808c662665 [RISCV] Change RISCVISD::FCVT*RTZ opcodes to take rounding mode as an operand.
Pre-work for a future change that will use these opcodes with other
rounding modes.

Differential Revision: https://reviews.llvm.org/D116724
2022-01-06 08:12:12 -08:00
Victor Perez 5527139302 [RISCV][VP] Add RVV codegen for [nX]vXi1 vp.select
Expand [nX]vXi1 vp.select the same way as [nX]vXi1 vselect.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D115546
2022-01-02 23:12:32 -08:00
Craig Topper 15787ccd45 [RISCV] Add support for STRICT_LRINT/LLRINT/LROUND/LLROUND. Tests for other strict intrinsics.
This patch adds isel support for STRICT_LRINT/LLRINT/LROUND/LLROUND.

It also adds test cases for f32 and f64 constrained intrinsics that
correspond to the intrinsics in float-intrinsics.ll and
double-intrinsics.ll. Support for promoting the integer argument of
STRICT_FPOWI was added.

I've skipped adding tests for f16 intrinsics, since we don't have libcalls
for them and we have inconsistent support for promoting them in LegalizeDAG.
This will need to be examined more closely.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D116323
2021-12-30 11:54:32 -08:00
Hsiangkai Wang a1c7ddf926 [RISCV] Support passing scalable vector values through the stack.
After consuming all vector registers, the scalable vector values will be
passed indirectly. The pointer values will be saved in general
registers. If all general registers are used up, we will report an error to
notify users the compiler does not support passing scalable vector
values through the stack. In this patch, we remove the restriction. After
all general registers are used up, we use the stack to save the
pointers which point to the indirect passed scalable vector values.

Differential Revision: https://reviews.llvm.org/D116310
2021-12-28 09:26:36 +08:00
Kazu Hirata e7774f499b Use static_assert instead of assert (NFC)
Identified with misc-static-assert.
2021-12-26 14:26:44 -08:00
Jim Lin 02478a26f2 [RISCV] Use DAG variable directly instead of DCI.DAG
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116087
2021-12-24 13:06:55 +08:00
Craig Topper 0a35211b34 [RISCV] Don't allow vector types to be used with inline asm 'r' constraint
The 'r' constraint uses the GPR class. There is generic support
for bitcasting and extending/truncating non-integer VTs to the
required integer VT. This doesn't work for scalable vectors and
instead crashes.

To prevent this, explicitly reject vectors. Fixed vectors might
work without crashing, but it doesn't seem worthwhile to allow.

While there remove an unnecessary level of indentation in the
"vr" and "vm" constraint handling.

Differential Revision: https://reviews.llvm.org/D115810
2021-12-23 20:32:36 -06:00
Victor Perez 10b3675aa9 [RISCV][VP] Lower mask vector VP AND/OR/XOR to RVV instructions
For fixed and scalable vectors, each intrinsic x is lowered to vmx.mm,
dropping the mask, which is safe to do as masked-off elements are
undef anyway.

Differential Revision: https://reviews.llvm.org/D115339
2021-12-23 15:02:32 -06:00
Craig Topper 7704c503ec [RISCV] Use positive 0.0 for the neutral element in fadd reductions if nsz is present.
-0.0 requires a constant pool. +0.0 can be made with vmv.v.x x0.

Not doing this in getNeutralElement for fear of changing other targets.

Differential Revision: https://reviews.llvm.org/D115978
2021-12-23 10:38:00 -06:00
Craig Topper b7b260e19a [RISCV] Support strict FP conversion operations.
This adds support for strict conversions between fp types and between
integer and fp.

NOTE: RISCV has static rounding mode instructions, but the constrainted
intrinsic metadata is not used to select static rounding modes. Dynamic
rounding mode is always used.

Differential Revision: https://reviews.llvm.org/D115997
2021-12-23 09:40:58 -06:00
jacquesguan 28a3e7dea2 [RISCV] Override hasAndNotCompare to use more andn when have Zbb extension.
Enable the transforms (X & Y) == Y ---> (~X & Y) == 0 and (X & Y) != Y ---> (~X & Y) != 0 when the Zbb extension is available, so that more andn instructions are used.
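
A hedged sketch with X in a0 and Y in a1 (registers are illustrative only):

```
andn a2, a1, a0    # a2 = Y & ~X
seqz a2, a2        # (X & Y) == Y  <=>  (Y & ~X) == 0
```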

Differential Revision: https://reviews.llvm.org/D115922
2021-12-23 10:42:20 +08:00
Craig Topper 66bbefeb13 [RISCV] Revert Zfhmin related changes that aren't tested and depend on f16 being a legal type.
Our Zfhmin support is only MC layer, but these are CodeGen layer
interfaces. If f16 isn't a Legal type for CodeGen with Zfhmin, then
these interfaces should keep their non-Zfh behavior.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D115822
2021-12-16 08:55:28 -08:00
Craig Topper 3926893439 [RISCV] Add isel support for scalar STRICT_FADD/FSUB/FMUL/FDIV/FSQRT.
Test that STRICT_FMINNUM/FMAXNUM are lowered to libcalls for f32/f64.
The RISC-V instructions don't match the behavior of fmin/fmax libcalls
with respect to SNaN.

Promoting FMINNUM/FMAXNUM for f16 needs more work outside of the
RISC-V backend.

Reviewed By: asb, arcbbb

Differential Revision: https://reviews.llvm.org/D115680
2021-12-14 10:50:55 -08:00
Craig Topper 3f1c403a2b [RISCV] Use AdjustInstrPostInstrSelection to insert a FRM dependency for scalar FP instructions with dynamic rounding mode.
In order to support constrained FP intrinsics we need to model FRM
dependency. Whether or not an instruction uses FRM is based on a 3
bit field in the instruction. Because of this we can't add
'Uses = [FRM]' to the tablegen descriptions.

This patch examines the immediate after isel and adds an implicit
use of FRM. This idea came from Roger Ferrer Ibanez.

Other ideas:
We could be overly conservative and just pretend all instructions with
frm field read the FRM register. Or we could have pseudoinstructions
for CodeGen with rounding mode.

Reviewed By: asb, frasercrmck, arcbbb

Differential Revision: https://reviews.llvm.org/D115555
2021-12-14 10:17:57 -08:00
Craig Topper b18b2a01ef [RISCV] Don't use VLMAX for start value splat in reduction lowering.
The reduction instructions only read the first element. The
execution time of a splat may be longer with a larger VL.
We should use the smallest VL we can.

Reviewed By: frasercrmck, HsiangKai

Differential Revision: https://reviews.llvm.org/D115536
2021-12-13 09:06:42 -08:00
Kito Cheng 39c861719b [RISCV] Fix vm operand constraint to fit GCC's behavior
- `vm` constraint is used for masking operand, which always v0.

- Update testcase, only masking operand should use `vm`, vector mask operations
  should just use `vr` for any vector register.

 - Revise the description of `vm` constraint.

- This patch also fix issue on RISCVRegisterInfo.td and RISCVISelLowering.cpp.

  RISCVRegisterInfo.td:
  - The first VT in the list must be the largest total size since the
    SelectionDAGBuilder uses the first register in the list as the canonical
    type for the register.

  RISCVISelLowering.cpp:
  - Fix RISCVTargetLowering::splitValueIntoRegisterParts and
    RISCVTargetLowering::joinRegisterPartsIntoValue for handling vectors
with different total sizes, which can happen with fractional LMUL since
a fractional LMUL always occupies one vector register.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D112599
2021-12-09 14:46:49 +08:00
Craig Topper acdbd34cfb [RISCV] Loosen some restrictions on lowering constant BUILD_VECTORs using vid.v.
The immediate size check on StepNumerator did not take into account
that vmul.vi does not exist. It also did not account for power of 2
constants that can be done with vshl.vi.

This patch fixes this by moving the conversion from mul to shift
further up. Then we can consider the immediates separately for MUL
vs SHL. For MUL I've allowed simm12 which requires a single addi
before a vmul.vx. For SHL I've allowed any uimm5 which works with
vshl.vi. We could relax these further in the future. This is a
starting point that allows us to emit the same number of instructions
we were already using for smaller numerators.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D115081
2021-12-06 09:34:40 -08:00
Victor Perez 9eb7322748 [RISCV][VP] Add RVV codegen for vp.select
Lower the vp.select intrinsic to VSELECT_VL.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D114629
2021-12-03 11:02:20 +00:00
Craig Topper 2f6beb7b0e [RISCV] Add inline expansion for vector ftrunc/fceil/ffloor.
This prevents scalarization of fixed vector operations or crashes
on scalable vectors.

We don't have direct support for these operations. To emulate
ftrunc we can convert to the same sized integer and back to fp using
round to zero. We don't need to do a convert if the value is large
enough to have no fractional bits or is a nan.

The ceil and floor lowering would be better if we changed FRM, but
we don't model FRM correctly yet. So I've used the trunc lowering
with a conditional add or subtract with 1.0 if the truncate rounded
in the wrong direction.
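
A rough sketch of the core of the ftrunc expansion; the magnitude/NaN check
and any masking are omitted, and register choices are illustrative only:

```
vfcvt.rtz.x.f.v v9, v8    # convert to integer, rounding toward zero
vfcvt.f.x.v     v9, v9    # convert back to floating point
```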

There are also missed opportunities to use masked instructions.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D113543
2021-12-01 11:25:28 -08:00
Craig Topper d8f9eaad89 [RISCV] Teach RISCVTargetLowering::shouldSinkOperands to handle udiv/sdiv/urem/srem.
The V extension supports .vx instructions for integer division and
remainder so we should sink splats for that operand.
2021-11-30 18:47:51 -08:00
David Green 9e8a71caf0 [DAG] Create fptosi.sat from clamped fptosi
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.
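
As a hypothetical illustration (not code from this patch), source like the
following is one common way the clamped-fptosi pattern can arise; the raw
conversion is undefined for out-of-range inputs, which is exactly the
semantic gap between fptosi and fptosi_sat noted below:

```
// Hypothetical C++ source whose IR contains fptosi followed by smax/smin,
// i.e. the clamp pattern the combine can turn into fptosi_sat. The raw
// (int32_t) conversion is undefined for out-of-range floats, so this sketch
// is only called with in-range values.
#include <algorithm>
#include <cstdint>
#include <cstdio>

static int16_t clamped_convert(float x) {
  int32_t i = static_cast<int32_t>(x);   // fptosi to a wider type
  i = std::max(i, (int32_t)INT16_MIN);   // smax with INT16_MIN
  i = std::min(i, (int32_t)INT16_MAX);   // smin with INT16_MAX
  return static_cast<int16_t>(i);        // truncate to the saturated width
}

int main() {
  std::printf("%d %d %d\n", clamped_convert(1e6f), clamped_convert(-1e6f),
              clamped_convert(123.4f));  // 32767 -32768 123
  return 0;
}
```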

A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi will have a less strict semantics
than the fptosisat, with less values that need to produce defined
behaviour.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D111976
2021-11-30 15:29:14 +00:00
Hans Wennborg a87782c34d Revert "[DAG] Create fptosi.sat from clamped fptosi"
It causes builds to fail with this assert:

llvm/include/llvm/ADT/APInt.h:990:
bool llvm::APInt::operator==(const llvm::APInt &) const:
Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed.

See comment on the code review.

> This adds a fold in DAGCombine to create fptosi_sat from sequences for
> smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
> the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
> it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
> ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
> to be handled similarly.
>
> A shouldConvertFpToSat method was added to control when converting may
> be profitable. The original fptosi will have a less strict semantics
> than the fptosisat, with less values that need to produce defined
> behaviour.
>
> This especially helps on ARM/AArch64 where the vcvt instructions
> naturally saturate the result.
>
> Differential Revision: https://reviews.llvm.org/D111976

This reverts commit 52ff3b0093.
2021-11-30 15:36:56 +01:00
David Green 52ff3b0093 [DAG] Create fptosi.sat from clamped fptosi
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.

A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi has less strict semantics than
fptosi_sat, with fewer values that need to produce defined
behaviour.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D111976
2021-11-30 11:05:32 +00:00
Craig Topper b121d23a9c [RISCV] Promote f16 log/pow/exp/sin/cos/etc. to f32 libcalls.
Prevents crashes or cannot select errors.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D113822
2021-11-29 18:49:11 -08:00
Philipp Tomsich af57a71d18 [RISCV] Don't call setHasMultipleConditionRegisters(), so icmp is sunk
On RISC-V, icmp is not sunk (as the following snippet shows) which
generates the following suboptimal branch pattern:
```
  core_list_find:
	lh	a2, 2(a1)
	seqz	a3, a0         <<
	bltz	a2, .LBB0_5
	bnez	a3, .LBB0_9    << should sink the seqz
        [...]
	j	.LBB0_9
  .LBB0_5:
	bnez	a3, .LBB0_9    << should sink the seqz
	lh	a1, 0(a1)
        [...]
```
due to an icmp not being sunk.

The blocks after `codegenprepare` look as follows:
```
  define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 {
  entry:
    %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1
    %0 = load i16, i16* %idx, align 2, !tbaa !4
    %cmp = icmp sgt i16 %0, -1
    %tobool.not37 = icmp eq %struct.list_head_s* %list, null
    br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader

  while.cond9.preheader:                            ; preds = %entry
    br i1 %tobool.not37, label %return, label %land.rhs11.lr.ph
```
where the `%tobool.not37` is the result of the icmp that is not sunk.
Note that it is computed in the basic block leading up to what becomes the
`bltz` instruction, while the `bnez` ends up in a basic block of its own.

Compare this to what happens on AArch64 (where the icmp is correctly sunk):
```
  define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 {
  entry:
    %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1
    %0 = load i16, i16* %idx, align 2, !tbaa !6
    %cmp = icmp sgt i16 %0, -1
    br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader

  while.cond9.preheader:                            ; preds = %entry
    %1 = icmp eq %struct.list_head_s* %list, null
    br i1 %1, label %return, label %land.rhs11.lr.ph
```

This is caused by sinkCmpExpression() being skipped, if multiple
condition registers are supported.

Given that the check for multiple condition registers affect only
sinkCmpExpression() and shouldNormalizeToSelectSequence(), this change
adjusts the RISC-V target as follows:
 * we no longer signal multiple condition registers (thus changing
   the behaviour of sinkCmpExpression() back to sinking the icmp)
 * we override shouldNormalizeToSelectSequence() to always select the
   preferred normalisation strategy for our backend

With both changes, the test results remain unchanged.  Note that without
the target-specific override to shouldNormalizeToSelectSequence(), there
is worse code (more branches) generated for select-and.ll and select-or.ll.

The original test case changes as expected:
```
  core_list_find:
	lh	a2, 2(a1)
	bltz	a2, .LBB0_5
	beqz	a0, .LBB0_9    <<
        [...]
	j	.LBB0_9
.LBB0_5:
	beqz	a0, .LBB0_9    <<
	lh	a1, 0(a1)
        [...]
```

Differential Revision: https://reviews.llvm.org/D98932
2021-11-19 08:32:59 -08:00
Zarko Todorovski 5b8bbbecfa [NFC][llvm] Inclusive language: reword and remove uses of sanity in llvm/lib/Target
Reworded and removed code comments that contain `sanity check` and
`sanity test`.
2021-11-17 21:59:00 -05:00
Craig Topper 0274be28d7 [RISCV] Lower vector CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF by converting to FP and extracting the exponent.
If we have a large enough floating point type that can exactly
represent the integer value, we can convert the value to FP and
use the exponent to calculate the leading/trailing zeros.

The exponent will contain log2 of the value plus the exponent bias.
We can then remove the bias and convert from log2 to leading/trailing
zeros.
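
A scalar illustration of the exponent trick (plain C++, not the vector
lowering; assumes IEEE-754 doubles, so a 32-bit value converts exactly):

```
// ctlz/cttz of a nonzero 32-bit value via FP conversion: convert to double,
// read the biased exponent, remove the bias to get log2, then turn log2 into
// a leading/trailing zero count. Undefined for zero, matching *_ZERO_UNDEF.
#include <cstdint>
#include <cstdio>
#include <cstring>

static unsigned ctlz32_via_fp(uint32_t x) {
  double d = (double)x;                            // exact for 32-bit values
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof(bits));
  unsigned log2x = ((bits >> 52) & 0x7ff) - 1023;  // exponent minus bias
  return 31 - log2x;
}

static unsigned cttz32_via_fp(uint32_t x) {
  uint32_t lowbit = x & (0u - x);                  // isolate the lowest set bit
  return 31 - ctlz32_via_fp(lowbit);               // its log2 is the trailing zero count
}

int main() {
  std::printf("%u %u\n", ctlz32_via_fp(0x00f00000u), cttz32_via_fp(0x00f00000u));
  // expected: 8 20
  return 0;
}
```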

This doesn't work for zero since the exponent of zero is zero so we
can only do this for CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF. If we need
a value for zero we can use a vmseq and a vmerge to handle it.

We need to be careful to make sure the floating point type is legal.
If it isn't we'll continue using the integer expansion. We could split the vector
and concatenate the results but that needs some additional work and evaluation.

Differential Revision: https://reviews.llvm.org/D111904
2021-11-17 10:29:41 -08:00
Craig Topper 391b0ba603 [RISCV] Override TargetLowering::hasAndNot for Zbb.
Differential Revision: https://reviews.llvm.org/D113937
2021-11-15 18:44:07 -08:00
Craig Topper ee7a006ce4 [RISCV] Promote f16 ceil/floor/round/roundeven/nearbyint/rint/trunc intrinsics to f32 libcalls.
Previously these would crash. I don't think these can be generated
directly from C. Not sure if any optimizations can introduce them.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D113527
2021-11-11 08:28:41 -08:00
Craig Topper 4183522e80 [RISCV] Promote f16 frem with Zfh.
Add riscv64 coverage for f32 and f64 frem.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D113531
2021-11-10 17:35:07 -08:00
Craig Topper 9ee5cec688 [RISCV] Prevent bad legalizer behavior when bitcasting fixed vectors to i64 on RV32 with Zve32.
Similar to D113219, we need to make sure we don't create a vXi64
vector when it isn't legal. This fixes an error found by an
expensive checks build.
2021-11-10 11:58:49 -08:00
Craig Topper 57bc7b1089 [RISCV] Prevent crashes when bitcasting between fixed vectors and scalars.
Not all scalar element types are allowed in vectors so we may not
be able to bitcast to a 1 element vector to use insert/extract.

This will become a bigger issue when the Zve extensions are committed.
For now, I'm using the ELEN limit to limit the element types.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D113219
2021-11-10 09:21:52 -08:00
Craig Topper 376233113e [RISCV] Use TargetConstant for CSR number for READ_CSR/WRITE_CSR.
This is consistent with what we do for other operands that are required
to be constants.

I don't think this results in any real changes. The pattern match
code for isel treats ConstantSDNode and TargetConstantSDNode the same.
2021-11-08 15:10:24 -08:00
Craig Topper 304edbb553 [RISCV] SMUL_LOHI/UMUL_LOHI should expand for RVV.
These and MULHS/MULHU both default to Legal. Targets need to set
the ones they don't support to Expand.

I think MULHS/MULHU likely has priority in most places so this
change probably isn't directly testable. I found it while looking
at disabling MULHS/MULHU for nxvXi64 as required for Zve64x.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D113325
2021-11-08 09:38:36 -08:00
Ben Shi e32cf690df [RISCV] Optimize (add (mul r, c0), c1)
Optimize (add (mul x, c0), c1) ->
         (add (mul (add x, c1/c0+1), c0), c1%c0-c0),
if c1/c0+1 and c1%c0-c0 are simm12, while c1 is not.

Optimize (add (mul x, c0), c1) ->
         (add (mul (add x, c1/c0-1), c0), c1%c0+c0),
if c1/c0-1 and c1%c0+c0 are simm12, while c1 is not.
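
A quick check of the two rewrites (plain C++ with made-up constants; it
relies only on C's truncating division, where c0*(c1/c0) + c1%c0 == c1):

```
// Verify the identities above for one set of constants where c1 is not a
// simm12 but c1/c0 +/- 1 and the adjusted remainders are.
#include <cassert>
#include <cstdint>

int main() {
  int64_t x = 12345, c0 = 73, c1 = 9000;  // c1 > 2047, so not simm12
  int64_t form1 = (x + c1 / c0 + 1) * c0 + (c1 % c0 - c0);
  int64_t form2 = (x + c1 / c0 - 1) * c0 + (c1 % c0 + c0);
  assert(form1 == x * c0 + c1);
  assert(form2 == x * c0 + c1);
  return 0;
}
```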

Reviewed By: craig.topper, asb

Differential Revision: https://reviews.llvm.org/D111141
2021-11-08 02:58:25 +00:00
Shao-Ce SUN 5c3d7184b4 [RISCV] Support Zfhmin extension
According to RISC-V Unprivileged ISA 15.6.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D111866
2021-11-06 01:41:02 +08:00
Zakk Chen 0649dfebba [RISCV] Rename some assembler mnemonic and intrinsic functions for RVV 1.0.
Rename the vpopc/vmandnot/vmornot assembler mnemonics to vcpop/vmandn/vmorn.

Reviewed By: frasercrmck, jrtc27, craig.topper

Differential Revision: https://reviews.llvm.org/D111062
2021-11-04 10:08:01 -07:00
Fraser Cormack d065b03801 [RISCV] Optimize vp.load with an all-ones mask
Similar to D110206, this patch optimizes unmasked vp.load intrinsics to
avoid the need of a vmset instruction to set the mask. It does so by
selecting a riscv_vle intrinsic rather than a riscv_vle_mask intrinsic.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D113022
2021-11-02 17:23:39 +00:00
Craig Topper ada5458521 [RISCV] Expand scalable vector bswap. Fix crash for bitreverse.
Fix LegalizeVectorOps to not try shuffle or unrolling expansions for
scalable vectors.

Differential Revision: https://reviews.llvm.org/D112236
2021-10-31 10:01:27 -07:00
Craig Topper 1387483e72 [RISCV] Replace most uses of RISCVSubtarget::hasStdExtV. NFCI
Add new hasVInstructions() which is currently equivalent.

Replace vector uses of hasStdExtZfh/F/D with new vector specific
versions. The vector spec no longer requires that the vectors implement the
same types as scalar. It only requires that the scalar type is
the maximum size the vectors can support. This is currently
implemented using the scalar rule we were using before.

Add a new hasVInstructionsI64() and begin using it to qualify code that
requires i64 vector elements.

This is all NFC for now, but we can start using this to better
implement D112408 which introduces the Zve extensions.

Reviewed By: frasercrmck, eopXD

Differential Revision: https://reviews.llvm.org/D112496
2021-10-27 19:33:48 -07:00
Craig Topper 2783a5cfaf [RISCV] Add ICmp and FCmp to shouldSinkOperands. 2021-10-26 22:23:54 -07:00
Craig Topper d55be79d75 [RISCV] Expand scalable vector CTTZ/CTLZ/CTPOP.
Differential Revision: https://reviews.llvm.org/D112233
2021-10-21 10:50:04 -07:00
Craig Topper c4803bd416 [RISCV] Handle vector of pointer in getTgtMemIntrinsic for strided load/store.
getScalarSizeInBits() doesn't work if the scalar type is a pointer.
For that we need to go through DataLayout.
2021-10-07 10:11:56 -07:00
Craig Topper a2a07e8db3 [RISCV] Fold store of vmv.x.s to a vse with VL=1.
This can avoid a loss of decoupling with the scalar unit on cores
with decoupled scalar and vector units.

We should support FP too, but those use extract_element and not a
custom ISD node so it is a little different. I also left a FIXME
in the test for i64 extract and store on RV32.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109482
2021-09-27 09:54:46 -07:00
Craig Topper 933182e948 [RISCV] Improve support for forming widening multiplies when one input is a scalar splat.
If one input of a fixed vector multiply is a sign/zero extend and
the other operand is a splat of a scalar, we can use a widening
multiply if the scalar value has sufficient sign/zero bits.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D110028
2021-09-27 09:37:07 -07:00
Fraser Cormack d48f6df1f8 [RISCV] Create the correct mask type when lowering EXTRACT_VECTOR_ELT
This particular case was creating a `VMSET_VL` using the old
fixed-length type in order to pass a mask to other custom nodes
operating on the scalable container type. This kind of thing wasn't
caught for us; I only noticed when experimenting with odd-length
vectors, where it was trying to generate an invalid `v3i1` MVT.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110420
2021-09-27 09:43:40 +01:00
Hsiangkai Wang 7d39a8a921 [RISCV] (1/2) Add the tail policy argument to builtins/intrinsics.
Add the tail policy argument to LLVM IR intrinsics. There are two policies
for tail elements: tail agnostic means users do not care about the values in
the tail elements, and tail undisturbed means the values in the tail elements
need to be kept after the operation. In order to let users control the tail
policy, we add an additional argument at the end of the argument list.

For unmasked operations, we have no maskedoff and the tail policy is always
tail agnostic. If users want to keep the tail elements under unmasked
operations, they can use an all-ones mask in the masked operations to do it.
So, we only add the additional argument for masked operations in most cases.
There are exceptions listed below.

In this patch, we do not handle the following cases, to reduce the complexity
of the patch. There could be two separate patches for them.

* Use dest argument to control tail policy
vmerge.vvm/vmerge.vxm/vmerge.vim (add _t builtins with additional dest argument)
vfmerge.vfm (add _t builtins with additional dest argument)
vmv.v.v (add _t builtins with additional dest argument)
vmv.v.x (add _t builtins with additional dest argument)
vmv.v.i (add _t builtins with additional dest argument)
vfmv.v.f (add _t builtins with additional dest argument)
vadc.vvm/vadc.vxm/vadc.vim (add _t builtins with additional dest argument)
vsbc.vvm/vsbc.vxm (add _t builtins with additional dest argument)

* Always has tail argument for masked/unmasked intrinsics
Vector Single-Width Integer Multiply-Add Instructions (add _t and _mt builtins)
Vector Widening Integer Multiply-Add Instructions (add _t and _mt builtins)
Vector Single-Width Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins)
Vector Widening Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins)
Vector Reduction Operations (add _t and _mt builtins)
Vector Slideup Instructions (add _t and _mt builtins)
Vector Slidedown Instructions (add _t and _mt builtins)

Discussion: https://github.com/riscv/rvv-intrinsic-doc/pull/101

Differential Revision: https://reviews.llvm.org/D105092
2021-09-24 17:09:50 +08:00
Craig Topper 40b230f685 [RISCV] Limit transformAddImmMulImm to prevent an infinite loop.
This fixes an issue reported in D108607.
2021-09-23 15:53:11 -07:00
Fraser Cormack e7c879a69d [RISCV][VP] Add support for VP_REDUCE_* operations
This patch adds codegen support for lowering the vector-predicated
reduction intrinsics to RVV instructions. The process is similar to that
of the other reduction intrinsics, save for the fact that every VP
reduction has a start value. We reuse the existing custom "VL" nodes,
adding extra patterns where required to handle non-true masks.

To support these nodes, the `RISCVISD::VECREDUCE_*_VL` nodes have been
given an explicit "merge" operand. This is to faciliate the VP
reductions, where we must be careful to ensure that even if no operation
is performed (when VL=0) we still produce the start value. The RVV
reductions don't update the destination register under these conditions,
so we tie the splatted start value to the output register.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107657
2021-09-23 11:11:05 +01:00
Craig Topper b33a1cc05b [RISCV] Optimize vp.store with an all ones mask to avoid a vmset.
We can use riscv_vse intrinsic instead of riscv_vse_mask. The code here
is based on similar code for handling masked.scatter and vp.scatter.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D110206
2021-09-22 09:12:47 -07:00
Craig Topper 7c975665b4 [RISCV] Make some arrays of constants 'static const'. NFC
This helps the compiler generate better code.
2021-09-21 10:52:47 -07:00
Craig Topper aeb63d464f [RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for and/or/xor.
This requires a minor change to CodeGenPrepare to ensure that
shouldSinkOperands will be called for And.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D110106
2021-09-21 10:07:29 -07:00
Ben Shi b3052013b4 [RISCV] Optimize (add (mul x, c0), c1)
Optimize (add (mul x, c0), c1) -> (ADDI (MUL (ADDI, c1/c0), c0), c1%c0),
if c1/c0 and c1%c0 are simm12, while c1 is not.

Optimize (add (mul x, c0), c1) -> (MUL (ADDI, c1/c0), c0),
if c1%c0 is zero, and c1/c0 is simm12 while c1 is not.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108607
2021-09-21 14:13:14 +00:00
Craig Topper a95ba81073 [RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for FMA.
If either of the multiplicands is a splat, we can sink it to use
vfmacc.vf or similar.
2021-09-20 11:49:50 -07:00
Craig Topper 04ab6c85ef [RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for FAdd/FSub/FMul/FDiv. 2021-09-20 10:25:46 -07:00
Craig Topper d85e347a28 [RISCV] Add a pass to recognize VLS strided loads/store from gather/scatter.
For strided accesses the loop vectorizer seems to prefer creating a
vector induction variable with a start value of the form
<i32 0, i32 1, i32 2, ...>. This value will be incremented each
loop iteration by a splat constant equal to the length of the vector.
Within the loop, arithmetic using splat values will be done on this
vector induction variable to produce indices for a vector GEP.

This pass attempts to dig through the arithmetic back to the phi
to create a new scalar induction variable and a stride. We push
all of the arithmetic out of the loop by folding it into the start,
step, and stride values. Then we create a scalar GEP to use as the
base pointer for a strided load or store using the computed stride.
Loop strength reduce will run after this pass and can do some
cleanups to the scalar GEP and induction variable.
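
A scalar sketch (assumed shapes and constants, not the pass itself) of what
folding the arithmetic into the start, step, and stride means for the index
computation:

```
// The vector induction variable starts at <0,1,2,3> and is bumped by VL each
// iteration; per-lane splat arithmetic (here *3 and +7) then feeds a GEP.
// The pass recovers an equivalent scalar start plus a constant stride.
#include <cassert>

int main() {
  const long VL = 4, splat_mul = 3, splat_add = 7;
  for (long iter = 0; iter < 8; ++iter) {
    long scalar_start = iter * VL * splat_mul + splat_add; // folded into start
    long stride = splat_mul;                               // folded into stride
    for (long lane = 0; lane < VL; ++lane) {
      long vec_iv = iter * VL + lane;                      // lane of the vector IV
      long vec_index = vec_iv * splat_mul + splat_add;     // per-lane gather index
      assert(vec_index == scalar_start + lane * stride);   // strided form matches
    }
  }
  return 0;
}
```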

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D107790
2021-09-20 09:39:44 -07:00
Ben Shi dee5a8ca32 [RISCV] Optimize (add (shl x, c0), (shl y, c1)) with SH*ADD
Optimize (add (shl x, c0), (shl y, c1)) ->
         (SLLI (SH*ADD x, y), c1), if c0-c1 == 1/2/3.
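
The rewrite rests on a simple shift identity; a small check (plain C++,
unsigned so wraparound matches two's-complement machine arithmetic):

```
// (x << c0) + (y << c1) == ((x << (c0 - c1)) + y) << c1, where the inner
// shift-and-add is what SH1ADD/SH2ADD/SH3ADD provide for c0 - c1 in {1,2,3}.
#include <cassert>
#include <cstdint>

int main() {
  uint64_t x = 0x1234, y = 0x5678;
  for (unsigned c1 = 0; c1 < 8; ++c1)
    for (unsigned d = 1; d <= 3; ++d) {
      unsigned c0 = c1 + d;
      assert(((x << c0) + (y << c1)) == (((x << d) + y) << c1));
    }
  return 0;
}
```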

Reviewed By: craig.topper, luismarques

Differential Revision: https://reviews.llvm.org/D108916
2021-09-19 16:35:12 +08:00
Craig Topper 1b736bda3b [RISCV] Enable CGP to sink splat operands of Add/Sub/Mul/Shl/LShr/AShr
LICM may have pulled out a splat, but with .vx instructions we
can fold it into an operation.

This patch enables CGP to reverse the LICM transform and move the
splat back into the loop.

I've started with the commutable integer operations and shifts, but we can
extend this with more operations in future patches.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109394
2021-09-10 09:04:01 -07:00
Craig Topper a574f0e0c3 [RISCV] Disable use of i128 shift libcalls on RV32.
Since i128 isn't a legal C type on RV32, I don't believe
libgcc implements these functions for RV32. compiler-rt
does implement them because i128 support is enabled
in order to handle long double.

This is consistent with 32-bit X86 and ARM.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D109383
2021-09-08 14:26:07 -07:00
Kazu Hirata 5c6338de16 [RISCV] Fix "set but not used" warnings 2021-09-07 09:19:31 -07:00
Fraser Cormack a823bdf3ab [RISCV][VP] Custom lower VP_STORE and VP_LOAD
This patch adds support for the vector-predicated `VP_STORE` and
`VP_LOAD` nodes. We do this in the same way we lower `MSTORE` and
`MLOAD`: to regular load/store instructions via intrinsics.

One necessary change was made to `SelectionDAGLegalize` so that
`VP_STORE` nodes' operation actions are taken from the stored "value"
operands, in the same vein as `STORE` or `MSTORE`.

Reviewed By: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D108999
2021-09-07 10:53:25 +01:00
Fraser Cormack f4dee8cb82 [RISCV][VP] Custom lower VP_SCATTER and VP_GATHER
This patch adds support for the `VP_SCATTER` and `VP_GATHER` nodes by
lowering them to RVV's `vsox`/`vlux` instructions, respectively. This
process is almost identical to the existing `MSCATTER`/`MGATHER` support.

One extra change was made to `SelectionDAGLegalize` so that
`VP_SCATTER`'s operation action is derived from its stored "value"
operand rather than its return type (which is always the chain).

Reviewed By: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D108987
2021-09-07 10:43:07 +01:00
Craig Topper 75620fadf5 [RISCV] Change how we encode AVL operands in vector pseudoinstructions to use GPRNoX0.
This patch changes the register class to avoid accidentally setting
the AVL operand to X0 through MachineIR optimizations.

There are cases where we really want to use X0, but we can't get that
past the MachineVerifier with the register class as GPRNoX0. So I've
use a 64-bit -1 as a sentinel for X0. All other immediate values should
be uimm5. I convert it to X0 at the earliest possible point in the VSETVLI
insertion pass to avoid touching the rest of the algorithm. In
SelectionDAG lowering I'm using a -1 TargetConstant to hide it from
instruction selection and treat it differently than if the user
used -1. A user -1 should be selected to a register since it doesn't
fit in uimm5.

This is the rest of the changes started in D109110. As mentioned there,
I don't have a failing test from MachineIR optimizations anymore.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109116
2021-09-03 09:19:25 -07:00
Craig Topper ccbb4c8b4f [RISCV] Fold (RISCVISD::SELECT_CC X, Y, CC, Z, Z) -> Z.
If the true and false values are the same, we don't need a SELECT_CC.

This would normally be folded before a select is legalized to
select_cc. The test case exploits the late legalization of vscale
to trigger a case where they become identical after legalization.

This works around an issue found on a test case in D107957. In that
case the true/false values were both eventually 0 and the select was
used by a vector AVL operand. The select_cc got expanded to control
flow and a phi, but the phi inputs were both copies from X0. MachineIR
optimizations simplified this to a single copy from X0 going into the
vector instruction. This became the input of a vsetvli after vsetvli
insertion. Then register coalescing folded the copy into the vsetvli.
X0 as the source of a vsetvli is a special encoding and should not be
created by coalescing. We need to fix our vsetvli handling to make sure
this can never happen any other way, but removing the unneeded select
is still a worthwhile optimization.
2021-09-01 12:37:52 -07:00
Nick Desaulniers e9b3f25730 [RISCVISelLowering] avoid emitting libcalls to __mulodi4() and __multi3()
Similar to D108842, D108844, D108926, D108928, and D108936.

__has_builtin(__builtin_mul_overflow) returns true for 32b RISCV targets,
but Clang is deferring to compiler RT when encountering long long types.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108939
2021-08-31 11:23:56 -07:00
Craig Topper 0560a4adb3 [RISCV] Enable CONCAT_VECTORS for fixed FP vectors.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D108487
2021-08-30 08:47:45 -07:00
Craig Topper 0eeab8b282 [RISCV] Add -riscv-v-fixed-length-vector-elen-max to limit the ELEN used for fixed length vectorization.
This adds an ELEN limit for fixed length vectors. This will scalarize
any elements larger than this. It will also disable some fractional
LMULs. For example, if ELEN=32 then mf8 becomes illegal, i32/f32
vectors can't use any fractional LMULs, i16/f16 can only use mf2,
and i8 can use mf2 and mf4.

We may also need something for the scalable vectors, but that has
interactions with the intrinsics and we can't scalarize a scalable
vector.

Longer term this should come from one of the Zve* features
2021-08-27 10:17:35 -07:00
Craig Topper 1b9417454e [RISCV] Insert a sext_inreg when type legalizing i32 shl by constant on RV64.
Similar to what we do for add/sub/mul.

This can help remove some sext.w. There are some regressions on
some bswap tests, but I have an idea how to fix that for a follow up.

A new PACKW pattern is added to handle the new sext_inreg placement.

Differential Revision: https://reviews.llvm.org/D108663
2021-08-26 10:20:19 -07:00
Ben Shi f69fb7ac72 [DAGCombiner] Add target hook function to decide folding (mul (add x, c1), c2)
Reviewed by: lebedev.ri, spatel, craig.topper, luismarques, jrtc27

Differential Revision: https://reviews.llvm.org/D107711
2021-08-22 16:53:32 +08:00
Craig Topper 36d8316cc8 [RISCV] Reduce duplicate code for calling SimplifyDemandedBits.
This encapsulates the APInt creation and worklist management into
a helper function.

To keep one common interface I've used Log2_32 in places that
previously created a mask by subtracting 1 from a power of 2.

Differential Revision: https://reviews.llvm.org/D108324
2021-08-19 07:09:38 -07:00
Craig Topper 6d7ea597ef [RISCV] Insert sext_inreg when type legalizing add/sub/mul with constant LHS.
We already do this for a non-constant RHS. This just removes the
special case. I believe the special case may have been needed
because the ANY_EXTEND of a constant used to create zero extended
constants, but we recently changed that to produce sign extended
constants.

D107658 is needed to prevent some regressions.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107697
2021-08-18 10:44:25 -07:00
Craig Topper d63f117210 [RISCV] Support RISCVISD::SELECT_CC in ComputeNumSignBitsForTargetNode. 2021-08-13 18:00:09 -07:00
Craig Topper 6f5edc3487 [RISCV] Fold (add (select lhs, rhs, cc, 0, y), x) -> (select lhs, rhs, cc, x, (add x, y))
Similar for sub except sub isn't commutative.
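
The fold is a plain arithmetic identity; a minimal check (illustrative C++,
not the DAG code):

```
// Adding x to a select whose one arm is 0 is the same as selecting between x
// and x + y, which removes the add from the zero path.
#include <cassert>

int main() {
  int x = 42, y = 17;
  for (int cond = 0; cond <= 1; ++cond) {
    int before = (cond ? 0 : y) + x;   // (add (select cc, 0, y), x)
    int after  = cond ? x : (x + y);   // (select cc, x, (add x, y))
    assert(before == after);
  }
  return 0;
}
```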

Modify the existing and/or/xor folds to also work on ISD::SELECT
and not just RISCVISD::SELECT_CC. This is needed to make sure
we do this transform before type legalization turns i32 add/sub
into add/sub+sign_extend_inreg on RV64. If we don't do this before
that, the sign_extend_inreg will still be after the select.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D107603
2021-08-10 09:02:56 -07:00
Fraser Cormack 2b4a1d4b86 [RISCV] Improve codegen for shuffles with LHS/RHS splats
Shuffles which are broken into separate halves reveal splats in which
a half is accessed via one index; such operations can be optimized to
use "vrgather.vi".

This optimization could be achieved by adding extra patterns to match
`vrgather_vv_vl` which uses a splat as an index operand, but this patch
instead identifies splat earlier. This way, future optimizations can
build on top of the data gathered here, e.g., to splat-gather dominant
indices and insert any leftovers.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107449
2021-08-09 10:31:40 +01:00
Craig Topper 2f3b738960 [RISCV] Add optimizations for FMV_X_ANYEXTH similar to FMV_X_ANYEXTW_RV64.
This enables the fneg and fabs combines we have for FMV_X_ANYEXTW_RV64.
2021-08-08 18:30:48 -07:00
Craig Topper 88bc29f5f2 [RISCV] Introduce a RISCV CondCode enum instead of using ISD::SET* in MIR. NFC
Previously we converted ISD condition codes to integers and stored
them directly in our MIR instructions. The ISD enum kind of belongs
to SelectionDAG so that seems like incorrect layering.

This patch instead uses a CondCode node on RISCV::SELECT_CC until
isel and then converts it from ISD encoding to a RISCV specific value.
This value can be converted to/from the RISCV branch opcodes in the
RISCV namespace.

My larger motivation is to possibly support a microarchitectural
feature of some CPUs where a short forward branch over a single
instruction can be predicated internally. This will require a new
pseudo instruction for select that needs to carry a branch condition
and live probably until RISCVExpandPseudos. At that point it can be
expanded to control flow without other instructions ending up in the
predicated basic block. Using an ISD encoding in RISCVExpandPseudos
doesn't seem like correct layering.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107400
2021-08-08 17:25:37 -07:00
Craig Topper d4ee84ceee [RISCV] Support FP_TO_S/UINT_SAT for i32 and i64.
The fcvt fp to integer instructions saturate if their input is
infinity or out of range, but the instructions produce a maximum
integer for nan instead of 0 required for the ISD opcodes.

This means we can use the instructions to do the saturating
conversion, but we'll need to fix up the nan case at the end.

We can probably improve the i8 and i16 default codegen as well,
but I'll leave that for a follow up.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107230
2021-08-07 16:06:00 -07:00
Fraser Cormack cba6aab971 [RISCV] Support simple fractional steps in matching VID sequences
This patch extends the optimization of VID-sequence BUILD_VECTORs
introduced in D104921 to include simple fractional steps composed of a
separated integer numerator and denominator.

A notable limitation in this sequence detection is that only sequences
with steps N/1 or 1/D are found, meaning that the step between elements
and the frequency with which it changes is consistent across the whole
sequence. Fractional steps such as 2/3 won't be matched as those would
involve more complex tracking of state or some level of backtracking.

As it stands, however, this patch is sufficient to match common
interleave-type shuffle indices, for example matching `<0,0,1,1>` (or
commonly `<0,u,1,u>` or `<u,0,u,1>`) to an index sequence divided by 2.
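
For a power-of-two denominator D this reduces to a shift of the vid.v
sequence; a small illustration (plain C++):

```
// With denominator D = 2, element i of the interleave-style BUILD_VECTOR is
// i / 2, which is exactly the vid.v value i shifted right by log2(D).
#include <cassert>

int main() {
  const int expected[8] = {0, 0, 1, 1, 2, 2, 3, 3}; // <0,0,1,1,...> indices
  for (int i = 0; i < 8; ++i)
    assert(expected[i] == (i >> 1));
  return 0;
}
```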

While the optimization is relatively `undef`-tolerant, due to greedy
pattern-matching there even are some simple patterns which confuse the
sequence detection into identifying either a suboptimal sequence or no
sequence at all.

Currently only fractional-step sequences identified as having a
power-of-two denominator are actually lowered to RVV instructions. This
is to avoid introducing divisions into the generated code.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106533
2021-08-03 10:38:24 +01:00
Hsiangkai Wang 8b33839f01 [RISCV] Rename vector inline constraint from 'v' to 'vr' and 'vm' in IR.
Differential Revision: https://reviews.llvm.org/D107139
2021-08-01 05:58:17 +08:00
Craig Topper 593059b328 [RISCV] Rename RISCVISD::FCVT_W_RV64 to FCVT_W_RTZ_RV64. NFC
fcvt.w(u) supports multiple rounding modes, but the ISD node
doesn't encode that. So name it to match the rounding mode it uses.
2021-07-31 11:14:59 -07:00
Fraser Cormack 02dd4b59bc [RISCV] Optimize floating-point "dominant value" BUILD_VECTORs
This patch aims to improve the performance of BUILD_VECTORs which are
identified as containing a dominant element. Given that most
floating-point constants themselves require a load from the constant
pool, it was possible for the optimization to actually increase the
number of individual loads on small vectors. The exception is the zero
constant -- +0.0 -- which can be materialized efficiently.

While this optimization could do with a proper cost model to weigh the
benefits of a single vector load vs. the manipulation of individual
elements -- even for integer vectors which often require several
instructions to materialize -- without a concrete RVV implementation to
work with, any heuristic is likely to be both more obtuse and inaccurate.

Until then, this patch fixes at least one known obvious deficiency.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106963
2021-07-29 09:22:34 +01:00
Ben Shi 264b8e2a20 [RISCV] Optimize mul in the zba extension with SH*ADD
This patch makes the following optimization, if the
immediate multiplier is not a simm12.

(mul x, (power_of_2 + 2)) => (SH1ADD x, (SLLI x, bits))
(mul x, (power_of_2 + 4)) => (SH2ADD x, (SLLI x, bits))
(mul x, (power_of_2 + 8)) => (SH3ADD x, (SLLI x, bits))
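
The decomposition follows from distributing the multiply; a short check
(plain C++, with shadd standing in for the SH1ADD/SH2ADD/SH3ADD semantics):

```
// For m = 2^n + 2/4/8: m * x == (x << N) + (x << n) == shNadd(x, x << n),
// where shNadd(a, b) = (a << N) + b and N = 1/2/3.
#include <cassert>
#include <cstdint>

static uint64_t shadd(unsigned n_shift, uint64_t a, uint64_t b) {
  return (a << n_shift) + b;
}

int main() {
  uint64_t x = 0x0123456789abcdefULL;
  for (unsigned n = 4; n < 12; ++n)
    for (unsigned N = 1; N <= 3; ++N) {
      uint64_t m = (1ULL << n) + (1ULL << N);  // power_of_2 + 2, 4 or 8
      assert(m * x == shadd(N, x, x << n));
    }
  return 0;
}
```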

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106648
2021-07-29 09:46:41 +08:00
Craig Topper 3106f85945 [RISCV] Fix grammar in a comment. NFC 2021-07-28 09:09:26 -07:00
Craig Topper 54588bcc05 [RISCV] Restrict performANY_EXTENDCombine to prevent an infinite loop.
The sign_extend we insert here can get turned into a zero_extend if
the sign bit is known zero. This can enable a setcc combine that
shrinks compares with zero_extend. This reduces the use count of
the zero_extend allowing other combines to turn it back into an
any_extend.

This restricts the combine to only cases where the result is used
by a CopyToReg. This works for my original motivating case. I
hope the CopyToReg use will prevent any converted extends from
turning back into an any_extend.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D106754
2021-07-28 09:05:45 -07:00
Fraser Cormack 172487fe4c [RISCV] Add support for vector saturating add/sub operations
This patch adds support for lowering the saturating vector add/sub
intrinsics to RVV instructions, for both fixed-length and
scalable-vector forms alike.

Note that some of the DAG combines are still not triggering for the
scalable-vector tests. These require a bit more work in the DAGCombiner
itself.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106651
2021-07-27 10:04:14 +01:00
Craig Topper c63dbd8501 [RISCV] Custom lower (i32 (fptoui/fptosi X)).
I stumbled onto a case where our (sext_inreg (assertzexti32 (fptoui X)), i32)
isel pattern can cause an fcvt.wu and fcvt.lu to be emitted if
the assertzexti32 has an additional user. If we add a one use check
it would just cause a fcvt.lu followed by a sext.w when we only need
a fcvt.wu to satisfy both users.

To mitigate this I've added custom isel and new ISD opcodes for
fcvt.wu. This allows us to know that it started life as a conversion
to i32 without needing to match multiple nodes. ComputeNumSignBits
has been taught that these new nodes produce 33 sign bits. To
prevent regressions when we need to zero extend the result of an
(i32 (fptoui X)), I've added a DAG combine to convert it to an
(i64 (fptoui X)) before type legalization. In most cases this would
happen in InstCombine, but a zero_extend can be created for function
returns or arguments.

To keep everything consistent I've added new nodes for fptosi as well.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D106346
2021-07-24 10:50:43 -07:00
Fraser Cormack b115c038d2 [RISCV] Fix a crash when lowering split float arguments
Lowering certain float vectors without legal vector types could cause a
crash due to a bad interaction between passing floats via GPRs and
argument splitting. Split vector floats appear just like scalar floats.
Under certain situations we choose to pass these float arguments via
GPRs and use an XLenVT location and set the 'BCvt' info to track how
they must be converted back to floating-point values. However, later
logic for handling split arguments may take over, in which case we lose
the previous information and set the 'Indirect' info, thus incorrectly
lowering to integer types.

I don't believe that we would have come across the notion of split
floating-point arguments before. This patch addresses the issue by
updating the lowering so that split arguments are only passed indirectly
when they are scalar integer types.

This has some change to how we lower some larger illegal float vectors,
as can be seen in 'fastcc-float.ll' where the vector is now passed
partly in registers and partly on the stack.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D102852
2021-07-22 09:55:26 +01:00
Fraser Cormack 7b3a69bc16 [RISCV] Lower more BUILD_VECTOR sequences to RVV's VID
This relands a6ca88e908 which was originally
reverted due to overflow bugs in e3fa2b1eab.

This patch teaches the compiler to identify a wider variety of
`BUILD_VECTOR`s which form integer arithmetic sequences, and to lower
them to `vid.v` with modifications for non-unit steps and non-zero
addends.

The sequences handled by this optimization must either be monotonically
increasing or decreasing. Consecutive elements holding the same value
indicate a fractional step which, while simple mathematically,
becomes more complex to handle both in the realm of lossy integer
division and in the presence of `undef`s.

For example, a common "interleaving" shuffle index will be lowered by
LLVM to both `<0,u,1,u,2,...>` and `<u,0,u,1,u,...>` `BUILD_VECTOR`
nodes. Either of these would ideally be lowered to `vid.v` shifted right
by 1. Detection of this sequence in the presence of general `undef` values
is more complicated, however: `<0,u,u,1,>` could match either
`<0,0,0,1,>` or `<0,0,1,1,>` depending on later values in the sequence.
Both are possible, so backtracking or multiple passes is inevitable.

Sticking to monotonic sequences keeps the logic simpler as it can be
done in one pass. Fractional steps will likely be a separate
optimization in a future patch.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104921
2021-07-22 09:36:12 +01:00
Eli Friedman 0ca46a1757 [SelectionDAG] Fix the representation of ISD::STEP_VECTOR.
The existing rule about the operand type is strange.  Instead, just say
the operand is a TargetConstant with the right width.  (Legalization
ignores TargetConstants, so it doesn't matter if that width is legal.)

Highlights:

1. I had to substantially rewrite the AArch64 isel patterns to expect a
TargetConstant.  Nothing too exotic, but maybe a little hairy. Maybe
worth considering a target-specific node with some dagcombines instead
of this complicated nest of isel patterns.
2. Our behavior on RV32 for vectors of i64 has changed slightly. In
particular, we correctly preserve the width of the arithmetic through
legalization.  This changes the DAG a bit. Maybe room for
improvement here.
3. I explicitly defined the behavior around overflow. This is necessary
to make the DAGCombine transforms legal, and I don't think it causes any
practical issues.

Differential Revision: https://reviews.llvm.org/D105673
2021-07-21 10:58:40 -07:00
Craig Topper 81efb82570 [RISCV] Teach RISCVMatInt about cases where it can use LUI+SLLI to replace LUI+ADDI+SLLI for large constants.
If we need to shift left anyway we might be able to take advantage
of LUI implicitly shifting its immediate left by 12 to cover part
of the shift. This allows us to use more bits of the LUI immediate
to avoid an ADDI.

isDesirableToCommuteWithShift now considers compressed instruction
opportunities when deciding if commuting should be allowed.

I believe this is the same or similar to one of the optimizations
from D79492.

Reviewed By: luismarques, arcbbb

Differential Revision: https://reviews.llvm.org/D105417
2021-07-20 09:22:06 -07:00
Craig Topper 84877a098a [RISCV] Use unordered indexed loads for MGATHER.
I don't think the semantics of the llvm masked gather intrinsic care
about the order the elements are loaded. For example, type legalization
by splitting will chain them in parallel. This is different than
scatter which we do chain in order.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D106025
2021-07-20 08:46:02 -07:00
Craig Topper 50302feb1d [SelectionDAG][RISCV] Use isSExtCheaperThanZExt to control whether sext or zext is used for constant folding any_extend.
RISCV would prefer a sign extended constant since that works better
with our constant materialization. We have an existing TLI hook we
use to control sign extension of setcc operands in type legalization.
That hook happens to do the right check we need here, but might be
straying from its original purpose. With only RISCV defining this
hook in tree, I wasn't sure if it was worth adding another hook
with identical behavior.

This is an alternative to D105785 where I tried to handle this in
the RISCV backend by not creating ANY_EXTENDs in some places.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D105918
2021-07-19 09:25:28 -07:00
Craig Topper d0f8047d37 [RISCV] Teach computeKnownBitsForTargetNode that VLENB will never be more than 65536/8. 2021-07-17 11:24:20 -07:00
Craig Topper 173332d175 [RISCV] Manually emit the best shift for VSCALE lowering to improve codegen.
We assume VLENB is a multiple of 8 and previously relied on shift
pairs being optimized to an AND+SHL/SHR and computeKnownBits
removing the AND. This doesn't happen if (vlenb >> 3) gets CSEd
to have multiple uses. This patch manually emits the best shift
to work around this.
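
The underlying identity is simple once VLENB is known to be a multiple of 8;
a small check (plain C++):

```
// Because vlenb is a multiple of 8, (vlenb >> 3) << k folds to one shift of
// vlenb (left by k - 3, or right by 3 - k), with no AND needed.
#include <cassert>
#include <cstdint>

int main() {
  for (uint64_t vlenb = 8; vlenb <= 8192; vlenb *= 2) {
    assert(((vlenb >> 3) << 5) == (vlenb << 2)); // vscale * 32
    assert(((vlenb >> 3) << 1) == (vlenb >> 2)); // vscale * 2
  }
  return 0;
}
```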
2021-07-17 00:52:07 -07:00
Craig Topper 4dbb788068 [RISCV] Teach constant materialization that it can use zext.w at the end with Zba to reduce number of instructions.
If the upper 32 bits are zero and bit 31 is set, we might be able to
use zext.w to fill in the zeros after using an lui and/or addi.
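
A sketch of the idea in scalar C++ (illustrative constant; lui+addi is
modelled as producing the sign-extended 32-bit value, zext.w as masking off
the upper 32 bits):

```
// The target constant has its upper 32 bits zero and bit 31 set, so it cannot
// be produced directly by lui/addi (which sign-extend); materialize the
// sign-extended low 32 bits and let zext.w clear the upper half.
#include <cassert>
#include <cstdint>

int main() {
  uint64_t target = 0x80001234ULL;           // bit 31 set, upper 32 bits zero
  uint64_t low32  = target & 0xFFFFFFFFULL;
  // Portable sign-extension of the low 32 bits: the value lui/addi leave in a
  // 64-bit register, since they sign-extend from bit 31.
  int64_t sext = (int64_t)low32 - ((low32 & 0x80000000ULL) ? 0x100000000LL : 0);
  uint64_t zextw = (uint64_t)sext & 0xFFFFFFFFULL;  // effect of zext.w
  assert(zextw == target);
  return 0;
}
```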

Most of this patch is plumbing the subtarget features into the constant
materialization.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D105509
2021-07-16 09:35:56 -07:00
Craig Topper 0ce13f92b7 [RISCV] Add curly braces around a case body that declares variables. NFC
This is at the end of the switch so doesn't cause any issues now,
but if a new case is added it will break.
2021-07-16 09:35:56 -07:00
Fraser Cormack e3fa2b1eab Revert "[RISCV] Lower more BUILD_VECTOR sequences to RVV's VID"
This reverts commit a6ca88e908.

More caution is required to avoid overflow/underflow. Thanks to the
sanitizers for catching this.
2021-07-16 15:00:20 +01:00
Fraser Cormack a6ca88e908 [RISCV] Lower more BUILD_VECTOR sequences to RVV's VID
This patch teaches the compiler to identify a wider variety of
`BUILD_VECTOR`s which form integer arithmetic sequences, and to lower
them to `vid.v` with modifications for non-unit steps and non-zero
addends.

The sequences handled by this optimization must either be monotonically
increasing or decreasing. Consecutive elements holding the same value
indicate a fractional step which, while simple mathematically,
becomes more complex to handle both in the realm of lossy integer
division and in the presence of `undef`s.

For example, a common "interleaving" shuffle index will be lowered by
LLVM to both `<0,u,1,u,2,...>` and `<u,0,u,1,u,...>` `BUILD_VECTOR`
nodes. Either of these would ideally be lowered to `vid.v` shifted right
by 1. Detection of this sequence in the presence of general `undef` values
is more complicated, however: `<0,u,u,1,>` could match either
`<0,0,0,1,>` or `<0,0,1,1,>` depending on later values in the sequence.
Both are possible, so backtracking or multiple passes is inevitable.

Sticking to monotonic sequences keeps the logic simpler as it can be
done in one pass. Fractional steps will likely be a separate
optimization in a future patch.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104921
2021-07-16 10:35:13 +01:00
Fraser Cormack 03a4702c88 [RISCV] Fix the neutral element in vector 'fadd' reductions
Using positive zero as the neutral element in 'fadd' reductions, while
it generates better code, is incorrect. The correct neutral element is
negative zero: 0.0 + -0.0 = 0.0, whereas -0.0 + -0.0 = -0.0.
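
A minimal check of the signed-zero behaviour (plain C++, IEEE-754 default
rounding assumed):

```
// -0.0 is the neutral element for fadd: seeding a reduction with +0.0 would
// turn an all-(-0.0) reduction into +0.0, changing the sign of the result.
#include <cassert>
#include <cmath>

int main() {
  double neg0 = -0.0;
  assert(!std::signbit(0.0 + neg0));  // +0.0 + -0.0 == +0.0
  assert(std::signbit(neg0 + neg0));  // -0.0 + -0.0 == -0.0
  return 0;
}
```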

There are perhaps more optimal lowerings of negative zero avoiding
constant-pool loads which could be left as future work.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D105902
2021-07-14 10:18:38 +01:00
Craig Topper 1e670dc7d7 [RISCV] Use DIVUW/REMUW/DIVW instructions for i8/i16/i32 udiv/urem/sdiv when LHS is constant.
We don't really have optimizations for division with a constant
LHS. If we don't use a W instruction we end up needing to sign
or zero extend the RHS to use the 64-bit instruction.

I had to sign_extend i32 constants on the LHS instead of using
any_extend which becomes zero_extend. If we don't do this, constants
that were originally negative become harder to materialize. I think
this problem exists for more of our W instruction cases. For example
(i32 (shl -1, X)), but we don't have lit tests. I'll work on that
as a follow up.

I also left a FIXME for enabling W instruction for RHS constants
under -Oz.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D105769
2021-07-13 10:33:57 -07:00
Fangrui Song 3d89fb4d13 [RISCV] Support machine constraint "S"
Similar to D46745, "S" represents an absolute symbolic operand, which
can be used to specify the access models, e.g.

  extern int var;
  void *addr_via_asm() {
    void *ret;
    asm("lui %0, %%hi(%1)\naddi %0,%0,%%lo(%1)" : "=r"(ret) : "S"(&var));
    return ret;
  }

'S' is documented in trunk GCC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101275

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D105254
2021-07-13 09:30:09 -07:00
Fraser Cormack d991b7212b [RISCV] Pass undef VECTOR_SHUFFLE indices on to BUILD_VECTOR
Often when lowering vector shuffles, we split the shuffle into two
LHS/RHS shuffles which are then blended together. To do so we split the
original indices into two, indexed into each respective vector. These
two index vectors are then separately lowered as BUILD_VECTORs.

This patch forwards on any undef indices to the BUILD_VECTOR, rather
than having the VECTOR_SHUFFLE lowering decide on an optimal concrete
index. The motivation for this change is so that we don't duplicate
optimization logic between the two lowering methods and let BUILD_VECTOR
do what it does best.

Propagating undef in this way allows us, for example, to generate
`vid.v` to produce the LHS indices of commonly-used interleave-type
shuffles. I have designs on further optimizing interleave-type and other
common shuffle patterns in the near future.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104789
2021-07-13 10:41:54 +01:00
Craig Topper 12d51f95fe [RISCV] Implement lround*/llround*/lrint*/llrint* with fcvt instruction with -fno-math-errno
These are fp->int conversions using either RMM or dynamic rounding modes.

The lround and lrint opcodes have a return type of either i32 or
i64 depending on sizeof(long) in the frontend which should follow
xlen. llround/llrint should always return i64 so we'll need a libcall
for those on rv32.

The frontend will only emit the intrinsics if -fno-math-errno is in
effect otherwise a libcall will be emitted which will not use
these ISD opcodes.

gcc also does this optimization.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D105206
2021-07-06 11:43:22 -07:00
Craig Topper 2b5e53111a [RISCV] Add support for matching vwmul(u) and vwmacc(u) from fixed vectors.
This adds a DAG combine to detect sext/zext inputs and emit a
new ISD opcode. The extends will either be removed or replaced
with narrower extends.

Isel patterns are used to match add and widening mul to vwmacc
similar to the recently added vmacc patterns.

There's still some work to be done to match vmulsu.
We should also rewrite splats that were extended as scalars and
then splatted.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D104802
2021-07-06 10:24:31 -07:00
Craig Topper 3b6dfa381e [RISCV] Protect the SHL/SRA/SRL handlers in LowerOperation against being called for an illegal i32 shift amount.
It seems it is possible for DAG combine to create a shl with an
i64 result type and an i32 shift amount. This is ok before type
legalization since the types don't need to match in SelectionDAG.
This results in type legalization calling LowerOperation to
legalize just the amount. We weren't expecting this so we
asserted for not finding a fixed vector shift.

To fix this, I've added a check for the fixed vector case and
returned SDValue() to get the default type legalizer. I've
factored all shifts together and added a fixed vector specific
handler to avoid repeating similar code for each in
LowerOperation.

The particular case I found was exposed by D104581, but the bad
shift is created after that patch triggers.
2021-06-29 09:45:13 -07:00
Jim Lin 779d2b0a42 [RISCV][NFC] Combine the control flow for different RetOp of interrupt function
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D104838
2021-06-26 17:28:03 +08:00
Craig Topper d4f4a1ba62 [RISCV] Add DAG combine to detect opportunities to replace (i64 (any_extend (i32 X)) with sign_extend.
If type legalization is going to insert a sign_extend for other users
of X and we can fold the sign_extend into ADDW/MULW/SUBW, it is
better to replace the ANY_EXTEND so we don't end up with a separate
ADD/MUL/SUB instruction for the users of the ANY_EXTEND.

I'm only handling setcc uses right now, but there are other
instructions that force sign_extends like ashr.

There are probably other *W instructions we could use in addition
to ADDW/SUBW/MULW.

My motivating case was a loop terminating compare and a phi use
as seen in the new test file.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D104581
2021-06-25 23:16:37 -07:00
Fraser Cormack a4729f7f88 [RISCV] Lower RVV vector SELECTs to VSELECTs
This patch optimizes the code generation of vector-type SELECTs (LLVM
select instructions with scalar conditions) by custom-lowering to
VSELECTs (LLVM select instructions with vector conditions) by splatting
the condition to a vector. This avoids the default expansion path which
would either introduce control flow or fully scalarize.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104772
2021-06-24 10:12:51 +01:00
Fraser Cormack fed1503e85 [RISCV][VP] Lower FP VP ISD nodes to RVV instructions
With the exception of `frem`, this patch supports the current set of VP
floating-point binary intrinsics by lowering them to to RVV instructions. It
does so by using the existing `RISCVISD *_VL` custom nodes as an intermediate
layer. Both scalable and fixed-length vectors are supported by using this
method.

The `frem` node is unsupported due to a lack of available instructions. For
fixed-length vectors we could scalarize but that option is not (currently)
available for scalable-vector types. The support is intentionally left out so
it is equivalent for both vector types.

The matching of vector/scalar forms is currently lacking, as scalable vector
types do not lower to the custom `VFMV_V_F_VL` node. We could either make
floating-point scalable vector splats lower to this node, or support the
matching of multiple kinds of splat via a `ComplexPattern`, much like we do for
integer types.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D104237
2021-06-17 10:04:00 +01:00
Fraser Cormack c75e454cb9 [RISCV] Transform unaligned RVV vector loads/stores to aligned ones
This patch adds support for loading and storing unaligned vectors via an
equivalently-sized i8 vector type, which has support in the RVV
specification for byte-aligned access.

This offers a more optimal path for handling of unaligned fixed-length
vector accesses, which are currently scalarized. It also prevents
crashing when `LegalizeDAG` sees an unaligned scalable-vector load/store
operation.

Future work could be to investigate loading/storing via the largest
vector element type for the given alignment, in case that would be more
optimal on hardware. For instance, a 4-byte-aligned nxv2i64 vector load
could be loaded as nxv4i32 instead of as nxv16i8.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104032
2021-06-14 18:12:18 +01:00
Fraser Cormack 502edebd9d [ValueTypes][RISCV] Cap RVV fixed-length vectors by size
This patch changes RVV's policy for its supported list of fixed-length
vector types by capping by vector size rather than element count. Now
all 1024-byte vectors (of supported element types) are supported, rather
than all 256-element vectors.

This is a more natural fit for the architecture, and allows us to, for
example, improve the support for vector bitcasts.

This change necessitated the adding of some new simple types to avoid
"regressing" on the number of currently-supported vectors. We round out
the 1024-byte types by adding `v512i8`, `v1024i8`, `v512i16` and
`v512f16`.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103884
2021-06-09 12:15:37 +01:00
Fraser Cormack e8f1f89103 [RISCV] Support CONCAT_VECTORS on scalable masks
This patch is a simple fix which registers CONCAT_VECTORS as
custom-lowered for scalable mask vectors. This follows the pattern of
all other scalable-vector types, as the default expansion of
CONCAT_VECTORS cannot handle scalable types, and even if it did it'd go
through the stack and generate worse code.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103896
2021-06-09 09:07:44 +01:00
Craig Topper f30f8b4f12 [RISCV] Lower i8/i16 bswap/bitreverse to grevi/greviw with Zbp.
Include known bits support so we know we don't need to zext the
output if the input was already zero extended.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D103757
2021-06-07 10:31:51 -07:00
Craig Topper 8bde5f06a1 [RISCV] Replace && with ||. Spotted by coverity.
We should be exiting when the shift amount is greater than
the bit width regardless of whether it is a power of 2.

Reported by Simon Pilgrim here https://reviews.llvm.org/D96661

This requires getting a shift amount that is out of bounds that
wasn't already optimized by SelectionDAG. This would be pretty
tricky to construct a test for.

Or it would require a non-power of 2 shift amount and a mask
that has runs of ones and zeros of the next lowest power of 2 from
that shift amount. I tried a little to produce a test for this,
but didn't get it to work.
2021-06-06 13:09:51 -07:00
Nikita Popov 1ffa6499ea [TargetLowering] Use IRBuilderBase instead of IRBuilder<> (NFC)
Don't require a specific kind of IRBuilder for TargetLowering hooks.
This allows us to drop the IRBuilder.h include from TargetLowering.h.

Differential Revision: https://reviews.llvm.org/D103759
2021-06-06 16:29:50 +02:00
Nikita Popov 9914200393 [CodeGen] Add missing includes (NFC)
These currently rely on the IRBuilder.h include in TargetLowering.h.
Make them explicit.
2021-06-06 15:48:27 +02:00
Fraser Cormack 3b0a33d0ad [RISCV] Expand unaligned fixed-length vector memory accesses
RVV vectors must be aligned to their element types, so anything less is
unaligned.

For regular loads and stores, our custom-lowering of fixed-length
vectors meant that we opted out of LegalizeDAG's built-in unaligned
expansion. This patch adds that logic in to our custom lower function.

For masked intrinsics, we declare that anything unaligned is not legal,
leaving the ScalarizeMaskedMemIntrin pass to do the expansion for us.

Note that neither of these methods can handle the expansion of
scalable-vector memory ops, so those cases are left alone by this patch.
Scalable loads and stores already go through expansion by default but
hit an assertion, and scalable masked intrinsics will silently generate
incorrect code. It may be prudent to return an error in both of these
cases.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102493
2021-06-02 09:27:44 +01:00
Fraser Cormack 4f500c402b [RISCV] Support vector types in combination with fastcc
This patch extends the RISC-V lowering of the 'fastcc' calling
convention to vector types, both fixed-length and scalable. Without this
patch, any function passing or returning vector types by value would
throw a compiler error.

Vectors are handled in 'fastcc' much as they are in the default calling
convention, the noticeable difference being the extended set of scalar
GPR registers that can be used to pass vectors indirectly.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102505
2021-06-01 10:31:18 +01:00
Fraser Cormack 2b37c405cc [RISCV] Scale scalably-typed split argument offsets by VSCALE
This patch fixes a bug in lowering scalable-vector types in RISC-V's
main calling convention. When scalable-vector types are split and passed
indirectly, the target is responsible for scaling the offset --
initially set to the known-minimum store size -- by the scalable factor.

Before this we were issuing overlapping loads or stores to the different
parts, leading to incorrect codegen.

Credit to @HsiangKai for spotting this.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D103262
2021-05-31 10:43:13 +01:00
Fraser Cormack eb23936591 [RISCV] Support vector conversions between fp and i1
This patch custom lowers FP_TO_[US]INT and [US]INT_TO_FP conversions
between floating-point and boolean vectors. As the default action is
scalarization, this patch both supports scalable-vector conversions and
improves the code generation for fixed-length vectors.

The lowering for these conversions can piggy-back on the existing
lowering, which lowers the operations to a supported narrowing/widening
conversion and then either an extension or truncation.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103312
2021-05-31 09:55:39 +01:00
Fraser Cormack 5a80dc4988 [VP][SelectionDAG] Add a target-configurable EVL operand type
This patch adds a way for the target to configure the type it uses for
the explicit vector length operands of VP SDNodes. The type must be a
legal integer type (there is still no target-independent legalization of
this operand) and must currently be at least as big as i32, the type
used by the IR intrinsics. An implicit zero-extension takes place on
targets which choose a larger type. All VP nodes should be created with
this type used for the EVL operand.

This allows 64-bit RISC-V to avoid custom legalization of all VP nodes,
keeping them in their target-independent form for that bit longer.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D103027
2021-05-27 15:27:36 +01:00
Fraser Cormack b7101e218c [DAGCombine][RISCV] Don't try to trunc-store combined vector stores
DAGCombine's `mergeStoresOfConstantsOrVecElts` optimization is told
whether it's to use vector types and also whether it's to issue a
truncating store. However, the truncating store code path assumes a
scalar integer `ConstantSDNode`, and when using vector types it creates
either a `BUILD_VECTOR` or `CONCAT_VECTORS` to store: neither of which
is a constant.

The `riscv64` target is able to expose a crash here because it switches
on both code paths at the same time. The `f32` is stored as `i32` which
must be promoted to `i64`, necessitating a truncating store.
It also decides later that it prefers a vector store of `v2f32`.

While vector truncating stores are legal, this combine is not able to
emit them. We also don't have a test case. This patch adds an assert to
catch this case more gracefully, and updates one of the callers of the
function to turn off the use of truncating stores when preferring
vectors.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103173
2021-05-27 14:16:32 +01:00
Fraser Cormack 8c73a31c11 [RISCV] Allow passing fixed-length vectors via the stack
The vector calling convention dictates that when the vector argument
registers are exhausted, GPRs are used to pass the address via the stack.
When the GPRs themselves are exhausted, at best we would previously
crash with an assertion, and at worst we'd generate incorrect code.

This patch addresses this issue by passing fixed-length vectors via the
stack with their full fixed-length size and aligned to their element
type size. Since the calling convention lowering can't yet handle
scalable vector types, this patch adds a fatal error to make it clear
that we are lacking in this regard.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102422
2021-05-27 14:14:07 +01:00
Fraser Cormack 772b58a641 [SelectionDAG][RISCV] Don't unroll 0/1-type bool VSELECTs
This patch extends the cases in which the legalizer is able to express
VSELECT in terms of XOR/AND/OR. When dealing with a VSELECT between
boolean vector types, the mask itself is an all-ones or all-zeros value
of the operand type, so a 0/1 boolean type behaves identically to a 0/-1
type.

This greatly helps RISC-V which relies on expansion for these nodes. It
also allows scalable-vector bool VSELECTs to use the default expansion,
where before it would crash in SelectionDAG::UnrollVectorOp.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103147
2021-05-27 10:08:57 +01:00
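A minimal self-checking sketch of the identity the entry above relies on, written as plain scalar C++ rather than SelectionDAG code; the loop bounds and variable names are purely illustrative:

```cpp
// For 1-bit (boolean) operands only bit 0 of each lane is observed, so a 0/1
// mask drives the AND/OR/XOR select expansion exactly as a 0/-1 mask would.
#include <cassert>
#include <cstdint>

int main() {
  for (uint32_t m = 0; m <= 1; ++m)
    for (uint32_t t = 0; t <= 1; ++t)
      for (uint32_t f = 0; f <= 1; ++f) {
        uint32_t expected = m ? t : f;
        uint32_t wide = 0u - m; // 0/-1 style mask
        assert(((wide & t) | (~wide & f)) == expected);
        // 0/1 style mask gives the identical result on boolean operands.
        assert(((m & t) | ((m ^ 1u) & f)) == expected);
      }
  return 0;
}
```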
Craig Topper 9065118b64 [RISCV] Optimize SEW=64 shifts by splat on RV32.
SEW=64 shifts only use the log2(64) bits of the shift amount. If we're
splatting a 64 bit value in 2 parts, we can avoid splatting the
upper bits and just let the low bits be sign extended. They won't
be read anyway.

For the purposes of SelectionDAG semantics of the generic ISD opcodes,
if hi was non-zero or bit 31 of the low is 1, the shift was already
undefined so it should be ok to replace high with sign extend of low.

In order to be able to find the split i64 value before it becomes
a stack operation, I added a new ISD opcode that will be expanded
to the stack spill in PreprocessISelDAG. This new node is conceptually
similar to BuildPairF64, but it is expanded earlier so that we could
go through regular isel to get the right VLSE opcode for the LMUL.
BuildPairF64 is expanded in a CustomInserter.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D102521
2021-05-26 10:23:32 -07:00
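The safety argument in the entry above can be spot-checked with a few lines of scalar C++. This is only a model of the split {lo, hi} shift-amount pair, not the lowering itself: every defined SEW=64 shift amount is below 64, so its high word already equals the sign extension of its low word.

```cpp
// For every in-range SEW=64 shift amount (0..63) the split {lo, hi} pair has
// hi == 0 and bit 31 of lo clear, so sign-extending lo reproduces hi exactly.
// Out-of-range amounts make the shift undefined anyway, so they don't matter.
#include <cassert>
#include <cstdint>

int main() {
  for (uint64_t amt = 0; amt < 64; ++amt) {
    uint32_t lo = static_cast<uint32_t>(amt);
    uint32_t hi = static_cast<uint32_t>(amt >> 32);
    uint32_t sextLo = static_cast<uint32_t>(static_cast<int32_t>(lo) >> 31);
    assert(hi == sextLo); // both are zero for defined shift amounts
  }
  return 0;
}
```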
Craig Topper b510e4cf1b [RISCV] Add a vsetvli insert pass that can be extended to be aware of incoming VL/VTYPE from other basic blocks.
This is a replacement for D101938 for inserting vsetvli
instructions where needed. This new version changes how
we track the information in such a way that we can extend
it to be aware of VL/VTYPE changes in other blocks. Given
how much it changes the previous patch, I've decided to
abandon the previous patch and post this from scratch.

For now the pass consists of a single phase that assumes
the incoming state from other basic blocks is unknown. A
follow up patch will extend this with a phase to collect
information about how VL/VTYPE change in each block and
a second phase to propagate this information to the entire
function. This will be used by a third phase to do the
vsetvli insertion.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D102737
2021-05-24 11:47:27 -07:00
Fraser Cormack 7a211ed110 [RISCV] Prevent store combining from infinitely looping
RVV code generation does not successfully custom-lower BUILD_VECTOR in all
cases. When it resorts to default expansion it may, on occasion, be expanded to
scalar stores through the stack. Unfortunately these stores may then be picked
up by the post-legalization DAGCombiner which merges them again. The merged
store uses a BUILD_VECTOR which is then expanded, and so on.

This patch addresses the issue by overriding the `mergeStoresAfterLegalization`
hook. A lack of granularity in this method (being passed the scalar type) means
we opt out in almost all cases when RVV fixed-length vector support is enabled.
The only exception to this rule is mask vectors, which are always either
custom-lowered or are expanded to a load from a constant pool.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102913
2021-05-24 10:19:32 +01:00
Fraser Cormack c74ab891fc [RISCV] Ensure small mask BUILD_VECTORs aren't expanded
The default expansion for BUILD_VECTORs -- save for going through
shuffles -- is to go through the stack. This method only works when the
type is at least byte-sized, so for v2i1 and v4i1 we would crash.

This patch ensures that small mask-type BUILD_VECTORs are always handled
without crashing. We lower to a SETCC of the equivalent i8 type.

This also exposes some pre-existing issues where the lowering when
optimizing for size results in larger code than without. Those will be
tackled in future patches.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102767
2021-05-20 19:12:29 +01:00
Fraser Cormack 26bd2250c1 [RISCV] Ensure shuffle splat operands are type-legal
The use of `SelectionDAG::getSplatValue` isn't guaranteed to return a
type-legal splat value as it may implicitly extract a vector element
from another shuffle. It is not permitted to introduce an illegal type
when lowering shuffles.

This patch addresses the crash by adding a boolean flag to
`getSplatValue`, defaulting to false, which when set will ensure a
type-legal return value. If it is unable to do that it will fail to
return a splat value.

I've been through the existing uses of `getSplatValue` in other targets
and was unable to find a need or test cases showing a need to update
their uses. In some cases, the call is made during `LegalizeVectorOps`
which may still produce illegal scalar types. In other situations, the
illegally-typed splat value may be quickly patched up to a legal type
(such as any-extending the returned `extract_vector_elt` up to a legal
type) before `LegalizeDAG` notices.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102687
2021-05-20 18:00:03 +01:00
Fraser Cormack ca2c245ba4 [RISCV] Support INSERT_VECTOR_ELT into i1 vectors
Like the element extraction of these vectors, we choose to promote up to
an i8 vector type and perform the insertion there.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102697
2021-05-19 09:41:50 +01:00
Craig Topper 9c345407b4 [RISCV] Remove RISCVII:VSEW enum. Make encodeVYPE operate directly on SEW.
The VSEW encoding isn't a useful value to pass around. It's better
to use SEW or log2(SEW) directly. The only real ugliness is that
the vsetvli IR intrinsics use the VSEW encoding, but it's easy
enough to decode that when the intrinsic is processed.
2021-05-12 13:19:08 -07:00
Evandro Menezes 3a64b7080d [RISCV] Move instruction information into the RISCVII namespace (NFC)
Move instruction attributes into the `RISCVII` namespace and add associated helper functions.

Differential Revision: https://reviews.llvm.org/D102268
2021-05-11 16:32:42 -05:00
Craig Topper ce6e4f27dd [RISCV] Use fractional LMULs for fixed length types smaller than riscv-v-vector-bits-min.
My thought process is that if v2i64 is an LMUL=1 type then v2i32
should be an LMUL=1/2 type. We limit the fractional LMUL so that
SEW=64 clips to LMUL=1, SEW=32 clips to LMUL=1/2, etc. This
ensures there's always a fractional LMUL available to truncate a type.
This does reduce the number of vsetvlis in some cases.

Some tests increase vsetvlis because the best container type for a
mask type is dependent on the LMUL+SEW that the mask was produced
from, but you can't tell that from the type. I think this is
something we need to solve in the machine IR when optimizing
vsetvlis.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D101215
2021-05-11 09:42:48 -07:00
Craig Topper 80b9510806 [RISCV] Correct VL for fixed length masked scatter.
We were incorrectly calling getVectorNumElements on a scalable
vector type. This shouldn't be allowed. This gives a warning on
EVT, but not MVT.
2021-05-10 09:50:08 -07:00
Fraser Cormack 6db0cedd23 [LegalizeVectorOps][RISCV] Add scalable-vector SELECT expansion
This patch extends VectorLegalizer::ExpandSELECT to permit expansion
also for scalable vector types. The only real change is conditionally
checking for BUILD_VECTOR or SPLAT_VECTOR legality depending on the
vector type.

We can use this to fix "cannot select" errors for scalable vector
selects on the RISCV target. Note that in future patches RISCV will
possibly custom-lower vector SELECTs to VSELECTs for branchless codegen.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102063
2021-05-10 08:22:35 +01:00
Craig Topper 6660319cef [RISCV] Remove unused RISCV::VLEFF and VLEFF_MASK. NFC
Looks like these got left behind when vleff isel was moved to
RISCVISelDAGToDAG.cpp
2021-05-06 09:41:29 -07:00
Fraser Cormack 6f17613bfb [RISCV][VP] Lower VP ISD nodes to RVV instructions
This patch supports all of the current set of VP integer binary
intrinsics by lowering them to RVV instructions. It does so by using
the existing RISCVISD *_VL custom nodes as an intermediate layer. Both
scalable and fixed-length vectors are supported by using this method.

One notable change to the existing vector codegen strategy is that
scalable all-ones and all-zeros mask SPLAT_VECTORs are now lowered to
RISCVISD VMSET_VL and VMCLR_VL nodes to match their fixed-length
BUILD_VECTOR counterparts. This allows them to reuse the existing
"all-ones" VL patterns.

To reduce the size of the phabricator diff, some tests are intentionally
left out and will be added later if the patch is accepted.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101826
2021-05-05 12:32:24 +01:00
Fraser Cormack cd6a52fede [RISCV] Cap legal fixed-length vectors to 256-element types
Previously, RISC-V would make legal all fixed-length vectors types whose
size are less than or equal to some function of the minimum value of
VLEN and the maximum-permissible LMUL grouping.

Due to vector legalization issues, this patch instead caps the legal
fixed-length vector types to those with 256 elements. This value was
chosen because it is the longest vector length which has corresponding
MVTs across all supported element types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101839
2021-05-05 09:51:08 +01:00
Fraser Cormack 46fa214a6f [RISCV] Lower splats of non-constant i1s as SETCCs
This patch adds support for splatting i1 types to fixed-length or
scalable vector types. It does so by lowering the operation to a SETCC
of the equivalent i8 type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101465
2021-05-04 09:14:05 +01:00
Fraser Cormack d23e4f6872 [RISCV] Add support for fmin/fmax vector reductions
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101518
2021-05-03 10:33:51 +01:00
Craig Topper ba63cdb8f2 [RISCV] Store SEW in RISCV vector pseudo instructions in log2 form.
This shrinks the immediate that isel table needs to emit for these
instructions. Hoping this allows me to change OPC_EmitInteger to
use a better variable length encoding for representing negative
numbers. Similar to what was done a few months ago for OPC_CheckInteger.

The alternative encoding uses fewer bytes for negative numbers, but
increases the number of bytes needed to encode 64, which was a very
common number in the RISCV table due to SEW=64. By using Log2 this
becomes 6 and is no longer a problem.
2021-05-02 12:09:20 -07:00
Fraser Cormack 791766e6d2 [RISCV] Support STEP_VECTOR with a step greater than one
DAGCombiner was recently taught how to combine STEP_VECTOR nodes,
meaning the step value is no longer guaranteed to be one by the time it
reaches the backend for lowering.

This patch supports such cases on RISC-V by lowering other step
values to a multiply following the vid.v instruction. It includes a
small optimization for common cases where the multiply can be expressed
as a shift left.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D100856
2021-04-30 09:36:18 +01:00
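A plain C++ model of the lowering described above; the step value and element count below are made up for illustration:

```cpp
// Element i of a step vector with step S is vid(i) * S; a power-of-two step
// turns the multiply into a shift left by log2(S).
#include <cassert>
#include <cstdint>

int main() {
  const uint64_t step = 8;            // hypothetical step value, a power of two
  const unsigned shift = 3;           // log2(step)
  for (uint64_t i = 0; i < 16; ++i) { // vid.v produces 0, 1, 2, ...
    assert(i * step == (i << shift));
  }
  return 0;
}
```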
Craig Topper dcdda2bdf2 [RISCV] Teach DAG combine to fold (and (select_cc lhs, rhs, cc, -1, c), x) -> (select_cc lhs, rhs, cc, x, (and, x, c))
Similar for or/xor with 0 in place of -1.

This is the canonical form produced by InstCombine for something like `c ? x & y : x;` Since we have to use control flow to expand select we'll usually end up with a mv in a basic block. By folding this we may be able to pull the and/or/xor into the block instead and avoid a mv instruction.

The code here is based on code from ARM that uses this to create predicated instructions. I'm doing it on SELECT_CC so it happens late, but we could do it on select earlier which is what ARM does. I'm not sure if we lose any combine opportunities if we do it earlier.

I left out add and sub because this can separate sext.w from the add/sub. It also made a conditional i64 addition/subtraction on RV32 worse. I guess both of those would be fixed by doing this earlier on select.

The select-binop-identity.ll test has not been commited yet, but I made the diff show the changes to it.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D101485
2021-04-29 09:43:51 -07:00
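The fold in the entry above rests on a simple scalar identity; this hedged sketch just checks it over a handful of arbitrary values rather than showing the actual DAG combine code:

```cpp
// With the operation's identity element (-1 for AND, 0 for OR/XOR) in the true
// arm, the AND/OR/XOR can be pushed into the false arm of the select.
#include <cassert>
#include <cstdint>

int main() {
  uint32_t xs[] = {0u, 1u, 0xDEADBEEFu, 0xFFFFFFFFu};
  uint32_t cs[] = {0u, 0x0Fu, 0x12345678u, 0xFFFFFFFFu};
  for (bool cond : {false, true})
    for (uint32_t x : xs)
      for (uint32_t c : cs) {
        assert(((cond ? 0xFFFFFFFFu : c) & x) == (cond ? x : (x & c)));
        assert(((cond ? 0u : c) | x) == (cond ? x : (x | c)));
        assert(((cond ? 0u : c) ^ x) == (cond ? x : (x ^ c)));
      }
  return 0;
}
```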
Craig Topper 0c330afdfa [RISCV] Enable SPLAT_VECTOR for fixed vXi64 types on RV32.
This replaces D98479.

This allows type legalization to form SPLAT_VECTOR_PARTS so we don't
lose the splattedness when the scalar type is split.

I'm handling SPLAT_VECTOR_PARTS for fixed vectors separately so
we can continue using non-VL nodes for scalable vectors.

I limited to RV32+vXi64 because DAGCombiner::visitBUILD_VECTOR likes
to form SPLAT_VECTOR before seeing if it can replace the BUILD_VECTOR
with other operations. Especially interesting is a splat BUILD_VECTOR of
the extract_vector_elt which can become a splat shuffle, but won't if
we form SPLAT_VECTOR first. We either need to reorder visitBUILD_VECTOR
or add visitSPLAT_VECTOR.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100803
2021-04-29 08:20:09 -07:00
Craig Topper 25391cec3a [RISCV] Teach computeKnownBits that vsetvli returns number less than 2^31.
This seems like a reasonable upper bound on VL. WG discussions for
the V spec would probably allow us to use 2^16 as an upper bound
on VLEN, but this is good enough for now.

This allows us to remove sext and zext if the user happens to assign
the size_t result into an int and then uses it as a VL intrinsic
argument, which is size_t.

Reviewed By: frasercrmck, rogfer01, arcbbb

Differential Revision: https://reviews.llvm.org/D101472
2021-04-29 08:07:59 -07:00
Fraser Cormack 43ad058a01 [RISCV] Fix stack slot for argument types (Bug 49500)
This is a complementary/alternative fix for D99068. It takes a slightly
different approach by explicitly summing up all of the required split
part type sizes and ensuring we allocate enough space for them. It also
takes the maximum alignment of each part.

Compared with D99068 there are fewer changes to the stack objects in
existing tests. However, @luismarques has shown in that patch that there
are opportunities to reduce our stack usage in the future.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D99087
2021-04-29 09:10:48 +01:00
Craig Topper ce09dd54e6 [RISCV] Select 5 bit immediate for VSETIVLI during isel rather than peepholing in the custom inserter.
This adds a special operand type that is allowed to be either
an immediate or register. By giving it a unique operand type the
machine verifier will ignore it.

This perturbs a lot of tests but mostly it is just slightly different
instruction orders. Something bad did happen to some min/max reduction
tests. We're spilling vector registers when we weren't before.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D101246
2021-04-27 14:38:16 -07:00
Craig Topper 262a72f50f [RISCV] Use stack slot to handle SPLAT_VECTOR_PARTS on RV32.
Reduces the amount of vector ALU operations and reduces vector
register pressure.
2021-04-26 15:43:02 -07:00
Craig Topper e2cd92cb9b [RISCV] Match splatted load to scalar load + splat. Form strided load during isel.
This modifies my previous patch to push the strided load formation
to isel. This gives us opportunity to fold the splat into a .vx
operation first. Using a scalar register and a .vx operation reduces
vector register pressure which can be important for larger LMULs.

If we can't fold the splat into a .vx operation, then it can make
sense to use a strided load to free up the vector arithmetic
ALU to do actual arithmetic rather than tying it up with vmv.v.x.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D101138
2021-04-26 13:32:03 -07:00
Craig Topper 837442de9c [RISCV] Cleanup setOperationAction calls for INTRINSIC_WO_CHAIN/INTRINSIC_W_CHAIN
We have several extensions that need i32 to be Custom for
INTRINSIC_WO_CHAIN with RV64 so enable it for all RV64.

For V extension, make i32 Custom for RV64 and i64 Custom for RV32.
When the i32 or i64 is legal, the operation action doesn't matter.
LegalizeDAG checks MVT::Other rather than the real type.
2021-04-25 23:44:28 -07:00
Craig Topper 8f5cd49405 [RISCV] Teach DAG combine what bits Zbp instructions demanded from their inputs.
This teaches DAG combine that shift amount operands for grev, gorc,
shfl, unshfl only read a few bits.

This also teaches DAG combine that grevw, gorcw, shflw, unshflw,
bcompressw, bdecompressw only consume the lower 32 bits of their
inputs.

In the future we can teach SimplifyDemandedBits to also propagate
demanded bits of the output to the inputs in some cases.
2021-04-25 21:54:06 -07:00
Levy Hsu 8cf54c7ff5 [RISCV] [1/2] Add IR intrinsic for Zbe extension
RV32/64:
bcompress
bdecompress

RV64 ONLY:
bcompressw
bdecompressw

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101143
2021-04-25 19:14:34 -07:00
Craig Topper bd28d86119 [RISCV] Removed getLMULForFixedLengthVector.
Use getContainerForFixedLengthVector and getRegClassIDForVecVT to
get the register class to use when making a fixed vector type legal.

Inline it into the other two call sites.

I'm looking into using fractional lmul for fixed length vectors
and getLMULForFixedLengthVector returned an integer making it
unable to express this. I considered returning the LMUL
enum, but that seemed like it would introduce more complexity to
convert it for use.
2021-04-23 16:56:46 -07:00
Craig Topper bcf321015b [RISCV] Move getLMULForFixedLengthVector out of RISCVSubtarget.
Make it a static function RISCVISelLowering, the only place it
is used.

I think I'm going to make this return fractional LMULs in some
cases so I'm sorting out where it should live before I start
making changes.
2021-04-23 15:06:20 -07:00
Craig Topper baa107f018 [RISCV] Only expose one interface for getContainerForFixedLengthVector in the RISCVTargetLowering class
We can have RISCVISelDAGToDAG.cpp call the VT only version by
finding the RISCVTargetLowering object via the Subtarget.

Make the static versions just global static functions in
RISCVISelLowering that can be called by static functions in that
file.
2021-04-23 15:06:10 -07:00
Fraser Cormack 83b8f8da82 [RISCV] Custom lower vector F(MIN|MAX)NUM to vf(min|max)
This patch adds support for both scalable- and fixed-length vector code
lowering of the llvm.minnum and llvm.maxnum intrinsics to the equivalent
RVV instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101035
2021-04-23 12:22:15 +01:00
Levy Hsu b49337bbb9 [RISCV] [1/2] Add IR intrinsic for Zbp extension
RV32/64:
    grev
    grevi
    gorc
    gorci
    shfl
    shfli
    unshfl
    unshfli

RV64 ONLY:
    grevw
    greviw
    gorcw
    gorciw
    shflw
    shfli     (For non-existing shfliw)
    unshfli   (For non-existing unshfliw)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100830
2021-04-22 16:34:51 -07:00
Craig Topper 70254ccb69 [RISCV] Turn splat shuffles of vector loads into strided load with stride of x0.
Implementations are allowed to optimize an x0 stride to perform
less memory accesses. This is the case in SiFive cores.

No idea if this is the case in other implementations. We might
need a tuning flag for this.

Reviewed By: frasercrmck, arcbbb

Differential Revision: https://reviews.llvm.org/D100815
2021-04-22 10:02:57 -07:00
Craig Topper 77f14c96e5 [RISCV] Use stack temporary to splat two GPRs into SEW=64 vector on RV32.
Rather than doing splatting each separately and doing bit manipulation
to merge them in the vector domain, copy the data to the stack
and splat it using a strided load with x0 stride. At least on
some implementations this vector load is optimized to not do
a load for each element.

This is equivalent to how we move i64 to f64 on RV32.

I've only implemented this for the intrinsic fallbacks in this
patch. I think we do similar splatting/shifting/oring in other
places. If this is approved, I'll refactor the others to share
the code.

Differential Revision: https://reviews.llvm.org/D101002
2021-04-22 09:50:07 -07:00
Serge Pavlov 740962e5d0 [RISCV] Custom lowering of SET_ROUNDING
Differential Revision: https://reviews.llvm.org/D91242
2021-04-22 15:04:55 +07:00
Craig Topper 58c5b4c2c3 [RISCV] Use TargetConstant for condition code of RISCVISD::SELECT_CC.
The value is always an immediate and can never be in a register.
This the kind of thing TargetConstant is for.

Saves a step GenDAGISel to convert a Constant to a TargetConstant.
2021-04-21 23:08:52 -07:00
Serge Pavlov 6e63dfdae2 [RISCV] Custom lowering of FLT_ROUNDS_
Differential Revision: https://reviews.llvm.org/D90854
2021-04-22 11:39:15 +07:00
Craig Topper f6d8cf7798 [RISCV] Teach lowerSPLAT_VECTOR_PARTS to detect cases where Hi is sign extended from Lo.
This recognizes the case when Hi is (sra Lo, 31). We can use
SPLAT_VECTOR_I64 rather than splatting the high bits and
combining them in the vector register.
2021-04-21 20:24:23 -07:00
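A quick scalar check of the condition detected in the entry above, independent of the SelectionDAG code and with arbitrary test values:

```cpp
// When the high word is the arithmetic shift of the low word by 31, the
// {hi, lo} pair is exactly the sign extension of lo, so a 32-bit
// (hardware sign-extended) splat reproduces the 64-bit splat element.
#include <cassert>
#include <cstdint>

int main() {
  int32_t los[] = {0, 1, -1, 0x7FFFFFFF, INT32_MIN, 0x12345678};
  for (int32_t lo : los) {
    int32_t hi = lo < 0 ? -1 : 0; // the value (sra Lo, 31) produces
    uint64_t pair = (static_cast<uint64_t>(static_cast<uint32_t>(hi)) << 32) |
                    static_cast<uint32_t>(lo);
    uint64_t sext = static_cast<uint64_t>(static_cast<int64_t>(lo));
    assert(pair == sext);
  }
  return 0;
}
```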
Craig Topper 87afefcd22 [RISCV] Fix mistake in comment. NFC 2021-04-19 11:15:32 -07:00
Craig Topper 7ed01a420a [RISCV] Pad v4i1/v2i1/v1i1 stores with 0s to make a full byte.
As noted in the FIXME there's a sort of agreement that any
extra bits stored will be 0.

The generated code is pretty terrible. I was really hoping we
could use a tail undisturbed trick, but tail undisturbed no
longer applies to masked destinations in the current draft
spec.

Fingers crossed that it isn't common to do this. I doubt IR
from clang or the vectorizer would ever create this kind of store.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100618
2021-04-19 11:05:18 -07:00
Fraser Cormack c9a93c3e01 [RISCV] Lower vector shuffles to vrgather operations
This patch extends the lowering of RVV fixed-length vector shuffles to
avoid the default stack expansion and instead lower to vrgather
instructions.

For "permute"-style shuffles where one vector is swizzled, we can lower
to one vrgather. For shuffles involving two vector operands, we lower to
one unmasked vrgather (or splat, where appropriate) followed by a masked
vrgather which blends in the second half.

On occasion, when it's not possible to create a legal BUILD_VECTOR for
the indices, we use vrgatherei16 instructions with 16-bit index types.

For 8-bit element vectors where we may have indices over 255, we have a
fairly blunt fallback to the stack expansion to avoid custom-splitting
of the vector types.

To enable the selection of masked vrgather instructions, this patch
extends the various RISCVISD::VRGATHER nodes to take a passthru operand.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100549
2021-04-19 11:13:13 +01:00
Craig Topper 1afdfc6169 [RISCV] Rename RISCVISD::GREVI(W)/GORCI(W) to RISCVISD::GREV(W)/GORC(W). Don't require second operand to be a constant.
Prep work for adding intrinsics for these instructions in the
future.
2021-04-13 11:04:28 -07:00
Craig Topper 7c9bbbf735 [RISCV] Rename RISCVISD::SHFLI to RISCVISD::SHFL and don't require the second operand to be an immediate.
Prep work for adding intrinsics in the future.

Left an assert that the input is constant in ReplaceNodeResults,
as the intrinsic shouldn't go through that path.
2021-04-12 23:46:50 -07:00
Craig Topper cb4c793e46 [RISCV] Update computeKnownBitsForTargetNode to treat READ_VLENB as being 16 byte aligned.
According to the 0.10 spec, VLEN is at least 128 bits and is a
power of 2.
2021-04-11 17:54:23 -07:00
Craig Topper bc0e052730 [RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffff) when zext.h is supported.
Similar to what we do for zext.w.

Disable the (srl (and X, 0xffff), C) custom isel when zext.h is
available.
2021-04-11 10:03:35 -07:00
Fraser Cormack a5693445ca [RISCV] Support OR/XOR/AND reductions on vector masks
This patch adds RVV codegen support for OR/XOR/AND reductions for both
scalable- and fixed-length vector types. There are a few possible
codegen strategies for each -- vmfirst.m, vmsbf.m, and vmsif.m could be
used to some extent -- but the vpopc.m instruction was chosen since it
produces the scalar result in one instruction, after which scalar
instructions can finish off the computation.

The reductions are lowered identically for both scalable- and
fixed-length vectors, although some alternate strategies may be more
optimal on fixed-length vectors since it's cheaper to get the length of
those types.

Other reduction types were not deemed to be relevant for mask vectors.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100030
2021-04-08 09:46:38 +01:00
Serge Pavlov 65b1103798 [RISCV] DAG nodes and pseudo instructions for CSR access
New custom DAG nodes were added to represent operations on CSR. These
nodes are lowered to corresponding pseudo instruction. Using the pseudo
instructions allows to specify different scheduling information for
operations on different system registers. It also make possible to
specify dependencies of instructions on specific system registers.

Differential Revision: https://reviews.llvm.org/D98936
2021-04-08 10:36:36 +07:00
Craig Topper 56ea2e2fdd [RISCV] Add a special case to lowerSELECT for select of 2 constants with a SETLT condition.
If the constants have a difference of 1 we can convert one to
the other by adding or subtracting the condition.

We have a DAG combine for this, but it only runs before type
legalization. If the select is introduced later during type
legalization or op legalization we will miss it.

We don't need a specific condition, but some conditions are
harder to materialize than others on RISCV. I know that SETLT
will be a single instruction and it is what is used by the
motivating pattern from signed saturating add/sub.

Differential Revision: https://reviews.llvm.org/D99021
2021-04-07 13:47:17 -07:00
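The transformation above is easy to sanity-check in scalar C++; the constants and condition values below are arbitrary examples, not taken from the patch:

```cpp
// If the two select arms differ by exactly 1, the select collapses into the
// smaller constant plus the 0/1 condition (or the larger one minus it).
#include <cassert>
#include <cstdint>

int main() {
  int64_t cs[] = {-5, 0, 1, 41, 1000000};
  for (int64_t c : cs)
    for (int64_t cond : {0, 1}) {
      assert((cond ? c + 1 : c) == c + cond);       // true arm is the larger constant
      assert((cond ? c : c + 1) == (c + 1) - cond); // false arm is the larger constant
    }
  return 0;
}
```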
Craig Topper f087d7544a [RISCV] Support vslide1up/down intrinsics for SEW=64 on RV32.
This can't use our normal strategy of splatting the scalar and using
a .vv operation instead of .vx.

Instead this patch bitcasts the vector to the equivalent SEW=32
vector and inserts the scalar parts using two vslide1up/down. We
do that unmasked and apply the mask separately at the end with
a vmerge.

For vslide1up there maybe some other options here like getting
i64 into element 0 and using vslideup.vi with this vector as
vd and the original source as vs1. Masking would still need to
be done afterwards.

That idea doesn't work for vslide1down. We need to slidedown and
then insert a single scalar at vl-1 which we could do with a
vslideup, but that assumes vl > 0 which I don't think we can assume.

The i32 double slide1down implemented here is the best I could come
up with and I just made vslide1up consistent.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D99910
2021-04-07 10:44:53 -07:00
Craig Topper 01a23dccb1 [RISCV] Add an assertion to the ReplaceNodeResults handling of bitcasts to make sure the VT is always a scalar integer. 2021-04-06 16:48:40 -07:00
Craig Topper 2641c1f15e [RISCV] Don't custom type legalize fixed vector to scalar integer bitcasts if the fixed vector type isn't legal.
We encountered a hang in our internal code base. I'm having trouble
creating a test case because the test that hit it was testing some
code that is not upstream.
2021-04-06 15:00:33 -07:00
Craig Topper af2837675a [RISCV] Split RISCVISD::VMV_S_XF_VL into separate integer and FP.
It's a bit silly, but it allows us to write stricter type
constraints for isel. There are still some extra type checks in
the generated table due to some type inference limitations
around HWMode.
2021-04-05 12:57:35 -07:00
Fraser Cormack af3a839c70 [RISCV] Add support for bitcasts between scalars and fixed-length vectors
This patch supports bitcasts from scalar types to fixed-length vectors
and vice versa. It custom-lowers and custom-legalizes them to
EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT operations, using single-element
vectors to hold the scalar where appropriate.

Previously, some of these would fail to select, others would be expanded
through stack loads and stores. Effort was made to ensure the codegen
avoids the stack for both legal and illegal scalar types.

Some of the codegen could be improved, but on first glance it looks like
a general optimization of EXTRACT_VECTOR_ELT when extracting an i64
element on RV32.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99667
2021-04-05 17:21:55 +01:00
Fraser Cormack 3f0df4d7b0 [RISCV] Expand scalable-vector truncstores and extloads
Caught in internal testing, these operations are assumed legal by
default, even for scalable vector types. Expand them back into separate
truncations and stores, or loads and extensions.

Also add explicit fixed-length vector tests for these operations, even
though they should have been correct already.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99654
2021-04-05 17:03:45 +01:00
Craig Topper 4708a05da0 [RISCV] Use gorciw for i32 orc.b intrinsic when Zbp is enabled.
The W version of orc.b does not exist in Zbp so we need to use
gorci encoding. If we have Zbp, we can use gorciw which can avoid a
sext.w in some cases.
2021-04-04 17:14:28 -07:00
Craig Topper 98d5db3e3a [RISCV] Lower orc.b intrinsic to RISCVISD::GORCI.
This will allow us to share any future known bits, demanded bits,
or sign bits improvements.
2021-04-04 12:31:41 -07:00
Craig Topper a2ea003fcb [RISCV] Don't convert fshr/fshl to target specific FSL/FSR node if shift amount is a constant.
As long as it's a constant we can directly pattern match it
without any problems. It's only when it isn't a constant that
we need to add an AND.

In theory this should allow more target independent optimizations
to remain active.
2021-04-03 23:13:30 -07:00
Levy Hsu 944adbf285 Recommit "[RISCV] Add IR intrinsic for Zbb extension"
Forgot to amend the Author.

Original commit message:

Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
orc.b

Differential Revision: https://reviews.llvm.org/D99320
2021-04-02 11:50:19 -07:00
Craig Topper 1f0b309f24 Revert "[RISCV] Add IR intrinsic for Zbb extension"
This reverts commit 1808194590.

I forgot to change the author.
2021-04-02 11:47:02 -07:00
Craig Topper 1808194590 [RISCV] Add IR intrinsic for Zbb extension
Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
orc.b
2021-04-02 11:23:57 -07:00
Craig Topper dbbc95e3e5 [RISCV] Use softPromoteHalf legalization for fp16 without Zfh rather than PromoteFloat.
The default legalization strategy is PromoteFloat which keeps
half in single precision format through multiple floating point
operations. Conversion to/from float is done at loads, stores,
bitcasts, and other places that care about the exact size being 16
bits.

This patches switches to the alternative method softPromoteHalf.
This aims to keep the type in 16-bit format between every operation.
So we promote to float and immediately round for any arithmetic
operation. This should be closer to the IR semantics since we
are rounding after each operation and not accumulating extra
precision across multiple operations. X86 is the only other
target that enables this today. See https://reviews.llvm.org/D73749

I had to update getRegisterTypeForCallingConv to force f16 to
use f32 when the F extension is enabled. This way we can still
pass it in the lower bits of an FPR for ilp32f and lp64f ABIs.
The softPromoteHalf would otherwise always give i16 as the
argument type.

Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D99148
2021-04-01 12:41:57 -07:00
Craig Topper d157e3f387 [RISCV] Fix handling of nxvXi64 vmsgt(u).vx intrinsics on RV32.
We need to splat the scalar separately and use .vv, but there is
no vmsgt(u).vv. So add isel patterns to select vmslt(u).vv with
swapped operands.

We also need to get VT to use for the splat from an operand rather
than the result since the result VT is nxvXi1.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D99704
2021-04-01 10:38:05 -07:00
Craig Topper b7c2e577cc [RISCV] Add custom type legalization to form MULHSU when possible.
There's no target independent ISD opcode for MULHSU, so custom
legalize 2*XLen multiplies ourselves. We have to be a little
careful to prefer MULHU or MULHSU.

I thought about doing this in isel by pattern matching the
(add (mul X, (srai Y, XLen-1)), (mulhu X, Y)) pattern. I decided
against this because the add might become part of a chain of adds.
I don't trust DAG combine not to reassociate with other adds making
it difficult to find both pieces again.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D99479
2021-04-01 10:15:55 -07:00
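A 32-bit scalar model of the arithmetic behind the isel pattern quoted above (mulhu plus a correction term when the signed operand is negative). This is only a sketch of the identity, not the custom legalization code itself:

```cpp
// The high word of signed(y) * unsigned(x) equals mulhu(y, x) + x * (y >>s 31),
// with everything truncated to 32 bits.
#include <cassert>
#include <cstdint>

static uint32_t mulhu32(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}

int main() {
  int32_t ys[] = {0, 1, -1, 123456, -123456, INT32_MIN, INT32_MAX};
  uint32_t xs[] = {0u, 1u, 2u, 0x12345678u, 0xFFFFFFFFu};
  for (int32_t y : ys)
    for (uint32_t x : xs) {
      // Exact 64-bit product of a signed and an unsigned 32-bit value.
      int64_t full = static_cast<int64_t>(y) * static_cast<int64_t>(x);
      uint32_t expected = static_cast<uint32_t>(static_cast<uint64_t>(full) >> 32);
      uint32_t sraY = y < 0 ? 0xFFFFFFFFu : 0u; // models (srai Y, XLen-1)
      uint32_t lowered = mulhu32(static_cast<uint32_t>(y), x) + x * sraY;
      assert(lowered == expected);
    }
  return 0;
}
```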
Craig Topper 2a8b7cab6a [RISCV] Add RISCVISD opcodes for CLZW and CTZW.
Our CLZW isel pattern is quite easily broken by surrounding code
preventing it from matching sometimes. This usually results in
failing to remove the and X, 0xffffffff inserted by type
legalization. The add with -32 that type legalization also inserts
will often gets combined into other add/sub nodes. That doesn't
usually result in extra code when we don't use clzw.

CTTZ seems to be less fragile, but I wanted to keep it consistent
with CTLZ.

Reviewed By: asb, HsiangKai

Differential Revision: https://reviews.llvm.org/D99317
2021-03-31 09:40:07 -07:00
Fraser Cormack 10fc6e4358 [RISCV] Add support for the stepvector intrinsic
This adds almost everything required for supporting the new stepvector
intrinsic on RVV. It is lowered to the existing VID_VL SDNode.

The only exception is a limitation that RV32 cannot yet lower the
intrinsic on i64 vectors. This is because the step operand is
(currently) required to be at least as large as the vector element type.
I will look into patching that out and loosening the requirement to only
an integer pointer type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99594
2021-03-31 11:41:17 +01:00
Craig Topper a33fcafaf0 [RISCV] Pass 'half' in the lower 16 bits of an f32 value when F extension is enabled, but Zfh is not.
Without Zfh the half type isn't legal, but it could still be
used as an argument/return in IR. Clang will not generate this today.

Previously we promoted the half value to float for arguments and
returns if the F extension is enabled but Zfh isn't. Then depending on
which ABI is enabled we would pass it in either an FPR or a GPR in
float format.

If the F extension isn't enabled, it would get passed in the lower
16 bits of a GPR in half format.

With this patch the value will always be in half format and will be
in the lower bits of a GPR or FPR. This should be consistent
with where the bits are located when Zfh is enabled.

I've based this implementation off of how this is done on ARM.

I've manually nan-boxed the value to 32 bits using integer ops.
It looks like flw, fsw, fmv.s, fmv.w.x, fmv.x.w won't
canonicalize nans so should leave the value alone. I think those
are the instructions that could get used on this value.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D98670
2021-03-30 09:47:54 -07:00
Craig Topper f069000b43 [RISCV] Remove floating point condition code legalization from lowerFixedLengthVectorSetccToRVV.
After D98939, this is done by LegalizeVectorOps making this code dead.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D99519
2021-03-30 09:11:56 -07:00
Craig Topper c40cea6f08 [RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffffffff).
We look for this pattern frequently in isel patterns so it's a
good idea to try to preserve it.

This also lets us remove our special isel handling for srliw
and use a direct pattern match of (srl (and X, 0xffffffff), C)
since no bits will be removed from the and mask.

Differential Revision: https://reviews.llvm.org/D99042
2021-03-25 09:03:25 -07:00
Fraser Cormack 99211352c1 [RISCV] Optimize select-like vector shuffles
This patch adds a small optimization for vector shuffle lowering,
detecting shuffles which can be re-expressed as vector selects.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99270
2021-03-25 11:39:57 +00:00
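A rough sketch, not the patch's code, of the kind of mask test such a shuffle-to-select rewrite needs: every lane must come from the same lane of one of the two sources. The -1 undef convention follows LLVM shuffle masks, and the helper name is made up:

```cpp
// A shuffle can be rewritten as a vector select when every lane i takes its
// value from lane i of one of the two sources (or is undef, written as -1).
#include <cassert>
#include <vector>

static bool isSelectLikeMask(const std::vector<int> &mask) {
  const int n = static_cast<int>(mask.size());
  for (int i = 0; i < n; ++i)
    if (mask[i] != -1 && mask[i] != i && mask[i] != i + n)
      return false;
  return true;
}

int main() {
  assert(isSelectLikeMask({0, 5, 2, 7}));  // lanes taken in place from v0/v1
  assert(isSelectLikeMask({4, -1, 2, 3})); // undef lane is fine
  assert(!isSelectLikeMask({1, 0, 2, 3})); // lanes 0 and 1 are swapped
  return 0;
}
```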
Fraser Cormack 321a71a772 [RISCV] Optimize BUILD_VECTOR sequences that reveal hidden splats
This patch adds further optimization techniques to RVV BUILD_VECTOR
lowering. It teaches the compiler to find splats of larger vector
element types "hidden" in smaller ones. For example, a v4i8 build_vector
(0x1, 0x2, 0x1, 0x2) could be splat as v2i16 0x0201. This is generally
more optimal than the dominant-element BUILD_VECTORs and so takes
priority.

This optimization is currently limited to all-constant-or-undef
BUILD_VECTORs as those were found to be the most common. There's no
reason this couldn't be extended to other BUILD_VECTORs, but the
additional bit-manipulation instructions may require more sophisticated
heuristics.

There are some cases where the materialization of the larger constant
takes more scalar instructions than it does to build the vector with
vector instructions. We could add heuristics to try and catch this.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99195
2021-03-25 10:35:31 +00:00
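The example from the entry above can be reproduced with a few lines of scalar C++, assuming little-endian packing of lanes into the wider element:

```cpp
// Pairs of i8 elements packed little-endian into i16 all produce the same
// value, so the v4i8 (0x1, 0x2, 0x1, 0x2) build_vector is really a v2i16
// splat of 0x0201.
#include <cassert>
#include <cstdint>

int main() {
  uint8_t elems[4] = {0x1, 0x2, 0x1, 0x2};
  uint16_t packed[2];
  for (int i = 0; i < 2; ++i)
    packed[i] = static_cast<uint16_t>(elems[2 * i] | (elems[2 * i + 1] << 8));
  assert(packed[0] == 0x0201 && packed[0] == packed[1]); // one splat value
  return 0;
}
```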
Craig Topper 0f99c6c56e [RISCV] Remove duplicate DebugLoc variables from cases in ReplaceNodeResults. NFC
We already created a DebugLoc at the top of the function. We can
just use that one.
2021-03-24 20:23:03 -07:00
Fraser Cormack feff66a082 [RISCV] Further optimize BUILD_VECTORs with repeated elements
This patch builds upon the initial BUILD_VECTOR work introduced in
D98700. It further optimizes the lowering of BUILD_VECTOR by using
VSELECT operations to effectively insert repeated elements into the
vector with relatively few instructions. This allows us to optimize more
BUILD_VECTORs without significantly increasing the size of the generated
code.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98969
2021-03-23 14:14:48 +00:00
Fraser Cormack 5bfbd9d938 [RISCV] Optimize all-constant mask BUILD_VECTORs
This patch adds an optimization for mask-vector BUILD_VECTOR nodes whose
elements are all constants or undef. It lowers such operations by
building up the vector via a series of integer operations, in which
multiple mask elements are inserted into a vector at a time via
i8/i16/i32/i64 element types. The final result is then bitcast from that
integer vector.

We restrict this optimization in certain circumstances when optimizing
for size. If we are required to use more than one integer insert
operation, then it will likely increase code size compared with using a
load from a constant pool.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98860
2021-03-23 10:11:19 +00:00
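A scalar model of the strategy above, assuming the usual little-endian lane-to-bit mapping when an i1 vector is bitcast from an integer; the mask values are arbitrary:

```cpp
// Each constant mask element lands in the matching bit of a single byte, which
// then stands in for the v8i1 value; the bits read back as the original mask.
#include <cassert>
#include <cstdint>

int main() {
  bool mask[8] = {true, false, true, true, false, false, true, false};
  uint8_t packed = 0;
  for (int i = 0; i < 8; ++i)
    packed |= static_cast<uint8_t>(mask[i] << i);
  assert(packed == 0x4D); // 0b01001101
  for (int i = 0; i < 8; ++i)
    assert(((packed >> i) & 1) == (mask[i] ? 1 : 0));
  return 0;
}
```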
Craig Topper 294efcd6f7 [RISCV] Add support for fixed vector masked gather/scatter.
I've split the gather/scatter custom handler to avoid complicating
it with even more differences between gather/scatter.

Tests are the scalable vector tests with the vscale removed, minus
the tests that used vector.insert. We're probably not
as thorough on the splitting cases since we use 128 for VLEN here
but scalable vectors use a known min size of 64.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98991
2021-03-22 10:17:30 -07:00
Craig Topper 5d315691c4 [RISCV] Add missing bitcasts to the results of lowerINSERT_SUBVECTOR and lowerEXTRACT_SUBVECTOR when handling mask vectors.
Found by adding asserts to LegalizeDAG to catch incorrect result
types being returned.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98964
2021-03-19 10:54:33 -07:00
Craig Topper 85f3f6b3cc [RISCV] Lower scalable vector masked loads to intrinsics to match fixed vectors and reduce isel patterns.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98840
2021-03-19 10:39:35 -07:00
Fraser Cormack d399b82e2a [RISCV] Maintain fixed-length info when optimizing BUILD_VECTORs
I'm not sure how I failed to notice this before, but when optimizing
dominant-element BUILD_VECTORs we would lower via the scalable container type,
which lost us the information about the fixed length of the vector types. By
lowering via the fixed-length type we can preserve that information and
eliminate redundant vsetvli instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98938
2021-03-19 17:21:06 +00:00
Fraser Cormack 550292ecb1 [RISCV] Fix missing scalable->fixed-length vector conversion
Returning the scalable-vector container type would present problems when
the fixed-length INSERT_VECTOR_ELT was used by later operations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98776
2021-03-19 16:49:47 +00:00
Craig Topper c9861f722e [RISCV] Correct the output chain in lowerFixedLengthVectorMaskedLoadToRVV
We returned the input chain instead of the output chain from the
new load. This bypasses the load in the chain. I haven't found a
good way to test this yet. IR order prevents my initial attempts
at causing reordering.
2021-03-18 16:34:35 -07:00
Fraser Cormack 3495031a39 [RISCV] Support scalable-vector masked scatter operations
This patch adds support for masked scatter intrinsics on scalable vector
types. It is mostly an extension of the earlier masked gather support
introduced in D96263, since the addressing mode legalization is the
same.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96486
2021-03-18 10:17:50 +00:00
Fraser Cormack 0331399dc9 [RISCV] Support scalable-vector masked gather operations
This patch supports the masked gather intrinsics in RVV.

The RVV indexed load/store instructions only support the "unsigned unscaled"
addressing mode; indices are implicitly zero-extended or truncated to XLEN and
are treated as byte offsets. This ISA supports the intrinsics directly, but not
the majority of various forms of the MGATHER SDNode that LLVM combines to. Any
signed or scaled indexing is extended to the XLEN value type and scaled
accordingly. This is done during DAG combining as widening the index types to
XLEN may produce illegal vectors that require splitting, e.g.
nxv16i8->nxv16i64.

Support for scalable-vector CONCAT_VECTORS was added to avoid spilling via the
stack when lowering split legalized index operands.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96263
2021-03-18 09:26:18 +00:00
Fraser Cormack c2b4600ec8 [RISCV] Support bitcasts of fixed-length mask vectors
Without this patch, bitcasts of fixed-length mask vectors would go
through the stack.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98779
2021-03-18 08:52:42 +00:00
Craig Topper 696ddef569 [RISCV] Support masked load/store for fixed vectors.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98561
2021-03-17 10:26:15 -07:00
Fraser Cormack 70251759a2 [RISCV] Optimize "dominant element" BUILD_VECTORs
This patch adds an optimization path for BUILD_VECTOR nodes where the
majority of the elements are identical. These can be splatted, with the
remaining elements patched up with INSERT_VECTOR_ELTs. The threshold can
be tweaked as required - it is currently conservative. Undef elements
are disregarded when judging the dominance of a particular element. This
allows them to be covered by the splat value.

In addition, vectors of 2 elements are always optimized to a splat (for
the upper element) and an insert at element zero.

This optimization is disabled when optimizing for size.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98700
2021-03-17 10:09:04 +00:00
Craig Topper 229eeb187d [RISCV] Look through copies when trying to find an implicit def in addVSetVL.
The InstrEmitter can sometimes insert a copy after an IMPLICIT_DEF
before connecting it to the vector instruction. This occurs when
constrainRegClass reduces to a class with less than 4 registers.
I believe LMUL8 on masked instructions triggers this since the
result can only use the v8, v16, or v24 register group as the mask
is using v0.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98567
2021-03-16 07:59:09 -07:00
Craig Topper a33ce06cf5 [RISCV] Improve i32 UADDSAT/USUBSAT on RV64.
The default promotion uses zero extends that become shifts. We
can use sign extend instead which is better for RISCV.

I've used two different implementations based on whether we
have minu/maxu instructions.

Differential Revision: https://reviews.llvm.org/D98683
2021-03-16 07:44:06 -07:00
Craig Topper 41759c3d92 [RISCV] Add RISCVISD::BR_CC similar to RISCVISD::SELECT_CC.
This allows me to introduce similar combines for branches as
we have recently added for SELECT_CC. Some of them are less
useful for standalone setccs and only help branch instructions.
By having a BR_CC node it's easier to only affect branches.

I'm using CondCodeSDNode to make isel patterns easier to
write so we can refer to the codes by name. SELECT_CC uses a
constant instead.

I've translated the condition code just like SELECT_CC so
we need less patterns for the swapped conditions. This
includes special cases for X < 1 and X > -1 that get translated
to blez and bgez by using a 0 constant.

computeKnownBitsForTargetNode support for SELECT_CC is added
to allow MaskedValueIsZero to work for cases where the true
and false values of the SELECT_CC are setccs and the
result of the SELECT_CC is used by a BR_CC. This was needed
to avoid regressions in some of the overflow tests.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D98159
2021-03-15 11:54:01 -07:00
Craig Topper 3dc5b533e0 [RISCV] Improve legalization of i32 UADDO/USUBO on RV64.
The default legalization uses zero extends that require pair of shifts
on RISCV. Instead we can take advantage of the fact that unsigned
compares work equally well on sign extended inputs. This allows
us to use addw/subw and sext.w.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D98233
2021-03-15 09:30:23 -07:00
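The key fact used in the entry above, that an unsigned comparison is unaffected by whether both i32 inputs are zero-extended or sign-extended to 64 bits, can be spot-checked with a small C++ program; the test values are just boundary cases, not taken from the patch:

```cpp
// Unsigned comparison of two i32 values gives the same answer whether both are
// zero-extended or both sign-extended to 64 bits. That is what lets the i32
// overflow checks use addw/subw plus sext.w instead of a pair of shifts for
// the zero extension.
#include <cassert>
#include <cstdint>

int main() {
  uint32_t vals[] = {0u, 1u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFEu, 0xFFFFFFFFu};
  for (uint32_t a : vals)
    for (uint32_t b : vals) {
      uint64_t za = a, zb = b; // zero-extended
      uint64_t sa = static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(a)));
      uint64_t sb = static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(b)));
      assert((za < zb) == (sa < sb));
    }
  return 0;
}
```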
Fraser Cormack 0c5b789c73 [RISCV] Support fixed-length vectors in the calling convention
This patch adds fixed-length vector support to the calling convention
when RVV is used to lower fixed-length vectors. The scheme follows the
regular vector calling convention for the argument/return registers, but
uses scalable vector container types as the LocVTs, and converts to/from
the fixed-length vector value types as required.

Fixed-length vector types may be split when the combination of minimum
VLEN and the maximum allowable LMUL is not large enough to fully contain
the vector. In this case the behaviour differs between fixed-length
vectors passed as parameters and as return values:
1. For return values, vectors must be passed entirely via registers or
via the stack.
2. For parameters, unlike scalar values, split vectors continue to be
passed by value, and are split across multiple registers until there are
no remaining registers. Thus vector parameters may be found partly in
registers and partly on the stack.

As with scalable vectors, the first fixed-length mask vector is passed
via v0. Split mask fixed-length vectors are passed first via v0 and then
via the next available vector register: v8,v9,etc.

The handling of vector return values uses all available argument
registers v8-v23 which does not adhere to the calling convention we're
supposedly implementing, but since this issue affects both fixed-length
and scalable-vector values, it was left as-is.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97954
2021-03-15 10:43:51 +00:00
Hsiangkai Wang a81dff1e58 [RISCV] Support inline asm for vector instructions.
Types of fractional LMUL and LMUL=1 all use the VR register class. When
using inline asm, it will use the first type in the register class as the
type for the register. It is not necessarily the same as the value type. We
need to use INSERT_SUBVECTOR/EXTRACT_SUBVECTOR/BITCAST to make it legal
to put the value in the corresponding register class.

Differential Revision: https://reviews.llvm.org/D97480
2021-03-15 11:02:18 +08:00
Craig Topper 51151828ac [RISCV] Teach normaliseSetCC to canonicalize X > -1 to X >= 0 and X < 1 to 0 >= X.
This allows the use of BGE with X0 instead of putting -1/1 in a
register.

Reviewed By: jrtc27

Differential Revision: https://reviews.llvm.org/D98542
2021-03-12 11:50:10 -08:00
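Both canonicalizations in the entry above are exact predicate rewrites for signed integers, which is what makes the zero register usable; a tiny self-check with arbitrary test values follows:

```cpp
// X > -1 is the same signed predicate as X >= 0, and X < 1 is the same as
// 0 >= X, so the constant operand can be replaced by x0.
#include <cassert>
#include <cstdint>

int main() {
  int64_t vals[] = {INT64_MIN, -2, -1, 0, 1, 2, INT64_MAX};
  for (int64_t x : vals) {
    assert((x > -1) == (x >= 0));
    assert((x < 1) == (0 >= x));
  }
  return 0;
}
```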
Fraser Cormack 641f5700f9 [RISCV] Optimize INSERT_VECTOR_ELT sequences
This patch optimizes the codegen for INSERT_VECTOR_ELT in various ways.
Primarily, it removes the use of vslidedown during lowering, and the
vector element is inserted entirely using vslideup with a custom VL and
slide index.

Additionally, lowering of i64-element vectors on RV32 has been optimized
in several ways. When the 64-bit value to insert is the same as the
sign-extension of the lower 32-bits, the codegen can follow the regular
path. When this is not possible, a new sequence of two i32 vslide1up
instructions is used to get the vector element into a vector. This
sequence was suggested by @craig.topper. From there, the value is slid
into the final position for more consistent lowering across RV32 and
RV64.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98250
2021-03-12 09:13:38 +00:00
Fraser Cormack 4d2d5855c7 [RISCV] Fix up stale VECREDUCE comments. NFC.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98399
2021-03-12 08:49:46 +00:00
Craig Topper 1d26bbcf9b [RISCV] Return false from isShuffleMaskLegal except for splats.
We don't support any other shuffles currently.

This changes the bswap/bitreverse tests that check for this in
their expansion code. Previously we expanded a byte swapping
shuffle through memory. Now we're scalarizing and doing bit
operations on scalars to swap bytes.

In the future we can probably use vrgather.vx to do a byte swap
shuffle.
2021-03-11 20:02:49 -08:00
Craig Topper c82f442954 [RISCV] Support fixed vector copysign.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98394
2021-03-11 09:57:24 -08:00
Craig Topper 0dff8a9627 [RISCV] Handle vmv.x.s intrinsic for i64 vectors on RV32.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98372
2021-03-11 09:39:50 -08:00
Craig Topper 9c841cb8e8 [RISCV] Support extract_vector_elt for fixed and scalable masked registers.
This uses a really simple approach of converting to an i8 vector
and extracting. This is probably not the best approach especially
if you know the index is constant.

Other ideas:
-Store to stack temporary using vse1, load as scalar and shift.
-Sort of bitcast the vector to a vector of i8, slide down the
 appropriate 8 bit element, copy to scalar, shift down the
 correct bit within the 8 bits we extracted. Not exactly sure
 how to describe such a bitcast from i1 vector to i8 vector
 within the type system for elements less than 8.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98310
2021-03-11 09:26:44 -08:00
Craig Topper 9106d04554 [RISCV][SelectionDAG] Introduce an ISD::SPLAT_VECTOR_PARTS node that can represent a splat of 2 i32 values into a nxvXi64 vector for riscv32.
On riscv32, i64 isn't a legal scalar type but we would like to
support scalable vectors of i64.

This patch introduces a new node that can represent a splat made
of multiple scalar values. I've used this new node to solve the current
crashes we experience when getConstant is used after type legalization.
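
Conceptually, each i64 element of such a splat is rebuilt from the two
i32 parts. A hedged scalar sketch of that recombination (not the
SelectionDAG code):

  #include <cstdint>
  #include <vector>

  // Model an nxvXi64 splat built from two 32-bit halves on a 32-bit target.
  std::vector<uint64_t> splatVectorParts(uint32_t Lo, uint32_t Hi, size_t N) {
    uint64_t Elt = (static_cast<uint64_t>(Hi) << 32) | Lo;
    return std::vector<uint64_t>(N, Elt);
  }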

For RISCV, we are now default expanding SPLAT_VECTOR to SPLAT_VECTOR_PARTS
when needed and then handling the SPLAT_VECTOR_PARTS later during
LegalizeOps. I've removed the special case I previously put in for
ABS for D97991 as the default expansion is now able to successfully
use getConstant.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98004
2021-03-10 09:46:18 -08:00
Craig Topper 0c73a506e8 [RISCV] Starting fixing issues that prevent us from testing vXi64 intrinsics on RV32.
Currently we crash in type legalization any time an intrinsic
uses a scalar i64 on RV32.

This patch adds support for type legalizing this to prevent
crashing. I don't promise that it uses the best possible codegen,
just that it is functional.

This first version handles 3 cases. vmv.v.x intrinsic, vmv.s.x
intrinsic and intrinsics that take a scalar input, splat it and
then do some operation.

For vmv.v.x we'll either rely on hardware sign extension for
constants or we'll convert it to multiple splats and bit
manipulation.

For vmv.s.x we use a really suboptimal sequence inspired by what
we do for an INSERT_VECTOR_ELT.

For the third case we'll either try to use the .vi form for
constants or convert to a complicated splat and bitmanip and use
the .vv form of the operation.
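
For the constant case, the .vi forms typically accept a 5-bit signed
immediate, so the check reduces to something like the sketch below
(illustrative helper name; the real code checks the operand constraints):

  #include <cstdint>

  // True if the constant fits the simm5 immediate of a .vi instruction form.
  bool isSImm5(int64_t C) {
    return C >= -16 && C <= 15;
  }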

I've renamed the ExtendOperand field to SplatOperand and now use it
specifically for the third case. The first two cases are handled
by custom lowering specifically for those intrinsics.

I haven't updated all tests yet, but I tried to cover a subset
that includes single-width, widening, and narrowing.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D97895
2021-03-10 09:45:38 -08:00
Craig Topper 1e39118638 [RISCV] Manually split vector operands to VECREDUCE when handling vXi64 vectors on RV32.
The type legalizer will visit the result before the operands. To
avoid creating an illegal target specific node or falling back to
scalarization, we need to manually split vector operands.

This still doesn't handle the case of non-power of 2 operands
which need to be widened. I'm not sure the type legalizer is
ready for it. I think we would need to insert an
INSERT_SUBVECTOR with the power of 2 type we want, with an undef
first operand, and the non-power of 2 original operand as the vector
to insert. Then fill the padded elements with the neutral element.
Alternatively, we could INSERT_SUBVECTOR into a vector of neutral elements.
From there we carry on splitting if needed to get to a legal type
then do the target specific code.
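
As a scalar illustration of the padding idea (assuming an ADD reduction,
whose neutral element is 0; not the legalizer code):

  #include <cstdint>
  #include <numeric>
  #include <vector>

  // Widen a non-power-of-2 operand to the next power of 2 and fill the new
  // lanes with the neutral element so the reduction result is unchanged.
  uint64_t paddedAddReduce(std::vector<uint64_t> V) {
    size_t N = 1;
    while (N < V.size())
      N *= 2;                 // next power-of-2 element count
    V.resize(N, 0);           // 0 is the neutral element for ADD
    return std::accumulate(V.begin(), V.end(), uint64_t{0});
  }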

The problem with this is the type legalizer doesn't know how to
widen an insert_subvector yet. We would need to add that including
the handling for a non-undef first vector.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98292
2021-03-10 09:27:38 -08:00
Craig Topper 351844edf1 [RISCV] Add support for VECTOR_REVERSE for scalable vector types.
I've left mask registers to a future patch as we'll need
to convert them to full vectors, shuffle, and then truncate.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D97609
2021-03-09 10:03:45 -08:00
Craig Topper 77ac3166e5 [RISCV] Add support for fixed vector reductions.
I've included tests that require type legalization to split the
vector. The i64 version of these scalarizes on RV32 due to type
legalization visiting the result before the vector operand. So we
have to abort our custom expansion to avoid creating target
specific nodes with an illegal type. Then type legalization ends
up scalarizing. We might be able to fix this by doing custom
splitting for large vectors in our handler to get down to a legal
type.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98102
2021-03-09 09:39:59 -08:00
Craig Topper 1c7ad4dd88 [RISCV] Don't modify the SEW immediate on the V extension pseudo instructions after inserting VSETVLI.
Previously we set the value to -1, but the SEW information could
be useful for scheduling.

Reviewed By: frasercrmck, rogfer01

Differential Revision: https://reviews.llvm.org/D98062
2021-03-09 09:02:19 -08:00
Craig Topper 72ecf2f43f [RISCV] Optimize fixed vector ABS. Fix crash on scalable vector ABS for SEW=64 with RV32.
The default fixed vector expansion uses sra+xor+add since it can't
see that smax is legal due to our custom handling. So we select
smax(X, sub(0, X)) manually.
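
As a scalar illustration (assuming i32 elements and two's-complement
wrap, which is what the DAG nodes provide), the two expansions compute
the same result:

  #include <algorithm>
  #include <cstdint>

  // Default expansion: sra + xor + add.
  int32_t absSraXorAdd(int32_t X) {
    uint32_t Sign = X < 0 ? ~0u : 0u;              // what sra X, 31 produces
    return static_cast<int32_t>((static_cast<uint32_t>(X) + Sign) ^ Sign);
  }

  // Expansion selected here: smax(X, sub(0, X)).
  int32_t absSmax(int32_t X) {
    int32_t Neg = static_cast<int32_t>(0u - static_cast<uint32_t>(X));
    return std::max(X, Neg);
  }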

Scalable vectors are able to use the smax expansion automatically
for most cases. It crashes in one case because getConstant can't build a
SPLAT_VECTOR for nxvXi64 when i64 scalars aren't legal. So
we manually emit a SPLAT_VECTOR_I64 for that case.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D97991
2021-03-09 08:51:03 -08:00
Craig Topper 7a64cc4a76 [RISCV] Make use of DAG.getNeutralElement in lowerVECREDUCE to avoid repeating the same list of constants. NFC
Reviewed By: frasercrmck, khchen

Differential Revision: https://reviews.llvm.org/D98091
2021-03-08 09:11:10 -08:00
Fraser Cormack 18173c57bd [RISCV] Add new entry points to getContainerForFixedLengthVector
While working on adding fixed-length vectors to the calling convention,
it was necessary to be able to query for a fixed-length vector container
type without access to an instance of SelectionDAG.

This patch modifies the "main" getContainerForFixedLengthVector function
to use an instance of TargetLowering rather than SelectionDAG, and
preserves the SelectionDAG overload as a wrapper.

An additional non-static version of the function was also added to
simplify the common case in RISCVTargetLowering.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97925
2021-03-08 09:26:19 +00:00
Craig Topper c91b3c9e63 [RISCV] Fold (select_cc (setlt X, Y), 0, ne, trueV, falseV) -> (select_cc X, Y, lt, trueV, falseV)
A setcc can be created during LegalizeDAG after select_cc has been
created. This combine will enable us to fold these late setccs.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D98132
2021-03-07 09:44:56 -08:00
Craig Topper fdbd5d3206 [RISCV] Fold (select_cc (xor X, Y), 0, eq/ne, trueV, falseV) -> (select_cc X, Y, eq/ne, trueV, falseV)
This pattern occurs when lowering for overflow operations
introduces an xor after select_cc has already been formed.
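
The fold is valid because (X ^ Y) == 0 holds exactly when X == Y; a
minimal scalar sketch of the before/after forms:

  #include <cstdint>

  // Before the fold: select_cc (xor X, Y), 0, eq, T, F
  int64_t selectBefore(int64_t X, int64_t Y, int64_t T, int64_t F) {
    return ((X ^ Y) == 0) ? T : F;
  }

  // After the fold: select_cc X, Y, eq, T, F
  int64_t selectAfter(int64_t X, int64_t Y, int64_t T, int64_t F) {
    return (X == Y) ? T : F;
  }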

I had to rework another combine that looked for select_cc of an xor
with 1. That xor will now get combined away so we just need to
look for the RHS of the select_cc being 1.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D98130
2021-03-07 09:29:55 -08:00
Fraser Cormack 8e7ceffd0b [RISCV] Fix crash when inserting large fixed-length subvectors
This patch addresses a compiler crash resulting from passing a
fixed-length vector type to a function that expects scalable vector types.
An assertion was added to prevent this from regressing in the future.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97868
2021-03-04 09:27:16 +00:00
Fraser Cormack d8e1d2ebf4 [RISCV] Preserve fixed-length VL on insert_vector_elt in more cases
This patch fixes up one case where the fixed-length-vector VL was
dropped (falling back to VLMAX) when inserting vector elements, as the
code would lower via ISD::INSERT_VECTOR_ELT (at index 0) which loses the
fixed-length vector information.

To this end, a custom node, VMV_S_XF_VL, was introduced to carry the VL
operand through to the final instruction. This node wraps the RVV
vmv.s.x and vmv.s.f instructions, which were being selected by
insert_vector_elt anyway.

There should be no observable difference in scalable-vector codegen.

There is still one outstanding drop from fixed-length VL to VLMAX, when
an i64 element is inserted into a vector on RV32; the splat (which is
custom legalized) has no notion of the original fixed-length vector
type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97842
2021-03-04 09:21:10 +00:00
Fraser Cormack c1695ddf7d [RISCV] Support fixed-length INSERT_VECTOR_ELT
This patch enables support for lowering INSERT_VECTOR_ELT on
fixed-length vector types. The strategy follows that for scalable vector
types.

This patch also includes a quick fix to prevent the compiler infinitely
looping between lowering BUILD_VECTOR as VECTOR_SHUFFLE and back again.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97698
2021-03-02 16:48:38 +00:00
Fraser Cormack de2b70010a [RISCV] Lower CONCAT_VECTORS to INSERT_SUBVECTOR nodes
The default expansion of CONCAT_VECTORS goes through the stack. This
patch avoids that penalty by custom-lowering CONCAT_VECTORS to a series
of INSERT_SUBVECTOR nodes. Further optimizations are possible, but this
is a good start.
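
A scalar model of the new lowering (illustrative only): the concatenation
becomes a series of subvector insertions at increasing offsets into the
full-width result.

  #include <algorithm>
  #include <vector>

  std::vector<int> concatViaInsert(const std::vector<std::vector<int>> &Parts) {
    size_t Total = 0;
    for (const auto &P : Parts)
      Total += P.size();
    std::vector<int> Result(Total);
    size_t Offset = 0;
    for (const auto &P : Parts) {
      // INSERT_SUBVECTOR of P at element offset Offset.
      std::copy(P.begin(), P.end(), Result.begin() + Offset);
      Offset += P.size();
    }
    return Result;
  }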

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97692
2021-03-02 11:13:59 +00:00
Fraser Cormack 3fea9226ee [RISCV] Support INSERT_SUBVECTOR on vector masks
Like with EXTRACT_SUBVECTOR, INSERT_SUBVECTOR poses a problem
for vector masks as RVV isn't able to slide mask types around. We choose
instead to bitcast to equivalently-sized i8 types where we can, else we
zero-extend, perform the operation, and truncate back down.
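
A sketch of the strategy choice (assumed condition on the element count;
the real check is made on the vector types involved):

  #include <cstddef>

  enum class MaskStrategy { BitcastToI8, ExtendAndTruncate };

  // A mask whose element count is a multiple of 8 can be reinterpreted as
  // an equivalently-sized i8 vector; otherwise widen each i1 to i8, do the
  // insert there, and truncate back to a mask.
  MaskStrategy chooseMaskStrategy(size_t NumElts) {
    return (NumElts % 8 == 0) ? MaskStrategy::BitcastToI8
                              : MaskStrategy::ExtendAndTruncate;
  }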

One test was left disabled due to a crash in the legalizer.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97559
2021-03-01 12:04:11 +00:00
Fraser Cormack e80ca3af82 [RISCV] Fix INSERT/EXTRACT_SUBVECTOR on fractional LMUL types
This patch fixes a bug where the lowering for INSERT_SUBVECTOR and
EXTRACT_SUBVECTOR would insist on first extracting a register-aligned
LMUL1 vector type before performing the slide up/down. This happened even if
the vector was a fractional LMUL type, in which case the aligned
EXTRACT_SUBVECTOR was invalid.

This issue only occurred for scalable vector types, but a variety of
tests for both scalable and fixed-length vectors have been added to
ensure this does not regress in the future.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97556
2021-03-01 11:51:05 +00:00
Fraser Cormack 4ea734e6ec [RISCV] Unify scalable- and fixed-vector INSERT_SUBVECTOR lowering
This patch unifies the two disparate paths for lowering INSERT_SUBVECTOR
operations under one roof. Consequently, with this patch it is possible to
support any fixed-length subvector insertion, not just "cast-like" ones.

As before, support for the insertion of mask vectors will come in a
separate patch.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97543
2021-03-01 11:38:47 +00:00
Fraser Cormack bd4d421688 [RISCV] Support EXTRACT_SUBVECTOR on vector masks
This patch adds support for extracting subvectors from vector masks.
This can be either extracting a scalable vector from another, or a fixed-length
vector from a fixed-length or scalable vector.

Since RVV lacks a way to slide vector masks down on an element-wise
basis and we don't know the true length of the vector registers, in many
cases we must resort to using equivalently-sized i8 vectors to perform
the operation. When this is not possible we fall back and extend to a
suitable i8 vector.

Support was also added for fixed-length truncation to mask types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97475
2021-03-01 11:20:09 +00:00
Fraser Cormack 37014db013 [RISCV] Use existing method for the LMUL1 type. NFCI.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97467
2021-02-26 09:44:05 +00:00
Craig Topper d7fca3f0bf [RISCV] Support fixed vector extract_element for FP types. 2021-02-25 16:30:28 -08:00
Fraser Cormack 02f435db0b [RISCV] Support fixed-length vector i2fp/fp2i conversions
This patch extends the support for scalable-vector int->fp and fp->int
conversions by additionally handling fixed-length vectors.

The existing scalable-vector lowering re-expresses widening/narrowing by
x4+ conversions as standard nodes. The fixed-length vector support slots
in at "the end" of this process by lowering the now equally-sized and
widening/narrowing by x2 nodes to our custom VL versions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97374
2021-02-25 13:47:58 +00:00
Fraser Cormack 9620ce90d7 [RISCV] Support fixed-length vector FP_ROUND & FP_EXTEND
This patch extends the support for vector FP_ROUND and FP_EXTEND by
including support for fixed-length vector types. Since fixed-length
vectors use "VL" nodes and scalable vectors can use the standard nodes,
there is slightly more to do in the fixed-length case. A helper function
was introduced to try and reduce the divergent paths. It is expected
that this function will similarly come in useful for lowering the
int-to-fp and fp-to-int operations for fixed-length vectors.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97301
2021-02-25 12:16:06 +00:00
Fraser Cormack 84413e1947 [RISCV] Support fixed-length vector truncates
This patch extends support for our custom-lowering of scalable-vector
truncates to include those of fixed-length vectors. It does this by
co-opting the custom RISCVISD::TRUNCATE_VECTOR node and adding mask and
VL operands. This avoids unnecessary duplication of patterns and
inflation of the ISel table.

Some truncates go through CONCAT_VECTORS which currently isn't
efficiently handled, as it goes through the stack. This can be improved
upon in the future.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97202
2021-02-25 12:11:34 +00:00
Fraser Cormack 3bc5ed3875 [RISCV] Support fixed-length vector sign/zero extension
This patch adds support for the custom lowering sign- and zero-extension
of fixed-length vector types. It does so through custom nodes. Since the
source and destination types are (necessarily) of different sizes, it is
possible that the source type is legal whilst the larger destination
type isn't. In this case the legalization makes heavy use of
EXTRACT_SUBVECTOR.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97194
2021-02-25 12:05:17 +00:00
Fraser Cormack 821f8bb29a [RISCV] Unify scalable- and fixed-vector EXTRACT_SUBVECTOR lowering
This patch unifies the two disparate paths for lowering
EXTRACT_SUBVECTOR operations under one roof. Consequently, with this
patch it is possible to support any fixed-length subvector extraction,
not just "cast-like" ones.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97192
2021-02-25 11:46:57 +00:00