Commit Graph

909 Commits

Author SHA1 Message Date
Kazu Hirata 435a5a3652 [llvm] Fix bugprone argument comments (NFC)
Identified with bugprone-argument-comment.
2022-01-08 11:56:38 -08:00
Craig Topper 75117fb340 [RISCV] Don't advertise i32->i64 zextload as free for RV64.
The zextload hook is only used to determine whether to insert a
zero_extend or any_extend for narrow types leaving a basic block.
Returning true from this hook tends to cause any load whose output
leaves the basic block to become an LWU instead of an LW.

Since we tend to prefer sexts for i32 compares on RV64, this can
cause extra sext.w instructions to be created in other basic blocks.

If we use LW instead of LWU this gives the MIR pass from D116397
a better chance of removing them.

Another option might be to teach getPreferredExtendForValue in
FunctionLoweringInfo.cpp about our preference for sign_extend of
i32 compares. That would cause SIGN_EXTEND to be chosen for any
value used by a compare instead of using the isZExtFree heuristic.
That will require code to convert from the llvm::Type* to EVT/MVT
as well as querying the type legalization actions to get the
promoted type in order to call TargetLowering::isSExtCheaperThanZExt.
That seemed like many extra steps when no other target wants it.
Though it would avoid us needing to lean on the MIR pass in some cases.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D116567
2022-01-06 08:13:42 -08:00
Craig Topper 808c662665 [RISCV] Change RISCVISD::FCVT*RTZ opcodes to take rounding mode as an operand.
Pre-work for a future change that will use these opcodes with other
rounding modes.

Differential Revision: https://reviews.llvm.org/D116724
2022-01-06 08:12:12 -08:00
Victor Perez 5527139302 [RISCV][VP] Add RVV codegen for [nX]vXi1 vp.select
Expand [nX]vXi1 vp.select the same way as [nX]vXi1 vselect.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D115546
2022-01-02 23:12:32 -08:00
Craig Topper 15787ccd45 [RISCV] Add support for STRICT_LRINT/LLRINT/LROUND/LLROUND. Tests for other strict intrinsics.
This patch adds isel support for STRICT_LRINT/LLRINT/LROUND/LLROUND.

It also adds test cases for f32 and f64 constrained intrinsics that
correspond to the intrinsics in float-intrinsics.ll and
double-intrinsics.ll. Support for promoting the integer argument of
STRICT_FPOWI was added.

I've skipped adding tests for f16 intrinsics, since we don't have libcalls
for them and we have inconsistent support for promoting them in LegalizeDAG.
This will need to be examined more closely.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D116323
2021-12-30 11:54:32 -08:00
Hsiangkai Wang a1c7ddf926 [RISCV] Support passing scalable vectur values through the stack.
After consuming all vector registers, the scalable vector values will be
passed indirectly. The pointer values will be saved in general
registers. If all general registers are used up, we will report an error to
notify users the compiler does not support passing scalable vector
values through the stack. In this patch, we remove the restriction. After
all general registers are used up, we use the stack to save the
pointers which point to the indirect passed scalable vector values.

Differential Revision: https://reviews.llvm.org/D116310
2021-12-28 09:26:36 +08:00
Kazu Hirata e7774f499b Use static_assert instead of assert (NFC)
Identified with misc-static-assert.
2021-12-26 14:26:44 -08:00
Jim Lin 02478a26f2 [RISCV] Use DAG variable directly instead of DCI.DAG
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116087
2021-12-24 13:06:55 +08:00
Craig Topper 0a35211b34 [RISCV] Don't allow vector types to be used with inline asm 'r' constraint
The 'r' constraint uses the GPR class. There is generic support
for bitcasting and extending/truncating non-integer VTs to the
required integer VT. This doesn't work for scalable vectors and
instead crashes.

To prevent this, explicitly reject vectors. Fixed vectors might
work without crashing, but it doesn't seem worthwhile to allow.

While there remove an unnecessary level of indentation in the
"vr" and "vm" constraint handling.

Differential Revision: https://reviews.llvm.org/D115810
2021-12-23 20:32:36 -06:00
Victor Perez 10b3675aa9 [RISCV][VP] Lower mask vector VP AND/OR/XOR to RVV instructions
For fixed and scalable vectors, each intrinsic x is lowered to vmx.mm,
dropping the mask, which is safe to do as masked-off elements are
undef anyway.

Differential Revision: https://reviews.llvm.org/D115339
2021-12-23 15:02:32 -06:00
Craig Topper 7704c503ec [RISCV] Use positive 0.0 for the neutral element in fadd reductions if nsz is present.
-0.0 requires a constant pool. +0.0 can be made with vmv.v.x x0.

Not doing this in getNeutralElement for fear of changing other targets.

Differential Revision: https://reviews.llvm.org/D115978
2021-12-23 10:38:00 -06:00
Craig Topper b7b260e19a [RISCV] Support strict FP conversion operations.
This adds support for strict conversions between fp types and between
integer and fp.

NOTE: RISCV has static rounding mode instructions, but the constrainted
intrinsic metadata is not used to select static rounding modes. Dynamic
rounding mode is always used.

Differential Revision: https://reviews.llvm.org/D115997
2021-12-23 09:40:58 -06:00
jacquesguan 28a3e7dea2 [RISCV] Override hasAndNotCompare to use more andn when have Zbb extension.
Enable transform (X & Y) == Y ---> (~X & Y) == 0 and (X & Y) != Y ---> (~X & Y) != 0 when have Zbb extension to use more andn instruction.

Differential Revision: https://reviews.llvm.org/D115922
2021-12-23 10:42:20 +08:00
Craig Topper 66bbefeb13 [RISCV] Revert Zfhmin related changes that aren't tested and depend on f16 being a legal type.
Our Zfhmin support is only MC layer, but these are CodeGen layer
interfaces. If f16 isn't a Legal type for CodeGen with Zfhmin, then
these interfaces should keep their non-Zfh behavior.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D115822
2021-12-16 08:55:28 -08:00
Craig Topper 3926893439 [RISCV] Add isel support for scalar STRICT_FADD/FSUB/FMUL/FDIV/FSQRT.
Test that STRICT_FMINNUM/FMAXNUM are lowered to libcalls for f32/f64.
The RISC-V instructions don't match the behavior of fmin/fmax libcalls
with respect to SNaN.

Promoting FMINNUM/FMAXNUM for f16 needs more work outside of the
RISC-V backend.

Reviewed By: asb, arcbbb

Differential Revision: https://reviews.llvm.org/D115680
2021-12-14 10:50:55 -08:00
Craig Topper 3f1c403a2b [RISCV] Use AdjustInstrPostInstrSelection to insert a FRM dependency for scalar FP instructions with dynamic rounding mode.
In order to support constrained FP intrinsics we need to model FRM
dependency. Whether or not a instruction uses FRM is based on a 3
bit field in the instruction. Because of this we can't add
'Uses = [FRM]' to the tablegen descriptions.

This patch examines the immediate after isel and adds an implicit
use of FRM. This idea came from Roger Ferrer Ibanez.

Other ideas:
We could be overly conservative and just pretend all instructions with
frm field read the FRM register. Or we could have pseudoinstructions
for CodeGen with rounding mode.

Reviewed By: asb, frasercrmck, arcbbb

Differential Revision: https://reviews.llvm.org/D115555
2021-12-14 10:17:57 -08:00
Craig Topper b18b2a01ef [RISCV] Don't use VLMAX for start value splat in reduction lowering.
The reduction instructions only reads the first element. The
execution time for a splat may take longer with a larger VL.
We should use the smallest VL we can.

Reviewed By: frasercrmck, HsiangKai

Differential Revision: https://reviews.llvm.org/D115536
2021-12-13 09:06:42 -08:00
Kito Cheng 39c861719b [RISCV] Fix vm operand constraint to fit GCC's behavior
- `vm` constraint is used for masking operand, which always v0.

- Update testcase, only masking operand should use `vm`, vector mask operations
  should just use `vr` for any vector register.

 - Revise the description of `vm` constraint.

- This patch also fix issue on RISCVRegisterInfo.td and RISCVISelLowering.cpp.

  RISCVRegisterInfo.td:
  - The first VT in the list must be the largest total size since the
    SelectionDAGBuilder uses the first register in the list as the canonical
    type for the register.

  RISCVISelLowering.cpp:
  - Fix RISCVTargetLowering::splitValueIntoRegisterParts and
    RISCVTargetLowering::joinRegisterPartsIntoValue for handling vectors
    with different total size, that will happened on fractional LMUL since
    fractional LMUL is always occupy one vector register.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D112599
2021-12-09 14:46:49 +08:00
Craig Topper acdbd34cfb [RISCV] Loosen some restrictions on lowering constant BUILD_VECTORs using vid.v.
The immediate size check on StepNumerator did not take into account
that vmul.vi does not exist. It also did not account for power of 2
constants that can be done with vshl.vi.

This patch fixes this by moving the conversion from mul to shift
further up. Then we can consider the immediates separately for MUL
vs SHL. For MUL I've allowed simm12 which requires a single addi
before a vmul.vx. For SHL I've allowed any uimm5 which works with
vshl.vi. We could relax these further in the future. This is a
starting point that allows us to emit the same number of instructions
we were already using for smaller numerators.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D115081
2021-12-06 09:34:40 -08:00
Victor Perez 9eb7322748 [RISCV][VP] Add RVV codegen for vp.select
Lower vp.select instrinsic to VSELECT_VL.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D114629
2021-12-03 11:02:20 +00:00
Craig Topper 2f6beb7b0e [RISCV] Add inline expansion for vector ftrunc/fceil/ffloor.
This prevents scalarization of fixed vector operations or crashes
on scalable vectors.

We don't have direct support for these operations. To emulate
ftrunc we can convert to the same sized integer and back to fp using
round to zero. We don't need to do a convert if the value is large
enough to have no fractional bits or is a nan.

The ceil and floor lowering would be better if we changed FRM, but
we don't model FRM correctly yet. So I've used the trunc lowering
with a conditional add or subtract with 1.0 if the truncate rounded
in the wrong direction.

There are also missed opportunities to use masked instructions.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D113543
2021-12-01 11:25:28 -08:00
Craig Topper d8f9eaad89 [RISCV] Teach RISCVTargetLowering::shouldSinkOperands to handle udiv/sdiv/urem/srem.
The V extension supports .vx instructions for integer division and
remainder so we should sink splats for that operand.
2021-11-30 18:47:51 -08:00
David Green 9e8a71caf0 [DAG] Create fptosi.sat from clamped fptosi
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.

A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi will have a less strict semantics
than the fptosisat, with less values that need to produce defined
behaviour.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D111976
2021-11-30 15:29:14 +00:00
Hans Wennborg a87782c34d Revert "[DAG] Create fptosi.sat from clamped fptosi"
It causes builds to fail with this assert:

llvm/include/llvm/ADT/APInt.h:990:
bool llvm::APInt::operator==(const llvm::APInt &) const:
Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed.

See comment on the code review.

> This adds a fold in DAGCombine to create fptosi_sat from sequences for
> smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
> the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
> it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
> ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
> to be handled similarly.
>
> A shouldConvertFpToSat method was added to control when converting may
> be profitable. The original fptosi will have a less strict semantics
> than the fptosisat, with less values that need to produce defined
> behaviour.
>
> This especially helps on ARM/AArch64 where the vcvt instructions
> naturally saturate the result.
>
> Differential Revision: https://reviews.llvm.org/D111976

This reverts commit 52ff3b0093.
2021-11-30 15:36:56 +01:00
David Green 52ff3b0093 [DAG] Create fptosi.sat from clamped fptosi
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.

A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi will have a less strict semantics
than the fptosisat, with less values that need to produce defined
behaviour.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D111976
2021-11-30 11:05:32 +00:00
Craig Topper b121d23a9c [RISCV] Promote f16 log/pow/exp/sin/cos/etc. to f32 libcalls.
Prevents crashes or cannot select errors.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D113822
2021-11-29 18:49:11 -08:00
Philipp Tomsich af57a71d18 [RISCV] Don't call setHasMultipleConditionRegisters(), so icmp is sunk
On RISC-V, icmp is not sunk (as the following snippet shows) which
generates the following suboptimal branch pattern:
```
  core_list_find:
	lh	a2, 2(a1)
	seqz	a3, a0         <<
	bltz	a2, .LBB0_5
	bnez	a3, .LBB0_9    << should sink the seqz
        [...]
	j	.LBB0_9
  .LBB0_5:
	bnez	a3, .LBB0_9    << should sink the seqz
	lh	a1, 0(a1)
        [...]
```
due to an icmp not being sunk.

The blocks after `codegenprepare` look as follows:
```
  define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 {
  entry:
    %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1
    %0 = load i16, i16* %idx, align 2, !tbaa !4
    %cmp = icmp sgt i16 %0, -1
    %tobool.not37 = icmp eq %struct.list_head_s* %list, null
    br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader

  while.cond9.preheader:                            ; preds = %entry
    br i1 %tobool.not37, label %return, label %land.rhs11.lr.ph
```
where the `%tobool.not37` is the result of the icmp that is not sunk.
Note that it is computed in the basic-block up until what becomes the
`bltz` instruction and the `bnez` is a basic-block of its own.

Compare this to what happens on AArch64 (where the icmp is correctly sunk):
```
  define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 {
  entry:
    %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1
    %0 = load i16, i16* %idx, align 2, !tbaa !6
    %cmp = icmp sgt i16 %0, -1
    br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader

  while.cond9.preheader:                            ; preds = %entry
    %1 = icmp eq %struct.list_head_s* %list, null
    br i1 %1, label %return, label %land.rhs11.lr.ph
```

This is caused by sinkCmpExpression() being skipped, if multiple
condition registers are supported.

Given that the check for multiple condition registers affect only
sinkCmpExpression() and shouldNormalizeToSelectSequence(), this change
adjusts the RISC-V target as follows:
 * we no longer signal multiple condition registers (thus changing
   the behaviour of sinkCmpExpression() back to sinking the icmp)
 * we override shouldNormalizeToSelectSequence() to let always select
   the preferred normalisation strategy for our backend

With both changes, the test results remain unchanged.  Note that without
the target-specific override to shouldNormalizeToSelectSequence(), there
is worse code (more branches) generated for select-and.ll and select-or.ll.

The original test case changes as expected:
```
  core_list_find:
	lh	a2, 2(a1)
	bltz	a2, .LBB0_5
	beqz	a0, .LBB0_9    <<
        [...]
	j	.LBB0_9
.LBB0_5:
	beqz	a0, .LBB0_9    <<
	lh	a1, 0(a1)
        [...]
```

Differential Revision: https://reviews.llvm.org/D98932
2021-11-19 08:32:59 -08:00
Zarko Todorovski 5b8bbbecfa [NFC][llvm] Inclusive language: reword and remove uses of sanity in llvm/lib/Target
Reworded removed code comments that contain `sanity check` and `sanity
test`.
2021-11-17 21:59:00 -05:00
Craig Topper 0274be28d7 [RISCV] Lower vector CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF by converting to FP and extracting the exponent.
If we have a large enough floating point type that can exactly
represent the integer value, we can convert the value to FP and
use the exponent to calculate the leading/trailing zeros.

The exponent will contain log2 of the value plus the exponent bias.
We can then remove the bias and convert from log2 to leading/trailing
zeros.

This doesn't work for zero since the exponent of zero is zero so we
can only do this for CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF. If we need
a value for zero we can use a vmseq and a vmerge to handle it.

We need to be careful to make sure the floating point type is legal.
If it isn't we'll continue using the integer expansion. We could split the vector
and concatenate the results but that needs some additional work and evaluation.

Differential Revision: https://reviews.llvm.org/D111904
2021-11-17 10:29:41 -08:00
Craig Topper 391b0ba603 [RISCV] Override TargetLowering::hasAndNot for Zbb.
Differential Revision: https://reviews.llvm.org/D113937
2021-11-15 18:44:07 -08:00
Craig Topper ee7a006ce4 [RISCV] Promote f16 ceil/floor/round/roundeven/nearbyint/rint/trunc intrinsics to f32 libcalls.
Previously these would crash. I don't think these can be generated
directly from C. Not sure if any optimizations can introduce them.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D113527
2021-11-11 08:28:41 -08:00
Craig Topper 4183522e80 [RISCV] Promote f16 frem with Zfh.
Add riscv64 coverage for f32 and f64 frem.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D113531
2021-11-10 17:35:07 -08:00
Craig Topper 9ee5cec688 [RISCV] Prevent bad legalizer behavior when bitcasting fixed vectors to i64 on RV32 with Zve32.
Similar to D113219, we need to make sure we don't create a vXi64
vector when it isn't legal. This fixes an error found by an
expensive checks build.
2021-11-10 11:58:49 -08:00
Craig Topper 57bc7b1089 [RISCV] Prevent crashes when bitcasting between fixed vectors and scalars.
Not all scalar element types are allowed in vectors so we may not
be able to bitcast to a 1 element vector to use insert/extract.

This will become a bigger issue when the Zve extensions are commited.
For now, I'm using the ELEN limit to limit the element types.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D113219
2021-11-10 09:21:52 -08:00
Craig Topper 376233113e [RISCV] Use TargetConstant for CSR number for READ_CSR/WRITE_CSR.
This is consistent with what we do for other operands that are required
to be constants.

I don't think this results in any real changes. The pattern match
code for isel treats ConstantSDNode and TargetConstantSDNode the same.
2021-11-08 15:10:24 -08:00
Craig Topper 304edbb553 [RISCV] SMUL_LOHI/UMUL_LOHI should expand for RVV.
These and MULHS/MULHU both default to Legal. Targets need to set
the ones they don't support to Expand.

I think MULHS/MULHU likely has priority in most places so this
change probably isn't directly testable. I found it while looking
at disabling MULHS/MULHU for nxvXi64 as required for Zve64x.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D113325
2021-11-08 09:38:36 -08:00
Ben Shi e32cf690df [RISCV] Optimize (add (mul r, c0), c1)
Optimize (add (mul x, c0), c1) ->
         (add (mul (add x, c1/c0+1), c0), c1%c0-c0),
if c1/c0+1 and c1%c0-c0 are simm12, while c1 is not.

Optimize (add (mul x, c0), c1) ->
         (add (mul (add x, c1/c0-1), c0), c1%c0+c0),
if c1/c0-1 and c1%c0+c0 are simm12, while c1 is not.

Reviewed By: craig.topper, asb

Differential Revision: https://reviews.llvm.org/D111141
2021-11-08 02:58:25 +00:00
Shao-Ce SUN 5c3d7184b4 [RISCV] Support Zfhmin extension
According to RISC-V Unprivileged ISA 15.6.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D111866
2021-11-06 01:41:02 +08:00
Zakk Chen 0649dfebba [RISCV] Rename some assembler mnemonic and intrinsic functions for RVV 1.0.
Rename vpopc/vmandnot/vmornot to vcpop/vmandn/vmorn assembler mnemonic.

Reviewed By: frasercrmck, jrtc27, craig.topper

Differential Revision: https://reviews.llvm.org/D111062
2021-11-04 10:08:01 -07:00
Fraser Cormack d065b03801 [RISCV] Optimize vp.load with an all-ones mask
Similar to D110206, this patch optimizes unmasked vp.load intrinsics to
avoid the need of a vmset instruction to set the mask. It does so by
selecting a riscv_vle intrinsic rather than a riscv_vle_mask intrinsic.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D113022
2021-11-02 17:23:39 +00:00
Craig Topper ada5458521 [RISCV] Expand scalable vector bswap. Fix crash for bitreverse.
Fix LegalizeVectorOps to not try shuffle or unrolling expansions for
scalable vectors.

Differential Revision: https://reviews.llvm.org/D112236
2021-10-31 10:01:27 -07:00
Craig Topper 1387483e72 [RISCV] Replace most uses of RISCVSubtarget::hasStdExtV. NFCI
Add new hasVInstructions() which is currently equivalent.

Replace vector uses of hasStdExtZfh/F/D with new vector specific
versions. The vector spec no longer requires that the vectors implement the
same types as scalar. It only requires that the scalar type is
the maximum size the vectors can support. This is currently
implemented using the scalar rule we were using before.

Add new hasVInstructionsI64() begin using to qualify code that
requires i64 vector elements.

This is all NFC for now, but we can start using this to better
implement D112408 which introduces the Zve extensions.

Reviewed By: frasercrmck, eopXD

Differential Revision: https://reviews.llvm.org/D112496
2021-10-27 19:33:48 -07:00
Craig Topper 2783a5cfaf [RISCV] Add ICmp and FCmp to shouldSinkOperands. 2021-10-26 22:23:54 -07:00
Craig Topper d55be79d75 [RISCV] Expand scalable vector CTTZ/CTLZ/CTPOP.
Differential Revision: https://reviews.llvm.org/D112233
2021-10-21 10:50:04 -07:00
Craig Topper c4803bd416 [RISCV] Handle vector of pointer in getTgtMemIntrinsic for strided load/store.
getScalarSizeInBits() doesn't work if the scalar type is a pointer.
For that we need to go through DataLayout.
2021-10-07 10:11:56 -07:00
Craig Topper a2a07e8db3 [RISCV] Fold store of vmv.x.s to a vse with VL=1.
This can avoid a loss of decoupling with the scalar unit on cores
with decoupled scalar and vector units.

We should support FP too, but those use extract_element and not a
custom ISD node so it is a little different. I also left a FIXME
in the test for i64 extract and store on RV32.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109482
2021-09-27 09:54:46 -07:00
Craig Topper 933182e948 [RISCV] Improve support for forming widening multiplies when one input is a scalar splat.
If one input of a fixed vector multiply is a sign/zero extend and
the other operand is a splat of a scalar, we can use a widening
multiply if the scalar value has sufficient sign/zero bits.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D110028
2021-09-27 09:37:07 -07:00
Fraser Cormack d48f6df1f8 [RISCV] Create the correct mask type when lowering EXTRACT_VECTOR_ELT
This particular case was creating a `VMSET_VL` using the old
fixed-length type in order to pass a mask to other custom nodes
operating on the scalable container type. This kind of thing wasn't
caught for us; I only noticed when experimenting with odd-length
vectors, where it was trying to generate an invalid `v3i1` MVT.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110420
2021-09-27 09:43:40 +01:00
Hsiangkai Wang 7d39a8a921 [RISCV] (1/2) Add the tail policy argument to builtins/intrinsics.
Add the tail policy argument to LLVM IR intrinsics. There are two policies for tail elements. Tail agnostic means users do not care about the values in the tail elements and tail undisturbed means the values in the tail elements need to be kept after the operation. In order to let users control the tail policy, we add an additional argument at the end of the argument list.

For unmasked operations, we have no maskedoff and the tail policy is always tail agnostic. If users want to keep tail elements under unmasked operations, they could use all one mask in the masked operations to do it. So, we only add the additional argument for masked operations for most cases. There are exceptions listed below.

In this patch, we do not handle the following cases to reduce the complexity of the patch. There could be two separate patches for them.

* Use dest argument to control tail policy
vmerge.vvm/vmerge.vxm/vmerge.vim (add _t builtins with additional dest argument)
vfmerge.vfm (add _t builtins with additional dest argument)
vmv.v.v (add _t builtins with additional dest argument)
vmv.v.x (add _t builtins with additional dest argument)
vmv.v.i (add _t builtins with additional dest argument)
vfmv.v.f (add _t builtins with additional dest argument)
vadc.vvm/vadc.vxm/vadc.vim (add _t builtins with additional dest argument)
vsbc.vvm/vsbc.vxm (add _t builtins with additional dest argument)

* Always has tail argument for masked/unmasked intrinsics
Vector Single-Width Integer Multiply-Add Instructions (add _t and _mt builtins)
Vector Widening Integer Multiply-Add Instructions (add _t and _mt builtins)
Vector Single-Width Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins)
Vector Widening Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins)
Vector Reduction Operations (add _t and _mt builtins)
Vector Slideup Instructions (add _t and _mt builtins)
Vector Slidedown Instructions (add _t and _mt builtins)

Discussion: https://github.com/riscv/rvv-intrinsic-doc/pull/101

Differential Revision: https://reviews.llvm.org/D105092
2021-09-24 17:09:50 +08:00
Craig Topper 40b230f685 [RISCV] Limit transformAddImmMulImm to prevent an infinite loop.
This fixes an issue reported in D108607.
2021-09-23 15:53:11 -07:00
Fraser Cormack e7c879a69d [RISCV][VP] Add support for VP_REDUCE_* operations
This patch adds codegen support for lowering the vector-predicated
reduction intrinsics to RVV instructions. The process is similar to that
of the other reduction intrinsics, save for the fact that every VP
reduction has a start value. We reuse the existing custom "VL" nodes,
adding extra patterns where required to handle non-true masks.

To support these nodes, the `RISCVISD::VECREDUCE_*_VL` nodes have been
given an explicit "merge" operand. This is to faciliate the VP
reductions, where we must be careful to ensure that even if no operation
is performed (when VL=0) we still produce the start value. The RVV
reductions don't update the destination register under these conditions,
so we tie the splatted start value to the output register.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107657
2021-09-23 11:11:05 +01:00
Craig Topper b33a1cc05b [RISCV] Optimize vp.store with an all ones mask to avoid a vmset.
We can use riscv_vse intrinsic instead of riscv_vse_mask. The code here
is based on similar code for handling masked.scatter and vp.scatter.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D110206
2021-09-22 09:12:47 -07:00
Craig Topper 7c975665b4 [RISCV] Make some arrays of constants 'static const'. NFC
This helps the compiler generate better code.
2021-09-21 10:52:47 -07:00
Craig Topper aeb63d464f [RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for and/or/xor.
This requires a minor change to CodeGenPrepare to ensure that
shouldSinkOperands will be called for And.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D110106
2021-09-21 10:07:29 -07:00
Ben Shi b3052013b4 [RISCV] Optimize (add (mul x, c0), c1)
Optimize (add (mul x, c0), c1) -> (ADDI (MUL (ADDI, c1/c0), c0), c1%c0),
if c1/c0 and c1%c0 are simm12, while c1 is not.

Optimize (add (mul x, c0), c1) -> (MUL (ADDI, c1/c0), c0),
if c1%c0 is zero, and c1/c0 is simm12 while c1 is not.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108607
2021-09-21 14:13:14 +00:00
Craig Topper a95ba81073 [RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for FMA.
If either of the multiplicands is a splat, we can sink it to use
vfmacc.vf or similar.
2021-09-20 11:49:50 -07:00
Craig Topper 04ab6c85ef [RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for FAdd/FSub/FMul/FDiv. 2021-09-20 10:25:46 -07:00
Craig Topper d85e347a28 [RISCV] Add a pass to recognize VLS strided loads/store from gather/scatter.
For strided accesses the loop vectorizer seems to prefer creating a
vector induction variable with a start value of the form
<i32 0, i32 1, i32 2, ...>. This value will be incremented each
loop iteration by a splat constant equal to the length of the vector.
Within the loop, arithmetic using splat values will be done on this
vector induction variable to produce indices for a vector GEP.

This pass attempts to dig through the arithmetic back to the phi
to create a new scalar induction variable and a stride. We push
all of the arithmetic out of the loop by folding it into the start,
step, and stride values. Then we create a scalar GEP to use as the
base pointer for a strided load or store using the computed stride.
Loop strength reduce will run after this pass and can do some
cleanups to the scalar GEP and induction variable.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D107790
2021-09-20 09:39:44 -07:00
Ben Shi dee5a8ca32 [RISCV] Optimize (add (shl x, c0), (shl y, c1)) with SH*ADD
Optimize (add (shl x, c0), (shl y, c1)) ->
         (SLLI (SH*ADD x, y), c1), if c0-c1 == 1/2/3.

Reviewed By: craig.topper, luismarques

Differential Revision: https://reviews.llvm.org/D108916
2021-09-19 16:35:12 +08:00
Craig Topper 1b736bda3b [RISCV] Enable CGP to sink splat operands of Add/Sub/Mul/Shl/LShr/AShr
LICM may have pulled out a splat, but with .vx instructions we
can fold it into an operation.

This patch enables CGP to reverse the LICM transform and move the
splat back into the loop.

I've started with the commutable integer operations and shifts, but we can
extend this with more operations in future patches.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109394
2021-09-10 09:04:01 -07:00
Craig Topper a574f0e0c3 [RISCV] Disable use of i128 shift libcalls on RV32.
Since i128 isn't a legal C type on RV32, I don't believe
libgcc implements these functions for RV32. compiler-rt
does implement them because i128 support is enabled
in order to handle long double.

This is consistent with 32-bit X86 and ARM.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D109383
2021-09-08 14:26:07 -07:00
Kazu Hirata 5c6338de16 [RISCV] Fix "set but not used" warnings 2021-09-07 09:19:31 -07:00
Fraser Cormack a823bdf3ab [RISCV][VP] Custom lower VP_STORE and VP_LOAD
This patch adds support for the vector-predicated `VP_STORE` and
`VP_LOAD` nodes. We do this in the same way we lower `MSTORE` and
`MLOAD`: to regular load/store instructions via intrinsics.

One necessary change was made to `SelectionDAGLegalize` so that
`VP_STORE` nodes' operation actions are taken from the stored "value"
operands, in the same vein as `STORE` or `MSTORE`.

Reviewed By: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D108999
2021-09-07 10:53:25 +01:00
Fraser Cormack f4dee8cb82 [RISCV][VP] Custom lower VP_SCATTER and VP_GATHER
This patch adds support for the `VP_SCATTER` and `VP_GATHER` nodes by
lowering them to RVV's `vsox`/`vlux` instructions, respectively. This
process is almost identical to the existing `MSCATTER`/`MGATHER` support.

One extra change was made to `SelectionDAGLegalize` so that
`VP_SCATTER`'s operation action is derived from its stored "value"
operand rather than its return type (which is always the chain).

Reviewed By: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D108987
2021-09-07 10:43:07 +01:00
Craig Topper 75620fadf5 [RISCV] Change how we encode AVL operands in vector pseudoinstructions to use GPRNoX0.
This patch changes the register class to avoid accidentally setting
the AVL operand to X0 through MachineIR optimizations.

There are cases where we really want to use X0, but we can't get that
past the MachineVerifier with the register class as GPRNoX0. So I've
use a 64-bit -1 as a sentinel for X0. All other immediate values should
be uimm5. I convert it to X0 at the earliest possible point in the VSETVLI
insertion pass to avoid touching the rest of the algorithm. In
SelectionDAG lowering I'm using a -1 TargetConstant to hide it from
instruction selection and treat it differently than if the user
used -1. A user -1 should be selected to a register since it doesn't
fit in uimm5.

This is the rest of the changes started in D109110. As mentioned there,
I don't have a failing test from MachineIR optimizations anymore.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109116
2021-09-03 09:19:25 -07:00
Craig Topper ccbb4c8b4f [RISCV] Fold (RISCVISD::SELECT_CC X, Y, CC, Z, Z) -> Z.
If the true and false values are the same, we don't need a SELECT_CC.

This would normally be folded before a select is legalized to
select_cc. The test case exploits the late legalization of vscale
to trigger a case where they become identical after legalization.

This works around an issue found on a test case in D107957. In that
case the true/false values were both eventually 0 and the select was
used by a vector AVL operand. The select_cc got expanded to control
flow and a phi, but the phi inputs were both copies from X0. MachineIR
optimizations simplified this to a single copy from X0 going into the
vector instruction. This became the input of a vsetvli after vsetvli
insertion. Then register coalescing folded the copy into the vsetvli.
X0 as the source of a vsetvli is a special encoding and should not be
created by coalesing. We need to fix our vsetvli handling to make sure
this can never happen any other way, but removing the unneeded select
is still a worthwhile optimization.
2021-09-01 12:37:52 -07:00
Nick Desaulniers e9b3f25730 [RISCVISelLowering] avoid emitting libcalls to __mulodi4() and __multi3()
Similar to D108842, D108844, D108926, D108928, and D108936.

__has_builtin(builtin_mul_overflow) returns true for 32b RISCV targets,
but Clang is deferring to compiler RT when encountering long long types.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108939
2021-08-31 11:23:56 -07:00
Craig Topper 0560a4adb3 [RISCV] Enable CONCAT_VECTORS for fixed FP vectors.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D108487
2021-08-30 08:47:45 -07:00
Craig Topper 0eeab8b282 [RISCV] Add -riscv-v-fixed-length-vector-elen-max to limit the ELEN used for fixed length vectorization.
This adds an ELEN limit for fixed length vectors. This will scalarize
any elements larger than this. It will also disable some fractional
LMULs. For example, if ELEN=32 then mf8 becomes illegal, i32/f32
vectors can't use any fractional LMULs, i16/f16 can only use mf2,
and i8 can use mf2 and mf4.

We may also need something for the scalable vectors, but that has
interactions with the intrinsics and we can't scalarize a scalable
vector.

Longer term this should come from one of the Zve* features
2021-08-27 10:17:35 -07:00
Craig Topper 1b9417454e [RISCV] Insert a sext_inreg when type legalizing i32 shl by constant on RV64.
Similar to what we do for add/sub/mul.

This can help remove some sext.w. There are some regressions on
some bswap tests, but I have an idea how to fix that for a follow up.

A new PACKW pattern is added to handle the new sext_inreg placement.

Differential Revision: https://reviews.llvm.org/D108663
2021-08-26 10:20:19 -07:00
Ben Shi f69fb7ac72 [DAGCombiner] Add target hook function to decide folding (mul (add x, c1), c2)
Reviewed by: lebedev.ri, spatel, craig.topper, luismarques, jrtc27

Differential Revision: https://reviews.llvm.org/D107711
2021-08-22 16:53:32 +08:00
Craig Topper 36d8316cc8 [RISCV] Reduce duplicate code for calling SimplifyDemandedBits.
This encapsulates the APInt creation and worklist management into
a helper function.

To keep one common interface I've use Log2_32 in places that
previously created a mask by subtracting 1 from a power of 2.

Differential Revision: https://reviews.llvm.org/D108324
2021-08-19 07:09:38 -07:00
Craig Topper 6d7ea597ef [RISCV] Insert sext_inreg when type legalizing add/sub/mul with constant LHS.
We already do this for non-constants RHS. This just removes the
special case. I believe the special case may have been needed
because the ANY_EXTEND of a constant used to create zero extended
constants, but we recently changed that to produce sign extended
constants.

D107658 is needed to prevent some regressions.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107697
2021-08-18 10:44:25 -07:00
Craig Topper d63f117210 [RISCV] Support RISCVISD::SELECT_CC in ComputeNumSignBitsForTargetNode. 2021-08-13 18:00:09 -07:00
Craig Topper 6f5edc3487 [RISCV] Fold (add (select lhs, rhs, cc, 0, y), x) -> (select lhs, rhs, cc, x, (add x, y))
Similar for sub except sub isn't commutative.

Modify the existing and/or/xor folds to also work on ISD::SELECT
and not just RISCVISD::SELECT_CC. This is needed to make sure
we do this transform before type legalization turns i32 add/sub
into add/sub+sign_extend_inreg on RV64. If we don't do this before
that, the sign_extend_inreg will still be after the select.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D107603
2021-08-10 09:02:56 -07:00
Fraser Cormack 2b4a1d4b86 [RISCV] Improve codegen for shuffles with LHS/RHS splats
Shuffles which are broken into separate halves reveal splats in which
a half is accessed via one index; such operations can be optimized to
use "vrgather.vi".

This optimization could be achieved by adding extra patterns to match
`vrgather_vv_vl` which uses a splat as an index operand, but this patch
instead identifies splat earlier. This way, future optimizations can
build on top of the data gathered here, e.g., to splat-gather dominant
indices and insert any leftovers.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107449
2021-08-09 10:31:40 +01:00
Craig Topper 2f3b738960 [RISCV] Add optimizations for FMV_X_ANYEXTH similar to FMV_X_ANYEXTW_RV64.
This enables the fneg and fabs combines we have for FMV_X_ANYEXTW_RV64.
2021-08-08 18:30:48 -07:00
Craig Topper 88bc29f5f2 [RISCV] Introduce a RISCV CondCode enum instead of using ISD:SET* in MIR. NFC
Previously we converted ISD condition codes to integers and stored
them directly in our MIR instructions. The ISD enum kind of belongs
to SelectionDAG so that seems like incorrect layering.

This patch instead uses a CondCode node on RISCV::SELECT_CC until
isel and then converts it from ISD encoding to a RISCV specific value.
This value can be converted to/from the RISCV branch opcodes in the
RISCV namespace.

My larger motivation is to possibly support a microarchitectural
feature of some CPUs where a short forward branch over a single
instruction can be predicated internally. This will require a new
pseudo instruction for select that needs to carry a branch condition
and live probably until RISCVExpandPseudos. At that point it can be
expanded to control flow without other instructions ending up in the
predicated basic block. Using an ISD encoding in RISCVExpandPseudos
doesn't seem like correct layering.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107400
2021-08-08 17:25:37 -07:00
Craig Topper d4ee84ceee [RISCV] Support FP_TO_S/UINT_SAT for i32 and i64.
The fcvt fp to integer instructions saturate if their input is
infinity or out of range, but the instructions produce a maximum
integer for nan instead of 0 required for the ISD opcodes.

This means we can use the instructions to do the saturating
conversion, but we'll need to fix up the nan case at the end.

We can probably improve the i8 and i16 default codegen as well,
but I'll leave that for a follow up.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107230
2021-08-07 16:06:00 -07:00
Fraser Cormack cba6aab971 [RISCV] Support simple fractional steps in matching VID sequences
This patch extends the optimization of VID-sequence BUILD_VECTORs
introduced in D104921 to include simple fractional steps composed of a
separated integer numerator and denominator.

A notable limitation in this sequence detection is that only sequences
with steps N/1 or 1/D are found, meaning that the step between elements
and the frequency with which it changes is consistent across the whole
sequence. Fractional steps such as 2/3 won't be matched as those would
involve more complex tracking of state or some level of backtracking.

As is stands, however, this patch is sufficient to match common
interleave-type shuffle indices, for example matching `<0,0,1,1>` (or
commonly `<0,u,1,u>` or `<u,0,u,1>`) to an index sequence divided by 2.

While the optimization is relatively `undef`-tolerant, due to greedy
pattern-matching there even are some simple patterns which confuse the
sequence detection into identifying either a suboptimal sequence or no
sequence at all.

Currently only fractional-step sequences identified as having a
power-of-two denominator are actually lowered to RVV instructions. This
is to avoid introducing divisions into the generated code.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106533
2021-08-03 10:38:24 +01:00
Hsiangkai Wang 8b33839f01 [RISCV] Rename vector inline constraint from 'v' to 'vr' and 'vm' in IR.
Differential Revision: https://reviews.llvm.org/D107139
2021-08-01 05:58:17 +08:00
Craig Topper 593059b328 [RISCV] Rename RISCVISD::FCVT_W_RV64 to FCVT_W_RTZ_RV64. NFC
fcvt.w(u) supports multiple rounding modes, but the ISD node
doesn't encode that. So name it to match the rounding mode it uses.
2021-07-31 11:14:59 -07:00
Fraser Cormack 02dd4b59bc [RISCV] Optimize floating-point "dominant value" BUILD_VECTORs
This patch aims to improve the performance of BUILD_VECTORs which are
identified as containing a dominant element. Given that most
floating-point constants themselves require a load from the constant
pool, it was possible for the optimization to actually increase the
number of individual loads on small vectors. The exception is the zero
constant -- +0.0 -- which can be materialized efficiently.

While this optimization could do with a proper cost model to weigh the
benfits of a single vector load vs. the manipulation of individual
elements -- even for integer vectors which often require several
instructions to materialize -- without a concrete RVV implementation to
work with any heuristic is likely to be both more obtuse and inaccurate.

Until then, this patch fixes at least one known obvious deficiency.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106963
2021-07-29 09:22:34 +01:00
Ben Shi 264b8e2a20 [RISCV] Optimize mul in the zba extension with SH*ADD
This patch makes the following optimization, if the
immediate multiplier is not a simm12.

(mul x, (power_of_2 + 2)) => (SH1ADD x, (SLLI x, bits))
(mul x, (power_of_2 + 4)) => (SH2ADD x, (SLLI x, bits))
(mul x, (power_of_2 + 8)) => (SH3ADD x, (SLLI x, bits))

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106648
2021-07-29 09:46:41 +08:00
Craig Topper 3106f85945 [RISCV] Fix grammar in a comment. NFC 2021-07-28 09:09:26 -07:00
Craig Topper 54588bcc05 [RISCV] Restrict performANY_EXTENDCombine to prevent an infinite loop.
The sign_extend we insert here can get turned into a zero_extend if
the sign bit is known zero. This can enable a setcc combine that
shrinks compares with zero_extend. This reduces the use count of
the zero_extend allowing other combines to turn it back into an
any_extend.

This restricts the combine to only cases where the result is used
by a CopyToReg. This works for my original motivating case. I
hope the CopyToReg use will prevent any converted extends from
turning back into an any_extend.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D106754
2021-07-28 09:05:45 -07:00
Fraser Cormack 172487fe4c [RISCV] Add support for vector saturating add/sub operations
This patch adds support for lowering the saturating vector add/sub
intrinsics to RVV instructions, for both fixed-length and
scalable-vector forms alike.

Note that some of the DAG combines are still not triggering for the
scalable-vector tests. These require a bit more work in the DAGCombiner
itself.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106651
2021-07-27 10:04:14 +01:00
Craig Topper c63dbd8501 [RISCV] Custom lower (i32 (fptoui/fptosi X)).
I stumbled onto a case where our (sext_inreg (assertzexti32 (fptoui X)), i32)
isel pattern can cause an fcvt.wu and fcvt.lu to be emitted if
the assertzexti32 has an additional user. If we add a one use check
it would just cause a fcvt.lu followed by a sext.w when only need
a fcvt.wu to satisfy both users.

To mitigate this I've added custom isel and new ISD opcodes for
fcvt.wu. This allows us to keep know it started life as a conversion
to i32 without needing to match multiple nodes. ComputeNumSignBits
has been taught that this new nodes produces 33 sign bits. To
prevent regressions when we need to zero extend the result of an
(i32 (fptoui X)), I've added a DAG combine to convert it to an
(i64 (fptoui X)) before type legalization. In most cases this would
happen in InstCombine, but a zero_extend can be created for function
returns or arguments.

To keep everything consistent I've added new nodes for fptosi as well.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D106346
2021-07-24 10:50:43 -07:00
Fraser Cormack b115c038d2 [RISCV] Fix a crash when lowering split float arguments
Lowering certain float vectors without legal vector types could cause a
crash due to a bad interaction between passing floats via GPRs and
argument splitting. Split vector floats appear just like scalar floats.
Under certain situations we choose to pass these float arguments via
GPRs and use an XLenVT location and set the 'BCvt' info to track how
they must be converted back to floating-point values. However, later
logic for handling split arguments may take over, in which case we lose
the previous information and set the 'Indirect' info, thus incorrectly
lowering to integer types.

I don't believe that we would have come across the notion of split
floating-point arguments before. This patch addresses the issue by
updating the lowering so that split arguments are only passed indirectly
when they are scalar integer types.

This has some change to how we lower some larger illegal float vectors,
as can be seen in 'fastcc-float.ll' where the vector is now passed
partly in registers and partly on the stack.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D102852
2021-07-22 09:55:26 +01:00
Fraser Cormack 7b3a69bc16 [RISCV] Lower more BUILD_VECTOR sequences to RVV's VID
This relands a6ca88e908 which was originally
reverted due to overflow bugs in e3fa2b1eab.

This patch teaches the compiler to identify a wider variety of
`BUILD_VECTOR`s which form integer arithmetic sequences, and to lower
them to `vid.v` with modifications for non-unit steps and non-zero
addends.

The sequences handled by this optimization must either be monotonically
increasing or decreasing. Consecutive elements holding the same value
indicate a fractional step which, while simple mathematically,
becomes more complex to handle both in the realm of lossy integer
division and in the presence of `undef`s.

For example, a common "interleaving" shuffle index will be lowered by
LLVM to both `<0,u,1,u,2,...>` and `<u,0,u,1,u,...>` `BUILD_VECTOR`
nodes. Either of these would ideally be lowered to `vid.v` shifted right
by 1. Detection of this sequence in presence of general `undef` values
is more complicated, however: `<0,u,u,1,>` could match either
`<0,0,0,1,>` or `<0,0,1,1,>` depending on later values in the sequence.
Both are possible, so backtracking or multiple passes is inevitable.

Sticking to monotonic sequences keeps the logic simpler as it can be
done in one pass. Fractional steps will likely be a separate
optimization in a future patch.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104921
2021-07-22 09:36:12 +01:00
Eli Friedman 0ca46a1757 [SelectionDAG] Fix the representation of ISD::STEP_VECTOR.
The existing rule about the operand type is strange.  Instead, just say
the operand is a TargetConstant with the right width.  (Legalization
ignores TargetConstants, so it doesn't matter if that width is legal.)

Highlights:

1. I had to substantially rewrite the AArch64 isel patterns to expect a
TargetConstant.  Nothing too exotic, but maybe a little hairy. Maybe
worth considering a target-specific node with some dagcombines instead
of this complicated nest of isel patterns.
2. Our behavior on RV32 for vectors of i64 has changed slightly. In
particular, we correctly preserve the width of the arithmetic through
legalization.  This changes the DAG a bit. Maybe room for
improvement here.
3. I explicitly defined the behavior around overflow. This is necessary
to make the DAGCombine transforms legal, and I don't think it causes any
practical issues.

Differential Revision: https://reviews.llvm.org/D105673
2021-07-21 10:58:40 -07:00
Craig Topper 81efb82570 [RISCV] Teach RISCVMatInt about cases where it can use LUI+SLLI to replace LUI+ADDI+SLLI for large constants.
If we need to shift left anyway we might be able to take advantage
of LUI implicitly shifting its immediate left by 12 to cover part
of the shift. This allows us to use more bits of the LUI immediate
to avoid an ADDI.

isDesirableToCommuteWithShift now considers compressed instruction
opportunities when deciding if commuting should be allowed.

I believe this is the same or similar to one of the optimizations
from D79492.

Reviewed By: luismarques, arcbbb

Differential Revision: https://reviews.llvm.org/D105417
2021-07-20 09:22:06 -07:00
Craig Topper 84877a098a [RISCV] Use unordered indexed loads for MGATHER.
I don't think the semantics of the llvm masked gather intrinsic care
about the order the elements are loaded. For example, type legalization
by splitting will chain them in parallel. This is different than
scatter which we do chain in order.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D106025
2021-07-20 08:46:02 -07:00
Craig Topper 50302feb1d [SelectionDAG][RISCV] Use isSExtCheaperThanZExt to control whether sext or zext is used for constant folding any_extend.
RISCV would prefer a sign extended constant since that works better
with our constant materialization. We have an existing TLI hook we
use to control sign extension of setcc operands in type legalization.
That hook happens to do the right check we need here, but might be
straying from its original purpose. With only RISCV defining this
hook in tree, I wasn't sure if it was worth adding another hook
with identical behavior.

This is an alternative to D105785 where I tried to handle this in
the RISCV backend by not creating ANY_EXTENDs in some places.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D105918
2021-07-19 09:25:28 -07:00
Craig Topper d0f8047d37 [RISCV] Teach computeKnownBitsForTargetNode that VLENB will never be more than 65536/8. 2021-07-17 11:24:20 -07:00
Craig Topper 173332d175 [RISCV] Manually emit the best shift for VSCALE lowering to improve codegen.
We assume VLENB is a multiple of 8 and previously relied on shift
pairs being optimized to an AND+SHL/SHR and computeKnownBits
removing the AND. This doesn't happen if (vlenb >> 3) gets CSEd
to have multiple uses. This patch manually emits the best shift
to workaround this.
2021-07-17 00:52:07 -07:00
Craig Topper 4dbb788068 [RISCV] Teach constant materialization that it can use zext.w at the end with Zba to reduce number of instructions.
If the upper 32 bits are zero and bit 31 is set, we might be able to
use zext.w to fill in the zeros after using an lui and/or addi.

Most of this patch is plumbing the subtarget features into the constant
materialization.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D105509
2021-07-16 09:35:56 -07:00
Craig Topper 0ce13f92b7 [RISCV] Add curly braces around a case body that declares variables. NFC
This is at the end of the switch so doesn't cause any issues now,
but if a new case is added it will break.
2021-07-16 09:35:56 -07:00
Fraser Cormack e3fa2b1eab Revert "[RISCV] Lower more BUILD_VECTOR sequences to RVV's VID"
This reverts commit a6ca88e908.

More caution is required to avoid overflow/underflow. Thanks to the
santizers for catching this.
2021-07-16 15:00:20 +01:00
Fraser Cormack a6ca88e908 [RISCV] Lower more BUILD_VECTOR sequences to RVV's VID
This patch teaches the compiler to identify a wider variety of
`BUILD_VECTOR`s which form integer arithmetic sequences, and to lower
them to `vid.v` with modifications for non-unit steps and non-zero
addends.

The sequences handled by this optimization must either be monotonically
increasing or decreasing. Consecutive elements holding the same value
indicate a fractional step which, while simple mathematically,
becomes more complex to handle both in the realm of lossy integer
division and in the presence of `undef`s.

For example, a common "interleaving" shuffle index will be lowered by
LLVM to both `<0,u,1,u,2,...>` and `<u,0,u,1,u,...>` `BUILD_VECTOR`
nodes. Either of these would ideally be lowered to `vid.v` shifted right
by 1. Detection of this sequence in presence of general `undef` values
is more complicated, however: `<0,u,u,1,>` could match either
`<0,0,0,1,>` or `<0,0,1,1,>` depending on later values in the sequence.
Both are possible, so backtracking or multiple passes is inevitable.

Sticking to monotonic sequences keeps the logic simpler as it can be
done in one pass. Fractional steps will likely be a separate
optimization in a future patch.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104921
2021-07-16 10:35:13 +01:00
Fraser Cormack 03a4702c88 [RISCV] Fix the neutral element in vector 'fadd' reductions
Using positive zero as the neutral element in 'fadd' reductions, while
it generates better code, is incorrect. The correct neutral element is
negative zero: 0.0 + -0.0 = 0.0, whereas -0.0 + -0.0 = -0.0.

There are perhaps more optimal lowerings of negative zero avoiding
constant-pool loads which could be left as future work.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D105902
2021-07-14 10:18:38 +01:00
Craig Topper 1e670dc7d7 [RISCV] Use DIVUW/REMUW/DIVW instructions for i8/i16/i32 udiv/urem/sdiv when LHS is constant.
We don't really have optimizations for division with a constant
LHS. If we don't use a W instruction we end up needing to sign
or zero extend the RHS to use the 64-bit instruction.

I had to sign_extend i32 constants on the LHS instead of using
any_extend which becomes zero_extend. If we don't do this, constants
that were originally negative become harder to materialize. I think
this problem exists for more of our W instruction cases. For example
(i32 (shl -1, X)), but we don't have lit tests. I'll work on that
as a follow up.

I also left a FIXME for enabling W instruction for RHS constants
under -Oz.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D105769
2021-07-13 10:33:57 -07:00
Fangrui Song 3d89fb4d13 [RISCV] Support machine constraint "S"
Similar to D46745, "S" represents an absolute symbolic operand, which
can be used to specify the access models, e.g.

  extern int var;
  void *addr_via_asm() {
    void *ret;
    asm("lui %0, %%hi(%1)\naddi %0,%0,%%lo(%1)" : "=r"(ret) : "S"(&var));
    return ret;
  }

'S' is documented in trunk GCC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101275

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D105254
2021-07-13 09:30:09 -07:00
Fraser Cormack d991b7212b [RISCV] Pass undef VECTOR_SHUFFLE indices on to BUILD_VECTOR
Often when lowering vector shuffles, we split the shuffle into two
LHS/RHS shuffles which are then blended together. To do so we split the
original indices into two, indexed into each respective vector. These
two index vectors are then separately lowered as BUILD_VECTORs.

This patch forwards on any undef indices to the BUILD_VECTOR, rather
than having the VECTOR_SHUFFLE lowering decide on an optimal concrete
index. The motiviation for ths change is so that we don't duplicate
optimization logic between the two lowering methods and let BUILD_VECTOR
do what it does best.

Propagating undef in this way allows us, for example, to generate
`vid.v` to produce the LHS indices of commonly-used interleave-type
shuffles. I have designs on further optimizing interleave-type and other
common shuffle patterns in the near future.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104789
2021-07-13 10:41:54 +01:00
Craig Topper 12d51f95fe [RISCV] Implement lround*/llround*/lrint*/llrint* with fcvt instruction with -fno-math-errno
These are fp->int conversions using either RMM or dynamic rounding modes.

The lround and lrint opcodes have a return type of either i32 or
i64 depending on sizeof(long) in the frontend which should follow
xlen. llround/llrint should always return i64 so we'll need a libcall
for those on rv32.

The frontend will only emit the intrinsics if -fno-math-errno is in
effect otherwise a libcall will be emitted which will not use
these ISD opcodes.

gcc also does this optimization.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D105206
2021-07-06 11:43:22 -07:00
Craig Topper 2b5e53111a [RISCV] Add support for matching vwmul(u) and vwmacc(u) from fixed vectors.
This adds a DAG combine to detect sext/zext inputs and emit a
new ISD opcode. The extends will either be removed or replaced
with narrower extends.

Isel patterns are used to match add and widening mul to vwmacc
similar to the recently added vmacc patterns.

There's still some work to be to match vmulsu.
We should also rewrite splats that were extended as scalars and
then splatted.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D104802
2021-07-06 10:24:31 -07:00
Craig Topper 3b6dfa381e [RISCV] Protect the SHL/SRA/SRL handlers in LowerOperation against being called for an illegal i32 shift amount.
It seems it is possible for DAG combine to create a shl with an
i64 result type and an i32 shift amount. This is ok before type
legalization since the type don't need to match in SelectionDAG.
This results in type legalization calling LowerOperation to
legalize just the amount. We weren't expecting this so we
asserted for not finding a fixed vector shift.

To fix this, I've added a check for the fixed vector case and
returned SDValue() to get the default type legalizer. I've
factored all shifts together and added a fixed vector specific
handler to avoid repeating similar code for each in
LowerOperation.

The particular case I found was exposed by D104581, but the bad
shift is created after that patch triggers.
2021-06-29 09:45:13 -07:00
Jim Lin 779d2b0a42 [RISCV][NFC] Combine the control flow for different RetOp of interrupt function
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D104838
2021-06-26 17:28:03 +08:00
Craig Topper d4f4a1ba62 [RISCV] Add DAG combine to detect opportunities to replace (i64 (any_extend (i32 X)) with sign_extend.
If type legalization is going to insert a sign_extend for other users
of X and we can fold the sign_extend into ADDW/MULW/SUBW, it is
better to replace the ANY_EXTEND so we don't end up with a separate
ADD/MUL/SUB instruction for the users of the ANY_EXTEND.

I'm only handling setcc uses right now, but there are other
instructions that force sign_extends like ashr.

There are probably other *W instructions we could use in addition
to ADDW/SUBW/MULW.

My motivating case was a loop terminating compare and a phi use
as seen in the new test file.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D104581
2021-06-25 23:16:37 -07:00
Fraser Cormack a4729f7f88 [RISCV] Lower RVV vector SELECTs to VSELECTs
This patch optimizes the code generation of vector-type SELECTs (LLVM
select instructions with scalar conditions) by custom-lowering to
VSELECTs (LLVM select instructions with vector conditions) by splatting
the condition to a vector. This avoids the default expansion path which
would either introduce control flow or fully scalarize.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104772
2021-06-24 10:12:51 +01:00
Fraser Cormack fed1503e85 [RISCV][VP] Lower FP VP ISD nodes to RVV instructions
With the exception of `frem`, this patch supports the current set of VP
floating-point binary intrinsics by lowering them to to RVV instructions. It
does so by using the existing `RISCVISD *_VL` custom nodes as an intermediate
layer. Both scalable and fixed-length vectors are supported by using this
method.

The `frem` node is unsupported due to a lack of available instructions. For
fixed-length vectors we could scalarize but that option is not (currently)
available for scalable-vector types. The support is intentionally left out so
it equivalent for both vector types.

The matching of vector/scalar forms is currently lacking, as scalable vector
types do not lower to the custom `VFMV_V_F_VL` node. We could either make
floating-point scalable vector splats lower to this node, or support the
matching of multiple kinds of splat via a `ComplexPattern`, much like we do for
integer types.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D104237
2021-06-17 10:04:00 +01:00
Fraser Cormack c75e454cb9 [RISCV] Transform unaligned RVV vector loads/stores to aligned ones
This patch adds support for loading and storing unaligned vectors via an
equivalently-sized i8 vector type, which has support in the RVV
specification for byte-aligned access.

This offers a more optimal path for handling of unaligned fixed-length
vector accesses, which are currently scalarized. It also prevents
crashing when `LegalizeDAG` sees an unaligned scalable-vector load/store
operation.

Future work could be to investigate loading/storing via the largest
vector element type for the given alignment, in case that would be more
optimal on hardware. For instance, a 4-byte-aligned nxv2i64 vector load
could loaded as nxv4i32 instead of as nxv16i8.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104032
2021-06-14 18:12:18 +01:00
Fraser Cormack 502edebd9d [ValueTypes][RISCV] Cap RVV fixed-length vectors by size
This patch changes RVV's policy for its supported list of fixed-length
vector types by capping by vector size rather than element count. Now
all 1024-byte vectors (of supported element types) are supported, rather
than all 256-element vectors.

This is a more natural fit for the architecture, and allows us to, for
example, improve the support for vector bitcasts.

This change necessitated the adding of some new simple types to avoid
"regressing" on the number of currently-supported vectors. We round out
the 1024-byte types by adding `v512i8`, `v1024i8`, `v512i16` and
`v512f16`.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103884
2021-06-09 12:15:37 +01:00
Fraser Cormack e8f1f89103 [RISCV] Support CONCAT_VECTORS on scalable masks
This patch is a simple fix which registers CONCAT_VECTORS as
custom-lowered for scalable mask vectors. This follows the pattern of
all other scalable-vector types, as the default expansion of
CONCAT_VECTORS cannot handle scalable types, and even if it did it'd go
through the stack and generate worse code.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103896
2021-06-09 09:07:44 +01:00
Craig Topper f30f8b4f12 [RISCV] Lower i8/i16 bswap/bitreverse to grevi/greviw with Zbp.
Include known bits support so we know we don't need to zext the
output if the input was already zero extended.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D103757
2021-06-07 10:31:51 -07:00
Craig Topper 8bde5f06a1 [RISCV] Replace && with ||. Spotted by coverity.
We should be exiting when the shift amount is greater than
the bit width regardless of whether it is a power of 2.

Reported by Simon Pilgrim here https://reviews.llvm.org/D96661

This requires getting a shift amount that is out of bounds that
wasn't already optimized by SelectionDAG. This would be pretty
trick to construct a test for.

Or it would require a non-power of 2 shift amount and a mask
that has runs of ones and zeros of the next lowest power of 2 from
that shift amount. I tried a little to produce a test for this,
but didn't get it to work.
2021-06-06 13:09:51 -07:00
Nikita Popov 1ffa6499ea [TargetLowering] Use IRBuilderBase instead of IRBuilder<> (NFC)
Don't require a specific kind of IRBuilder for TargetLowering hooks.
This allows us to drop the IRBuilder.h include from TargetLowering.h.

Differential Revision: https://reviews.llvm.org/D103759
2021-06-06 16:29:50 +02:00
Nikita Popov 9914200393 [CodeGen] Add missing includes (NFC)
These currently rely on the IRBuilder.h include in TargetLowering.h.
Make them explicit.
2021-06-06 15:48:27 +02:00
Fraser Cormack 3b0a33d0ad [RISCV] Expand unaligned fixed-length vector memory accesses
RVV vectors must be aligned to their element types, so anything less is
unaligned.

For regular loads and stores, our custom-lowering of fixed-length
vectors meant that we opted out of LegalizeDAG's built-in unaligned
expansion. This patch adds that logic in to our custom lower function.

For masked intrinsics, we declare that anything unaligned is not legal,
leaving the ScalarizeMaskedMemIntrin pass to do the expansion for us.

Note that neither of these methods can handle the expansion of
scalable-vector memory ops, so those cases are left alone by this patch.
Scalable loads and stores already go through expansion by default but
hit an assertion, and scalable masked intrinsics will silently generate
incorrect code. It may be prudent to return an error in both of these
cases.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102493
2021-06-02 09:27:44 +01:00
Fraser Cormack 4f500c402b [RISCV] Support vector types in combination with fastcc
This patch extends the RISC-V lowering of the 'fastcc' calling
convention to vector types, both fixed-length and scalable. Without this
patch, any function passing or returning vector types by value would
throw a compiler error.

Vectors are handled in 'fastcc' much as they are in the default calling
convention, the noticeable difference being the extended set of scalar
GPR registers that can be used to pass vectors indirectly.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102505
2021-06-01 10:31:18 +01:00
Fraser Cormack 2b37c405cc [RISCV] Scale scalably-typed split argument offsets by VSCALE
This patch fixes a bug in lowering scalable-vector types in RISC-V's
main calling convention. When scalable-vector types are split and passed
indirectly, the target is responsible for scaling the offset --
initially set to the known-minimum store size -- by the scalable factor.

Before this we were issuing overlapping loads or stores to the different
parts, leading to incorrect codegen.

Credit to @HsiangKai for spotting this.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D103262
2021-05-31 10:43:13 +01:00
Fraser Cormack eb23936591 [RISCV] Support vector conversions between fp and i1
This patch custom lowers FP_TO_[US]INT and [US]INT_TO_FP conversions
between floating-point and boolean vectors. As the default action is
scalarization, this patch both supports scalable-vector conversions and
improves the code generation for fixed-length vectors.

The lowering for these conversions can piggy-back on the existing
lowering, which lowers the operations to a supported narrowing/widening
conversion and then either an extension or truncation.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103312
2021-05-31 09:55:39 +01:00
Fraser Cormack 5a80dc4988 [VP][SelectionDAG] Add a target-configurable EVL operand type
This patch adds a way for the target to configure the type it uses for
the explicit vector length operands of VP SDNodes. The type must be a
legal integer type (there is still no target-independent legalization of
this operand) and must currently be at least as big as i32, the type
used by the IR intrinsics. An implicit zero-extension takes place on
targets which choose a larger type. All VP nodes should be created with
this type used for the EVL operand.

This allows 64-bit RISC-V to avoid custom legalization of all VP nodes,
keeping them in their target-independent form for that bit longer.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D103027
2021-05-27 15:27:36 +01:00
Fraser Cormack b7101e218c [DAGCombine][RISCV] Don't try to trunc-store combined vector stores
DAGCombine's `mergeStoresOfConstantsOrVecElts` optimization is told
whether it's to use vector types and also whether it's to issue a
truncating store. However, the truncating store code path assumes a
scalar integer `ConstantSDNode`, and when using vector types it creates
either a `BUILD_VECTOR` or `CONCAT_VECTORS` to store: neither of which
is a constant.

The `riscv64` target is able to expose a crash here because it switches
on both code paths at the same time. The `f32` is stored as `i32` which
must be promoted to `i64`, necessitating a truncating store.
It also decides later that it prefers a vector store of `v2f32`.

While vector truncating stores are legal, this combine is not able to
emit them. We also don't have a test case. This patch adds an assert to
catch this case more gracefully, and updates one of the caller functions
to the function to turn off the use of truncating stores when preferring
vectors.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103173
2021-05-27 14:16:32 +01:00
Fraser Cormack 8c73a31c11 [RISCV] Allow passing fixed-length vectors via the stack
The vector calling convention dictates that when the vector argument
registers are exhaused, GPRs are used to pass the address via the stack.
When the GPRs themselves are exhausted, at best we would previously
crash with an assertion, and at worst we'd generate incorrect code.

This patch addresses this issue by passing fixed-length vectors via the
stack with their full fixed-length size and aligned to their element
type size. Since the calling convention lowering can't yet handle
scalable vector types, this patch adds a fatal error to make it clear
that we are lacking in this regard.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102422
2021-05-27 14:14:07 +01:00
Fraser Cormack 772b58a641 [SelectionDAG][RISCV] Don't unroll 0/1-type bool VSELECTs
This patch extends the cases in which the legalizer is able to express
VSELECT in terms of XOR/AND/OR. When dealing with a VSELECT between
boolean vector types, the mask itself is an all-ones or all-ones value
of the operand type, so a 0/1 boolean type behaves identically to a 0/-1
type.

This greatly helps RISC-V which relies on expansion for these nodes. It
also allows scalable-vector bool VSELECTs to use the default expansion,
where before it would crash in SelectionDAG::UnrollVectorOp.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103147
2021-05-27 10:08:57 +01:00
Craig Topper 9065118b64 [RISCV] Optimize SEW=64 shifts by splat on RV32.
SEW=64 shifts only uses the log2(64) bits of shift amount. If we're
splatting a 64 bit value in 2 parts, we can avoid splatting the
upper bits and just let the low bits be sign extended. They won't
be read anyway.

For the purposes of SelectionDAG semantics of the generic ISD opcodes,
if hi was non-zero or bit 31 of the low is 1, the shift was already
undefined so it should be ok to replace high with sign extend of low.

In order do be able to find the split i64 value before it becomes
a stack operation, I added a new ISD opcode that will be expanded
to the stack spill in PreprocessISelDAG. This new node is conceptually
similar to BuildPairF64, but I expanded earlier so that we could
go through regular isel to get the right VLSE opcode for the LMUL.
BuildPairF64 is expanded in a CustomInserter.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D102521
2021-05-26 10:23:32 -07:00
Craig Topper b510e4cf1b [RISCV] Add a vsetvli insert pass that can be extended to be aware of incoming VL/VTYPE from other basic blocks.
This is a replacement for D101938 for inserting vsetvli
instructions where needed. This new version changes how
we track the information in such a way that we can extend
it to be aware of VL/VTYPE changes in other blocks. Given
how much it changes the previous patch, I've decided to
abandon the previous patch and post this from scratch.

For now the pass consists of a single phase that assumes
the incoming state from other basic blocks is unknown. A
follow up patch will extend this with a phase to collect
information about how VL/VTYPE change in each block and
a second phase to propagate this information to the entire
function. This will be used by a third phase to do the
vsetvli insertion.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D102737
2021-05-24 11:47:27 -07:00
Fraser Cormack 7a211ed110 [RISCV] Prevent store combining from infinitely looping
RVV code generation does not successfully custom-lower BUILD_VECTOR in all
cases. When it resorts to default expansion it may, on occasion, be expanded to
scalar stores through the stack. Unfortunately these stores may then be picked
up by the post-legalization DAGCombiner which merges them again. The merged
store uses a BUILD_VECTOR which is then expanded, and so on.

This patch addresses the issue by overriding the `mergeStoresAfterLegalization`
hook. A lack of granularity in this method (being passed the scalar type) means
we opt out in almost all cases when RVV fixed-length vector support is enabled.
The only exception to this rule are mask vectors, which are always either
custom-lowered or are expanded to a load from a constant pool.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102913
2021-05-24 10:19:32 +01:00
Fraser Cormack c74ab891fc [RISCV] Ensure small mask BUILD_VECTORs aren't expanded
The default expansion for BUILD_VECTORs -- save for going through
shuffles -- is to go through the stack. This method only works when the
type is at least byte-sized, so for v2i1 and v4i1 we would crash.

This patch ensures that small mask-type BUILD_VECTORs are always handled
without crashing. We lower to a SETCC of the equivalent i8 type.

This also exposes some pre-existing issues where the lowering when
optimizing for size results in larger code than without. Those will be
tackled in future patches.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102767
2021-05-20 19:12:29 +01:00
Fraser Cormack 26bd2250c1 [RISCV] Ensure shuffle splat operands are type-legal
The use of `SelectionDAG::getSplatValue` isn't guaranteed to return a
type-legal splat value as it may implicitly extract a vector element
from another shuffle. It is not permitted to introduce an illegal type
when lowering shuffles.

This patch addresses the crash by adding a boolean flag to
`getSplatValue`, defaulting to false, which when set will ensure a
type-legal return value. If it is unable to do that it will fail to
return a splat value.

I've been through the existing uses of `getSplatValue` in other targets
and was unable to find a need or test cases showing a need to update
their uses. In some cases, the call is made during `LegalizeVectorOps`
which may still produce illegal scalar types. In other situations, the
illegally-typed splat value may be quickly patched up to a legal type
(such as any-extending the returned `extract_vector_elt` up to a legal
type) before `LegalizeDAG` notices.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102687
2021-05-20 18:00:03 +01:00
Fraser Cormack ca2c245ba4 [RISCV] Support INSERT_VECTOR_ELT into i1 vectors
Like the element extraction of these vectors, we choose to promote up to
an i8 vector type and perform the insertion there.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102697
2021-05-19 09:41:50 +01:00
Craig Topper 9c345407b4 [RISCV] Remove RISCVII:VSEW enum. Make encodeVYPE operate directly on SEW.
The VSEW encoding isn't a useful value to pass around. It's better
to use SEW or log2(SEW) directly. The only real ugliness is that
the vsetvli IR intrinsics use the VSEW encoding, but it's easy
enough to decode that when the intrinsic is processed.
2021-05-12 13:19:08 -07:00
Evandro Menezes 3a64b7080d [RISCV] Move instruction information into the RISCVII namespace (NFC)
Move instruction attributes into the `RISCVII` namespace and add associated helper functions.

Differential Revision: https://reviews.llvm.org/D102268
2021-05-11 16:32:42 -05:00
Craig Topper ce6e4f27dd [RISCV] Use fractional LMULs for fixed length types smaller than riscv-v-vector-bits-min.
My thought process is that if v2i64 is an LMUL=1 type then v2i32
should be an LMUL=1/2 type. We limit the fractional LMUL so that
SEW=64 clips to LMUL=1, SEW=32 clips to LMUL=1/2, etc. This
ensures there's always a fractional LMUL available to truncate a type.
This does reduce the number of vsetvlis in some cases.

Some tests increase vsetvlis because the best container type for a
mask type is dependent on the LMUL+SEW that the mask was produced
from, but you can't tell that from the type. I think this is
something we need to solve this in the machine IR when optimizing
vsetvlis.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D101215
2021-05-11 09:42:48 -07:00
Craig Topper 80b9510806 [RISCV] Correct VL for fixed length masked scatter.
We were incorrectly calling getVectorNumElements on a scalable
vector type. This shouldn't be allowed. This gives a warning on
EVT, but not MVT.
2021-05-10 09:50:08 -07:00
Fraser Cormack 6db0cedd23 [LegalizeVectorOps][RISCV] Add scalable-vector SELECT expansion
This patch extends VectorLegalizer::ExpandSELECT to permit expansion
also for scalable vector types. The only real change is conditionally
checking for BUILD_VECTOR or SPLAT_VECTOR legality depending on the
vector type.

We can use this to fix "cannot select" errors for scalable vector
selects on the RISCV target. Note that in future patches RISCV will
possibly custom-lower vector SELECTs to VSELECTs for branchless codegen.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102063
2021-05-10 08:22:35 +01:00
Craig Topper 6660319cef [RISCV] Remove unused RISCV::VLEFF and VLEFF_MASK. NFC
Looks like these got left behind when vleff isel was moved to
X86ISelDAGToDAG.cpp
2021-05-06 09:41:29 -07:00
Fraser Cormack 6f17613bfb [RISCV][VP] Lower VP ISD nodes to RVV instructions
This patch supports all of the current set of VP integer binary
intrinsics by lowering them to to RVV instructions. It does so by using
the existing RISCVISD *_VL custom nodes as an intermediate layer. Both
scalable and fixed-length vectors are supported by using this method.

One notable change to the existing vector codegen strategy is that
scalable all-ones and all-zeros mask SPLAT_VECTORs are now lowered to
RISCVISD VMSET_VL and VMCLR_VL nodes to match their fixed-length
BUILD_VECTOR counterparts. This allows them to reuse the existing
"all-ones" VL patterns.

To reduce the size of the phabricator diff, some tests are intentionally
left out and will be added later if the patch is accepted.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101826
2021-05-05 12:32:24 +01:00
Fraser Cormack cd6a52fede [RISCV] Cap legal fixed-length vectors to 256-element types
Previously, RISC-V would make legal all fixed-length vectors types whose
size are less than or equal to some function of the minimum value of
VLEN and the maximum-permissible LMUL grouping.

Due to vector legalization issues, this patch instead caps the legal
fixed-length vector types to those with 256 elements. This value was
chosen because it is the longest vector length which has corresponding
MVTs across all supported element types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101839
2021-05-05 09:51:08 +01:00
Fraser Cormack 46fa214a6f [RISCV] Lower splats of non-constant i1s as SETCCs
This patch adds support for splatting i1 types to fixed-length or
scalable vector types. It does so by lowering the operation to a SETCC
of the equivalent i8 type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101465
2021-05-04 09:14:05 +01:00
Fraser Cormack d23e4f6872 [RISCV] Add support for fmin/fmax vector reductions
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101518
2021-05-03 10:33:51 +01:00
Craig Topper ba63cdb8f2 [RISCV] Store SEW in RISCV vector pseudo instructions in log2 form.
This shrinks the immediate that isel table needs to emit for these
instructions. Hoping this allows me to change OPC_EmitInteger to
use a better variable length encoding for representing negative
numbers. Similar to what was done a few months ago for OPC_CheckInteger.

The alternative encoding uses less bytes for negative numbers, but
increases the number of bytes need to encode 64 which was a very
common number in the RISCV table due to SEW=64. By using Log2 this
becomes 6 and is no longer a problem.
2021-05-02 12:09:20 -07:00
Fraser Cormack 791766e6d2 [RISCV] Support STEP_VECTOR with a step greater than one
DAGCombiner was recently taught how to combine STEP_VECTOR nodes,
meaning the step value is no longer guaranteed to be one by the time it
reaches the backend for lowering.

This patch supports such cases on RISC-V by lowering to other step
values to a multiply following the vid.v instruction. It includes a
small optimization for common cases where the multiply can be expressed
as a shift left.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D100856
2021-04-30 09:36:18 +01:00
Craig Topper dcdda2bdf2 [RISCV] Teach DAG combine to fold (and (select_cc lhs, rhs, cc, -1, c), x) -> (select_cc lhs, rhs, cc, x, (and, x, c))
Similar for or/xor with 0 in place of -1.

This is the canonical form produced by InstCombine for something like `c ? x & y : x;` Since we have to use control flow to expand select we'll usually end up with a mv in basic block. By folding this we may be able to pull the and/or/xor into the block instead and avoid a mv instruction.

The code here is based on code from ARM that uses this to create predicated instructions. I'm doing it on SELECT_CC so it happens late, but we could do it on select earlier which is what ARM does. I'm not sure if we lose any combine opportunities if we do it earlier.

I left out add and sub because this can separate sext.w from the add/sub. It also made a conditional i64 addition/subtraction on RV32 worse. I guess both of those would be fixed by doing this earlier on select.

The select-binop-identity.ll test has not been commited yet, but I made the diff show the changes to it.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D101485
2021-04-29 09:43:51 -07:00
Craig Topper 0c330afdfa [RISCV] Enable SPLAT_VECTOR for fixed vXi64 types on RV32.
This replaces D98479.

This allows type legalization to form SPLAT_VECTOR_PARTS so we don't
lose the splattedness when the scalar type is split.

I'm handling SPLAT_VECTOR_PARTS for fixed vectors separately so
we can continue using non-VL nodes for scalable vectors.

I limited to RV32+vXi64 because DAGCombiner::visitBUILD_VECTOR likes
to form SPLAT_VECTOR before seeing if it can replace the BUILD_VECTOR
with other operations. Especially interesting is a splat BUILD_VECTOR of
the extract_vector_elt which can become a splat shuffle, but won't if
we form SPLAT_VECTOR first. We either need to reorder visitBUILD_VECTOR
or add visitSPLAT_VECTOR.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100803
2021-04-29 08:20:09 -07:00
Craig Topper 25391cec3a [RISCV] Teach computeKnownBits that vsetvli returns number less than 2^31.
This seems like a reasonable upper bound on VL. WG discussions for
the V spec would probably allow us to use 2^16 as an upper bound
on VLEN, but this is good enough for now.

This allows us to remove sext and zext if user happens to assign
the size_t result into an int and then uses it as a VL intrinsic
argument which is size_t.

Reviewed By: frasercrmck, rogfer01, arcbbb

Differential Revision: https://reviews.llvm.org/D101472
2021-04-29 08:07:59 -07:00
Fraser Cormack 43ad058a01 [RISCV] Fix stack slot for argument types (Bug 49500)
This is an complementary/alternative fix for D99068. It takes a slightly
different approach by explicitly summing up all of the required split
part type sizes and ensuring we allocate enough space for them. It also
takes the maximum alignment of each part.

Compared with D99068 there are fewer changes to the stack objects in
existing tests. However, @luismarques has shown in that patch that there
are opportunities to reduce our stack usage in the future.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D99087
2021-04-29 09:10:48 +01:00
Craig Topper ce09dd54e6 [RISCV] Select 5 bit immediate for VSETIVLI during isel rather than peepholing in the custom inserter.
This adds a special operand type that is allowed to be either
an immediate or register. By giving it a unique operand type the
machine verifier will ignore it.

This perturbs a lot of tests but mostly it is just slightly different
instruction orders. Something bad did happen to some min/max reduction
tests. We're spilling vector registers when we weren't before.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D101246
2021-04-27 14:38:16 -07:00
Craig Topper 262a72f50f [RISCV] Use stack slot to handle SPLAT_VECTOR_PARTS on RV32.
Reduces the amount of vector ALU operations and reduces vector
register pressure.
2021-04-26 15:43:02 -07:00
Craig Topper e2cd92cb9b [RISCV] Match splatted load to scalar load + splat. Form strided load during isel.
This modifies my previous patch to push the strided load formation
to isel. This gives us opportunity to fold the splat into a .vx
operation first. Using a scalar register and a .vx operation reduces
vector register pressure which can be important for larger LMULs.

If we can't fold the splat into a .vx operation, then it can make
sense to use a strided load to free up the vector arithmetic
ALU to do actual arithmetic rather than tying it up with vmv.v.x.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D101138
2021-04-26 13:32:03 -07:00
Craig Topper 837442de9c [RISCV] Cleanup setOperationAction calls for INTRINSIC_WO_CHAIN/INTRINSIC_W_CHAIN
We have several extensions that need i32 to be Custom for
INTRINSIC_WO_CHAIN with RV64 so enable it for all RV64.

For V extension, make i32 Custom for RV64 and i64 Custom for RV32.
When the i32 or i64 is legal, the operation action doesn't matter.
LegalizeDAG checks MVT::Other rather than the real type.
2021-04-25 23:44:28 -07:00
Craig Topper 8f5cd49405 [RISCV] Teach DAG combine what bits Zbp instructions demanded from their inputs.
This teaches DAG combine that shift amount operands for grev, gorc
shfl, unshfl only read a few bits.

This also teaches DAG combine that grevw, gorcw, shflw, unshflw,
bcompressw, bdecompressw only consume the lower 32 bits of their
inputs.

In the future we can teach SimplifyDemandedBits to also propagate
demanded bits of the output to the inputs in some cases.
2021-04-25 21:54:06 -07:00
Levy Hsu 8cf54c7ff5 [RISCV] [1/2] Add IR intrinsic for Zbe extension
RV32/64:
bcompress
bdecompress

RV64 ONLY:
bcompressw
bdecompressw

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101143
2021-04-25 19:14:34 -07:00
Craig Topper bd28d86119 [RISCV] Removed getLMULForFixedLengthVector.
Use getContainerForFixedLengthVector and getRegClassIDForVecVT to
get the register class to use when making a fixed vector type legal.

Inline it into the other two call sites.

I'm looking into using fractional lmul for fixed length vectors
and getLMULForFixedLengthVector returned an integer making it
unable to express this. I considered returning the LMUL
enum, but that seemed like it would introduce more complexity to
convert it for use.
2021-04-23 16:56:46 -07:00
Craig Topper bcf321015b [RISCV] Move getLMULForFixedLengthVector out of RISCVSubtarget.
Make it a static function RISCVISelLowering, the only place it
is used.

I think I'm going to make this return a fractional LMULs in some
cases so I'm sorting out where it should live before I start
making changes.
2021-04-23 15:06:20 -07:00
Craig Topper baa107f018 [RISCV] Only expose one interface for getContainerForFixedLengthVector in the RISCVTargetLowering class
We can have RISCVISelDAGToDAG.cpp call the VT only version by
finding the RISCVTargetLowering object via the Subtarget.

Make the static versions just global static functions in
RISCVISelLowering that can be called by static functions in that
file.
2021-04-23 15:06:10 -07:00
Fraser Cormack 83b8f8da82 [RISCV] Custom lower vector F(MIN|MAX)NUM to vf(min|max)
This patch adds support for both scalable- and fixed-length vector code
lowering of the llvm.minnum and llvm.maxnum intrinsics to the equivalent
RVV instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101035
2021-04-23 12:22:15 +01:00
Levy Hsu b49337bbb9 [RISCV] [1/2] Add IR intrinsic for Zbp extension
RV32/64:
    grev
    grevi
    gorc
    gorci
    shfl
    shfli
    unshfl
    unshfli

RV64 ONLY:
    grevw
    greviw
    gorcw
    gorciw
    shflw
    shfli     (For non-existing shfliw)
    unshfli   (For non-existing unshfliw)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100830
2021-04-22 16:34:51 -07:00
Craig Topper 70254ccb69 [RISCV] Turn splat shuffles of vector loads into strided load with stride of x0.
Implementations are allowed to optimize an x0 stride to perform
less memory accesses. This is the case in SiFive cores.

No idea if this is the case in other implementations. We might
need a tuning flag for this.

Reviewed By: frasercrmck, arcbbb

Differential Revision: https://reviews.llvm.org/D100815
2021-04-22 10:02:57 -07:00
Craig Topper 77f14c96e5 [RISCV] Use stack temporary to splat two GPRs into SEW=64 vector on RV32.
Rather than doing splatting each separately and doing bit manipulation
to merge them in the vector domain, copy the data to the stack
and splat it using a strided load with x0 stride. At least on
some implementations this vector load is optimized to not do
a load for each element.

This is equivalent to how we move i64 to f64 on RV32.

I've only implemented this for the intrinsic fallbacks in this
patch. I think we do similar splatting/shifting/oring in other
places. If this is approved, I'll refactor the others to share
the code.

Differential Revision: https://reviews.llvm.org/D101002
2021-04-22 09:50:07 -07:00
Serge Pavlov 740962e5d0 [RISCV] Custom lowering of SET_ROUNDING
Differential Revision: https://reviews.llvm.org/D91242
2021-04-22 15:04:55 +07:00
Craig Topper 58c5b4c2c3 [RISCV] Use TargetConstant for condition code of RISCVISD::SELECT_CC.
The value is always an immediate and can never be in a register.
This the kind of thing TargetConstant is for.

Saves a step GenDAGISel to convert a Constant to a TargetConstant.
2021-04-21 23:08:52 -07:00
Serge Pavlov 6e63dfdae2 [RISCV] Custom lowering of FLT_ROUNDS_
Differential Revision: https://reviews.llvm.org/D90854
2021-04-22 11:39:15 +07:00
Craig Topper f6d8cf7798 [RISCV] Teach lowerSPLAT_VECTOR_PARTS to detect cases where Hi is sign extended from Lo.
This recognizes the case when Hi is (sra Lo, 31). We can use
SPLAT_VECTOR_I64 rather than splatting the high bits and
combining them in the vector register.
2021-04-21 20:24:23 -07:00
Craig Topper 87afefcd22 [RISCV] Fix mistake in comment. NFC 2021-04-19 11:15:32 -07:00
Craig Topper 7ed01a420a [RISCV] Pad v4i1/v2i1/v1i1 stores with 0s to make a full byte.
As noted in the FIXME there's a sort of agreement that the any
extra bits stored will be 0.

The generated code is pretty terrible. I was really hoping we
could use a tail undisturbed trick, but tail undisturbed no
longer applies to masked destinations in the current draft
spec.

Fingers crossed that it isn't common to do this. I doubt IR
from clang or the vectorizer would ever create this kind of store.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100618
2021-04-19 11:05:18 -07:00
Fraser Cormack c9a93c3e01 [RISCV] Lower vector shuffles to vrgather operations
This patch extends the lowering of RVV fixed-length vector shuffles to
avoid the default stack expansion and instead lower to vrgather
instructions.

For "permute"-style shuffles where one vector is swizzled, we can lower
to one vrgather. For shuffles involving two vector operands, we lower to
one unmasked vrgather (or splat, where appropriate) followed by a masked
vrgather which blends in the second half.

On occasion, when it's not possible to create a legal BUILD_VECTOR for
the indices, we use vrgatherei16 instructions with 16-bit index types.

For 8-bit element vectors where we may have indices over 255, we have a
fairly blunt fallback to the stack expansion to avoid custom-splitting
of the vector types.

To enable the selection of masked vrgather instructions, this patch
extends the various RISCVISD::VRGATHER nodes to take a passthru operand.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100549
2021-04-19 11:13:13 +01:00
Craig Topper 1afdfc6169 [RISCV] Rename RISCVISD::GREVI(W)/GORCI(W) to RISCVISD::GREV(W)/GORC(W). Don't require second operand to be a constant.
Prep work for adding intrinsics for these instructions in the
future.
2021-04-13 11:04:28 -07:00
Craig Topper 7c9bbbf735 [RISCV] Rename RISCVISD::SHFLI to RISCVISD::SHFL and don't require the second operand to be an immediate.
Prep work for adding intrinsics in the future.

Left an assert that the input is constant in ReplaceNodeResults,
as the intrinsic shouldn't go through that path.
2021-04-12 23:46:50 -07:00
Craig Topper cb4c793e46 [RISCV] Update computeKnownBitsForTargetNode to treat READ_VLENB as being 16 byte aligned.
According to the 0.10 spec, VLEN is at least 128 bits and is a
power of 2.
2021-04-11 17:54:23 -07:00
Craig Topper bc0e052730 [RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffff) when zext.h is supported.
Similar to what we do for zext.w.

Disable the (srl (and X, 0xffff), C) custom isel when zext.h is
available.
2021-04-11 10:03:35 -07:00
Fraser Cormack a5693445ca [RISCV] Support OR/XOR/AND reductions on vector masks
This patch adds RVV codegen support for OR/XOR/AND reductions for both
scalable- and fixed-length vector types. There are a few possible
codegen strategies for each -- vmfirst.m, vmsbf.m, and vmsif.m could be
used to some extent -- but the vpopc.m instruction was chosen since it
produces the scalar result in one instruction, after which scalar
instructions can finish off the computation.

The reductions are lowered identically for both scalable- and
fixed-length vectors, although some alternate strategies may be more
optimal on fixed-length vectors since it's cheaper to get the length of
those types.

Other reduction types were not deemed to be relevant for mask vectors.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100030
2021-04-08 09:46:38 +01:00
Serge Pavlov 65b1103798 [RISCV] DAG nodes and pseudo instructions for CSR access
New custom DAG nodes were added to represent operations on CSR. These
nodes are lowered to corresponding pseudo instruction. Using the pseudo
instructions allows to specify different scheduling information for
operations on different system registers. It also make possible to
specify dependencies of instructions on specific system registers.

Differential Revision: https://reviews.llvm.org/D98936
2021-04-08 10:36:36 +07:00
Craig Topper 56ea2e2fdd [RISCV] Add a special case to lowerSELECT for select of 2 constants with a SETLT condition.
If the constants have a difference of 1 we can convert one to
the other by adding or subtracting the condition.

We have a DAG combine for this, but it only runs before type
legalization. If the select is introduced later during type
legalization or op legalization we will miss it.

We don't need a specific condition, but some conditions are
harder to materialize than others on RISCV. I know that SETLT
will be a single instruction and it is what is used by the
motivating pattern from signed saturating add/sub.

Differential Revision: https://reviews.llvm.org/D99021
2021-04-07 13:47:17 -07:00
Craig Topper f087d7544a [RISCV] Support vslide1up/down intrinsics for SEW=64 on RV32.
This can't use our normal strategy of splatting the scalar and using
a .vv operation instead of .vx.

Instead this patch bitcasts the vector to the equivalent SEW=32
vector and inserts the scalar parts using two vslide1up/down. We
do that unmasked and apply the mask separately at the end with
a vmerge.

For vslide1up there maybe some other options here like getting
i64 into element 0 and using vslideup.vi with this vector as
vd and the original source as vs1. Masking would still need to
be done afterwards.

That idea doesn't work for vslide1down. We need to slidedown and
then insert a single scalar at vl-1 which we could do with a
vslideup, but that assumes vl > 0 which I don't think we can assume.

The i32 double slide1down implemented here is the best I could come
up with and I just made vslide1up consistent.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D99910
2021-04-07 10:44:53 -07:00
Craig Topper 01a23dccb1 [RISCV] Add an assertion to the ReplaceNodeResults handling of bitcasts to make sure the VT is always a scalar integer. 2021-04-06 16:48:40 -07:00
Craig Topper 2641c1f15e [RISCV] Don't custom type legalize fixed vector to scalar integer bitcasts if the fixed vector type isn't legal.
We encountered a hang in our internal code base. I'm having trouble
creating a test case because the test that hit it was testing some
code that is not upstream.
2021-04-06 15:00:33 -07:00
Craig Topper af2837675a [RISCV] Split RISCVISD::VMV_S_XF_VL into separate integer and FP.
It's a bit silly, but it allows us to write stricter type
constraints for isel. There's still some extra type checks in
the generated table due to some type interference limitations
around HWMode.
2021-04-05 12:57:35 -07:00
Fraser Cormack af3a839c70 [RISCV] Add support for bitcasts between scalars and fixed-length vectors
This patch supports bitcasts from scalar types to fixed-length vectors
and vice versa. It custom-lowers and custom-legalizes them to
EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT operations, using a single-element
vectors to hold the scalar where appropriate.

Previously, some of these would fail to select, others would be expanded
through stack loads and stores. Effort was made to ensure the codegen
avoids the stack for both legal and illegal scalar types.

Some of the codegen could be improved, but on first glance it looks like
a general optimization of EXTRACT_VECTOR_ELT when extracting an i64
element on RV32.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99667
2021-04-05 17:21:55 +01:00
Fraser Cormack 3f0df4d7b0 [RISCV] Expand scalable-vector truncstores and extloads
Caught in internal testing, these operations are assumed legal by
default, even for scalable vector types. Expand them back into separate
truncations and stores, or loads and extensions.

Also add explicit fixed-length vector tests for these operations, even
though they should have been correct already.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99654
2021-04-05 17:03:45 +01:00
Craig Topper 4708a05da0 [RISCV] Use gorciw for i32 orc.b intrinsic when Zbp is enabled.
The W version of orc.b does not exist in Zbp so we need to use
gorci encoding. If we have Zbp, we can use gorciw which can avoid a
sext.w in some cases.
2021-04-04 17:14:28 -07:00
Craig Topper 98d5db3e3a [RISCV] Lower orc.b intrinsic to RISCVISD::GORCI.
This will allow us to share any future known bits, demaned bits,
or sign bits improvements.
2021-04-04 12:31:41 -07:00
Craig Topper a2ea003fcb [RISCV] Don't convert fshr/fshl to target specific FSL/FSR node if shift amount is a constant.
As long as it's a constant we can directly pattern match it
without any problems. It's only when it isn't a constant that
we need to add an AND.

In theory this should allow more target independent optimizations
to remain active.
2021-04-03 23:13:30 -07:00
Levy Hsu 944adbf285 Recommit "[RISCV] Add IR intrinsic for Zbb extension"
Forgot to amend the Author.

Original commit message:

Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
orc.b

Differential Revision: https://reviews.llvm.org/D99320
2021-04-02 11:50:19 -07:00
Craig Topper 1f0b309f24 Revert "[RISCV] Add IR intrinsic for Zbb extension"
This reverts commit 1808194590.

I forgot to change the author.
2021-04-02 11:47:02 -07:00
Craig Topper 1808194590 [RISCV] Add IR intrinsic for Zbb extension
Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
orc.b
2021-04-02 11:23:57 -07:00
Craig Topper dbbc95e3e5 [RISCV] Use softPromoteHalf legalization for fp16 without Zfh rather than PromoteFloat.
The default legalization strategy is PromoteFloat which keeps
half in single precision format through multiple floating point
operations. Conversion to/from float is done at loads, stores,
bitcasts, and other places that care about the exact size being 16
bits.

This patches switches to the alternative method softPromoteHalf.
This aims to keep the type in 16-bit format between every operation.
So we promote to float and immediately round for any arithmetic
operation. This should be closer to the IR semantics since we
are rounding after each operation and not accumulating extra
precision across multiple operations. X86 is the only other
target that enables this today. See https://reviews.llvm.org/D73749

I had to update getRegisterTypeForCallingConv to force f16 to
use f32 when the F extension is enabled. This way we can still
pass it in the lower bits of an FPR for ilp32f and lp64f ABIs.
The softPromoteHalf would otherwise always give i16 as the
argument type.

Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D99148
2021-04-01 12:41:57 -07:00
Craig Topper d157e3f387 [RISCV] Fix handling of nxvXi64 vmsgt(u).vx intrinsics on RV32.
We need to splat the scalar separately and use .vv, but there is
no vmsgt(u).vv. So add isel patterns to select vmslt(u).vv with
swapped operands.

We also need to get VT to use for the splat from an operand rather
than the result since the result VT is nxvXi1.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D99704
2021-04-01 10:38:05 -07:00
Craig Topper b7c2e577cc [RISCV] Add custom type legalization to form MULHSU when possible.
There's no target independent ISD opcode for MULHSU, so custom
legalize 2*XLen multiplies ourselves. We have to be a little
careful to prefer MULHU or MULHSU.

I thought about doing this in isel by pattern matching the
(add (mul X, (srai Y, XLen-1)), (mulhu X, Y)) pattern. I decided
against this because the add might become part of a chain of adds.
I don't trust DAG combine not to reassociate with other adds making
it difficult to find both pieces again.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D99479
2021-04-01 10:15:55 -07:00
Craig Topper 2a8b7cab6a [RISCV] Add RISCVISD opcodes for CLZW and CTZW.
Our CLZW isel pattern is quite easily broken by surrounding code
preventing it from matching sometimes. This usually results in
failing to remove the and X, 0xffffffff inserted by type
legalization. The add with -32 that type legalization also inserts
will often gets combined into other add/sub nodes. That doesn't
usually result in extra code when we don't use clzw.

CTTZ seems to be less fragile, but I wanted to keep it consistent
with CTLZ.

Reviewed By: asb, HsiangKai

Differential Revision: https://reviews.llvm.org/D99317
2021-03-31 09:40:07 -07:00
Fraser Cormack 10fc6e4358 [RISCV] Add support for the stepvector intrinsic
This adds almost everything required for supporting the new stepvector
intrinsic on RVV. It is lowered to the existing VID_VL SDNode.

The only exception is a limitation that RV32 cannot yet lower the
intrinsic on i64 vectors. This is because the step operand is
(currently) required to be at least as large as the vector element type.
I will look into patching that out and loosening the requirement to only
an integer pointer type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99594
2021-03-31 11:41:17 +01:00
Craig Topper a33fcafaf0 [RISCV] Pass 'half' in the lower 16 bits of an f32 value when F extension is enabled, but Zfh is not.
Without Zfh the half type isn't legal, but it could still be
used as an argument/return in IR. Clang will not generate this today.

Previously we promoted the half value to float for arguments and
returns if the F extension is enabled but Zfh isn't. Then depending on
which ABI is enabled we would pass it in either an FPR or a GPR in
float format.

If the F extension isn't enabled, it would get passed in the lower
16 bits of a GPR in half format.

With this patch the value will always in half format and will be
in the lower bits of a GPR or FPR. This should be consistent
with where the bits are located when Zfh is enabled.

I've based this implementation off of how this is done on ARM.

I've manually nan-boxed the value to 32 bits using integer ops.
It looks like flw, fsw, fmv.s, fmv.w.x, fmf.x.w won't
canonicalize nans so should leave the value alone. I think those
are the instructions that could get used on this value.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D98670
2021-03-30 09:47:54 -07:00
Craig Topper f069000b43 [RISCV] Remove floating point condition code legalization from lowerFixedLengthVectorSetccToRVV.
After D98939, this is done by LegalizeVectorOps making this code dead.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D99519
2021-03-30 09:11:56 -07:00
Craig Topper c40cea6f08 [RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffffffff).
We look for this pattern frequently in isel patterns so its a
good idea to try to preserve it.

This also let's us remove our special isel handling for srliw
and use a direct pattern match of (srl (and X, 0xffffffff), C)
since no bits will be removed from the and mask.

Differential Revision: https://reviews.llvm.org/D99042
2021-03-25 09:03:25 -07:00
Fraser Cormack 99211352c1 [RISCV] Optimize select-like vector shuffles
This patch adds a small optimization for vector shuffle lowering,
detecting shuffles which can be re-expressed as vector selects.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99270
2021-03-25 11:39:57 +00:00
Fraser Cormack 321a71a772 [RISCV] Optimize BUILD_VECTOR sequences that reveal hidden splats
This patch adds further optimization techniques to RVV BUILD_VECTOR
lowering. It teaches the compiler to find splats of larger vector
element types "hidden" in smaller ones. For example, a v4i8 build_vector
(0x1, 0x2, 0x1, 0x2) could be splat as v2i16 0x0201. This is generally
more optimal than the dominant-element BUILD_VECTORs and so takes
priority.

This optimization is currently limited to all-constant-or-undef
BUILD_VECTORs as those were found to be the most common. There's no
reason this couldn't be extended to other BUILD_VECTORs, but the
additional bit-manipulation instructions may require more sophisticated
heuristics.

There are some cases where the materialization of the larger constant
takes more scalar instructions than it does to build the vector with
vector instructions. We could add heuristics to try and catch this.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99195
2021-03-25 10:35:31 +00:00
Craig Topper 0f99c6c56e [RISCV] Remove duplicate DebugLoc variables from cases in ReplaceNodeResults. NFC
We already created a DebugLoc at the top of the function. We can
just use that one.
2021-03-24 20:23:03 -07:00
Fraser Cormack feff66a082 [RISCV] Further optimize BUILD_VECTORs with repeated elements
This patch builds upon the initial BUILD_VECTOR work introduced in
D98700. It further optimizes the lowering of BUILD_VECTOR by using
VSELECT operations to effectively insert repeated elements into the
vector with relatively few instructions. This allows us to optimize more
BUILD_VECTORs without significantly increasing the size of the generated
code.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98969
2021-03-23 14:14:48 +00:00
Fraser Cormack 5bfbd9d938 [RISCV] Optimize all-constant mask BUILD_VECTORs
This patch adds an optimization for mask-vector BUILD_VECTOR nodes whose
elements are all constants or undef. It lowers such operations by
building up the vector via a series of integer operations, in which
multiple mask elements are inserted into a vector at a time via
i8/i16/i32/i64 element types. The final result is then bitcast from that
integer vector.

We restrict this optimization in certain circumstances when optimizing
for size. If we are required to use more than one integer insert
operation, then it will likely increase code size compared with using a
load from a constant pool.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98860
2021-03-23 10:11:19 +00:00
Craig Topper 294efcd6f7 [RISCV] Add support for fixed vector masked gather/scatter.
I've split the gather/scatter custom handler to avoid complicating
it with even more differences between gather/scatter.

Tests are the scalable vector tests with the vscale removed and
dropped the tests that used vector.insert. We're probably not
as thorough on the splitting cases since we use 128 for VLEN here
but scalable vector use a known min size of 64.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98991
2021-03-22 10:17:30 -07:00
Craig Topper 5d315691c4 [RISCV] Add missing bitcasts to the results of lowerINSERT_SUBVECTOR and lowerEXTRACT_SUBVECTOR when handling mask vectors.
Found by adding asserts to LegalizeDAG to catch incorrect result
types being returned.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98964
2021-03-19 10:54:33 -07:00
Craig Topper 85f3f6b3cc [RISCV] Lower scalable vector masked loads to intrinsics to match fixed vectors and reduce isel patterns.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98840
2021-03-19 10:39:35 -07:00
Fraser Cormack d399b82e2a [RISCV] Maintain fixed-length info when optimizing BUILD_VECTORs
I'm not sure how I failed to notice this before, but when optimizing
dominant-element BUILD_VECTORs we would lower via the scalable container type,
which lost us the information about the fixed length of the vector types. By
lowering via the fixed-length type we can preserve that information and
eliminate redundant vsetvli instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98938
2021-03-19 17:21:06 +00:00
Fraser Cormack 550292ecb1 [RISCV] Fix missing scalable->fixed-length vector conversion
Returning the scalable-vector container type would present problems when
the fixed-length INSERT_VECTOR_ELT was used by later operations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98776
2021-03-19 16:49:47 +00:00
Craig Topper c9861f722e [RISCV] Correct the output chain in lowerFixedLengthVectorMaskedLoadToRVV
We returned the input chain instead of the output chain from the
new load. This bypasses the load in the chain. I haven't found a
good way to test this yet. IR order prevents my initial attempts
at causing reordering.
2021-03-18 16:34:35 -07:00
Fraser Cormack 3495031a39 [RISCV] Support scalable-vector masked scatter operations
This patch adds support for masked scatter intrinsics on scalable vector
types. It is mostly an extension of the earlier masked gather support
introduced in D96263, since the addressing mode legalization is the
same.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96486
2021-03-18 10:17:50 +00:00
Fraser Cormack 0331399dc9 [RISCV] Support scalable-vector masked gather operations
This patch supports the masked gather intrinsics in RVV.

The RVV indexed load/store instructions only support the "unsigned unscaled"
addressing mode; indices are implicitly zero-extended or truncated to XLEN and
are treated as byte offsets. This ISA supports the intrinsics directly, but not
the majority of various forms of the MGATHER SDNode that LLVM combines to. Any
signed or scaled indexing is extended to the XLEN value type and scaled
accordingly. This is done during DAG combining as widening the index types to
XLEN may produce illegal vectors that require splitting, e.g.
nxv16i8->nxv16i64.

Support for scalable-vector CONCAT_VECTORS was added to avoid spilling via the
stack when lowering split legalized index operands.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96263
2021-03-18 09:26:18 +00:00
Fraser Cormack c2b4600ec8 [RISCV] Support bitcasts of fixed-length mask vectors
Without this patch, bitcasts of fixed-length mask vectors would go
through the stack.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98779
2021-03-18 08:52:42 +00:00
Craig Topper 696ddef569 [RISCV] Support masked load/store for fixed vectors.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98561
2021-03-17 10:26:15 -07:00
Fraser Cormack 70251759a2 [RISCV] Optimize "dominant element" BUILD_VECTORs
This patch adds an optimization path for BUILD_VECTOR nodes where the
majority of the elements are identical. These can be splatted, with the
remaining elements patched up with INSERT_VECTOR_ELTs. The threshold can
be tweaked as required - it is currently conservative. Undef elements
are disregarded when judging the dominance of a particular element. This
allows them to be covered by the splat value.

In addition, vectors of 2 elements are always optimized to a splat (for
the upper element) and an insert at element zero.

This optimization is disabled when optimizing for size.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98700
2021-03-17 10:09:04 +00:00
Craig Topper 229eeb187d [RISCV] Look through copies when trying to find an implicit def in addVSetVL.
The InstrEmitter can sometimes insert a copy after an IMPLICIT_DEF
before connecting it to the vector instruction. This occurs when
constrainRegClass reduces to a class with less than 4 registers.
I believe LMUL8 on masked instructions triggers this since the
result can only use the v8, v16, or v24 register group as the mask
is using v0.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98567
2021-03-16 07:59:09 -07:00
Craig Topper a33ce06cf5 [RISCV] Improve i32 UADDSAT/USUBSAT on RV64.
The default promotion uses zero extends that become shifts. We
cam use sign extend instead which is better for RISCV.

I've used two different implementations based on whether we
have minu/maxu instructions.

Differential Revision: https://reviews.llvm.org/D98683
2021-03-16 07:44:06 -07:00
Craig Topper 41759c3d92 [RISCV] Add RISCVISD::BR_CC similar to RISCVISD::SELECT_CC.
This allows me to introduce similar combines for branches as
we have recently added for SELECT_CC. Some of them are less
useful for standalone setccs and only help branch instructions.
By having a BR_CC node its easier to only affect branches.

I'm using CondCodeSDNode to make isel patterns easier to
write so we can refer to the codes by name. SELECT_CC uses a
constant instead.

I've translated the condition code just like SELECT_CC so
we need less patterns for the swapped conditions. This
includes special cases for X < 1 and X > -1 that get translated
to blez and bgez by using a 0 constant.

computeKnownBitsForTargetNode support for SELECT_CC is added
to allow MaskedValueIsZero to work for cases where the true
and false values of the SELECT_CC are setccs and the
result of the SELECT_CC is used by a BR_CC. This was needed
to avoid regressions in some of the overflow tests.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D98159
2021-03-15 11:54:01 -07:00
Craig Topper 3dc5b533e0 [RISCV] Improve legalization of i32 UADDO/USUBO on RV64.
The default legalization uses zero extends that require pair of shifts
on RISCV. Instead we can take advantage of the fact that unsigned
compares work equally well on sign extended inputs. This allows
us to use addw/subw and sext.w.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D98233
2021-03-15 09:30:23 -07:00
Fraser Cormack 0c5b789c73 [RISCV] Support fixed-length vectors in the calling convention
This patch adds fixed-length vector support to the calling convention
when RVV is used to lower fixed-length vectors. The scheme follows the
regular vector calling convention for the argument/return registers, but
uses scalable vector container types as the LocVTs, and converts to/from
the fixed-length vector value types as required.

Fixed-length vector types may be split when the combination of minimum
VLEN and the maximum allowable LMUL is not large enough to fully contain
the vector. In this case the behaviour differs between fixed-length
vectors passed as parameters and as return values:
1. For return values, vectors must be passed entirely via registers or
via the stack.
2. For parameters, unlike scalar values, split vectors continue to be
passed by value, and are split across multiple registers until there are
no remaining registers. Thus vector parameters may be found partly in
registers and partly on the stack.

As with scalable vectors, the first fixed-length mask vector is passed
via v0. Split mask fixed-length vectors are passed first via v0 and then
via the next available vector register: v8,v9,etc.

The handling of vector return values uses all available argument
registers v8-v23 which does not adhere to the calling convention we're
supposedly implementing, but since this issue affects both fixed-length
and scalable-vector values, it was left as-is.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97954
2021-03-15 10:43:51 +00:00
Hsiangkai Wang a81dff1e58 [RISCV] Support inline asm for vector instructions.
Types of fractional LMUL and LMUL=1 are all using VR register class. When
using inline asm, it will use the first type in the register class as the
type for the register. It is not necessary the same as the value type. We
need to use INSERT_SUBVECTOR/EXTRACT_SUBVECToR/BITCAST to make it legal
to put the value in the corresponding register class.

Differential Revision: https://reviews.llvm.org/D97480
2021-03-15 11:02:18 +08:00
Craig Topper 51151828ac [RISCV] Teach normaliseSetCC to canonicalize X > -1 to X >= 0 and X < 1 to 0 >= X.
This allows the use of BGE with X0 instead of puting -1/1 in a
register.

Reviewed By: jrtc27

Differential Revision: https://reviews.llvm.org/D98542
2021-03-12 11:50:10 -08:00
Fraser Cormack 641f5700f9 [RISCV] Optimize INSERT_VECTOR_ELT sequences
This patch optimizes the codegen for INSERT_VECTOR_ELT in various ways.
Primarily, it removes the use of vslidedown during lowering, and the
vector element is inserted entirely using vslideup with a custom VL and
slide index.

Additionally, lowering of i64-element vectors on RV32 has been optimized
in several ways. When the 64-bit value to insert is the same as the
sign-extension of the lower 32-bits, the codegen can follow the regular
path. When this is not possible, a new sequence of two i32 vslide1up
instructions is used to get the vector element into a vector. This
sequence was suggested by @craig.topper. From there, the value is slid
into the final position for more consistent lowering across RV32 and
RV64.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98250
2021-03-12 09:13:38 +00:00
Fraser Cormack 4d2d5855c7 [RISCV] Fix up stale VECREDUCE comments. NFC.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98399
2021-03-12 08:49:46 +00:00
Craig Topper 1d26bbcf9b [RISCV] Return false from isShuffleMaskLegal except for splats.
We don't support any other shuffles currently.

This changes the bswap/bitreverse tests that check for this in
their expansion code. Previously we expanded a byte swapping
shuffle through memory. Now we're scalarizing and doing bit
operations on scalars to swap bytes.

In the future we can probably use vrgather.vx to do a byte swap
shuffle.
2021-03-11 20:02:49 -08:00
Craig Topper c82f442954 [RISCV] Support fixed vector copysign.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98394
2021-03-11 09:57:24 -08:00
Craig Topper 0dff8a9627 [RISCV] Handle vmv.x.s intrinsic for i64 vectors on RV32.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98372
2021-03-11 09:39:50 -08:00
Craig Topper 9c841cb8e8 [RISCV] Support extract_vector_elt for fixed and scalable masked registers.
This uses a really simple approach of converting to an i8 vector
and extracting. This is probably not the best approach especially
if you know the index is constant.

Other ideas:
-Store to stack temporary using vse1, load as scalar and shift.
-Sort of bitcast the vector to a vector of i8, slide down the
 appropriate 8 bit element, copy to scalar, shift down the
 correct bit within the 8 bits we extracted. Not exactly sure
 how to describe such a bitcast from i1 vector to i8 vector
 within the type system for elements less than 8.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98310
2021-03-11 09:26:44 -08:00
Craig Topper 9106d04554 [RISCV][SelectionDAG] Introduce an ISD::SPLAT_VECTOR_PARTS node that can represent a splat of 2 i32 values into a nxvXi64 vector for riscv32.
On riscv32, i64 isn't a legal scalar type but we would like to
support scalable vectors of i64.

This patch introduces a new node that can represent a splat made
of multiple scalar values. I've used this new node to solve the current
crashes we experience when getConstant is used after type legalization.

For RISCV, we are now default expanding SPLAT_VECTOR to SPLAT_VECTOR_PARTS
when needed and then handling the SPLAT_VECTOR_PARTS later during
LegalizeOps. I've remove the special case I previously put in for
ABS for D97991 as the default expansion is now able to succesfully
use getConstant.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98004
2021-03-10 09:46:18 -08:00
Craig Topper 0c73a506e8 [RISCV] Starting fixing issues that prevent us from testing vXi64 intrinsics on RV32.
Currently we crash in type legalization any time an intrinsic
uses a scalar i64 on RV32.

This patch adds support for type legalizing this to prevent
crashing. I don't promise that it uses the best possible codegen
just that it is functional.

This first version handles 3 cases. vmv.v.x intrinsic, vmv.s.x
intrinsic and intrinsics that take a scalar input, splat it and
then do some operation.

For vmv.v.x we'll either rely on hardware sign extension for
constants or we'll convert it to multiple splats and bit
manipulation.

For vmv.s.x we use a really unoptimal sequence inspired by what
we do for an INSERT_VECTOR_ELT.

For the third case we'll either try to use the .vi form for
constants or convert to a complicated splat and bitmanip and use
the .vv form of the operation.

I've renamed the ExtendOperand field to SplatOperand now use it
specifically for the third case. The first two cases are handled
by custom lowering specifically for those intrinsics.

I haven't updated all tests yet, but I tried to cover a subset
that includes single-width, widening, and narrowing.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D97895
2021-03-10 09:45:38 -08:00
Craig Topper 1e39118638 [RISCV] Manually split vector operands to VECREDUCE when handling vXi64 vectors on RV32.
The type legalizer will visit the result before the operands. To
avoid creating an illegal target specific node or falling back to
scalarization, we need to manually split vector operands.

This still doesn't handle the case of non-power of 2 operands
which need to be widened. I'm not sure the type legalizer is
ready for it. I think we would need to insert an
INSERT_SUBVECTOR with the power of 2 type we want, with an undef
first operand, and the non-power of 2 orignal operand as the vector
to insert. Then fill in the neutral elements into the elements the
padded elements. Alternatively we INSERT_SUBVECTOR into a neutral vector.
From there we carry on splitting if needed to get to a legal type
then do the target specific code.

The problem with this is the type legalizer doesn't know how to
widen an insert_subvector yet. We would need to add that including
the handling for a non-undef first vector.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98292
2021-03-10 09:27:38 -08:00
Craig Topper 351844edf1 [RISCV] Add support for VECTOR_REVERSE for scalable vector types.
I've left mask registers to a future patch as we'll need
to convert them to full vectors, shuffle, and then truncate.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D97609
2021-03-09 10:03:45 -08:00
Craig Topper 77ac3166e5 [RISCV] Add support for fixed vector reductions.
I've included tests that require type legalization to split the
vector. The i64 version of these scalarizes on RV32 due to type
legalization visiting the result before the vector type. So we
have to abort our custom expansion to avoid creating target
specific nodes with an illegal type. Then type legalization ends
up scalarizing. We might be able to fix this by doing custom
splitting for large vectors in our handler to get down to a legal
type.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98102
2021-03-09 09:39:59 -08:00
Craig Topper 1c7ad4dd88 [RISCV] Don't modify the SEW immediate on the V extension pseudo instructions after inserting VSETVLI.
Previously we set the value to -1, but the SEW information could
be useful for scheduling.

Reviewed By: frasercrmck, rogfer01

Differential Revision: https://reviews.llvm.org/D98062
2021-03-09 09:02:19 -08:00
Craig Topper 72ecf2f43f [RISCV] Optimize fixed vector ABS. Fix crash on scalable vector ABS for SEW=64 with RV32.
The default fixed vector expansion uses sra+xor+add since it can't
see that smax is legal due to our custom handling. So we select
smax(X, sub(0, X)) manually.

Scalable vectors are able to use the smax expansion automatically
for most cases. It crashes in one case because getConstant can't build a
SPLAT_VECTOR for nxvXi64 when i64 scalars aren't legal. So
we manually emit a SPLAT_VECTOR_I64 for that case.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D97991
2021-03-09 08:51:03 -08:00
Craig Topper 7a64cc4a76 [RISCV] Make use of DAG.getNeutralElement in lowerVECREDUCE to avoid repeating the same list of constants. NFC
Reviewed By: frasercrmck, khchen

Differential Revision: https://reviews.llvm.org/D98091
2021-03-08 09:11:10 -08:00
Fraser Cormack 18173c57bd [RISCV] Add new entry points to getContainerForFixedLengthVector
While working on adding fixed-length vectors to the calling convention,
it was necessary to be able to query for a fixed-length vector container
type without access to an instance of SelectionDAG.

This patch modifies the "main" getContainerForFixedLengthVector function
to use an instance of TargetLowering rather than SelectionDAG, and
preserves the SelectionDAG overload as a wrapper.

An additional non-static version of the function was also added to
simplify the common case in RISCVTargetLowering.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97925
2021-03-08 09:26:19 +00:00
Craig Topper c91b3c9e63 [RISCV] Fold (select_cc (setlt X, Y), 0, ne, trueV, falseV) -> (select_cc X, Y, lt, trueV, falseV)
A setcc can be created during LegalizeDAG after select_cc has been
created. This combine will enable us to fold these late setccs.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D98132
2021-03-07 09:44:56 -08:00
Craig Topper fdbd5d3206 [RISCV] Fold (select_cc (xor X, Y), 0, eq/ne, trueV, falseV) -> (select_cc X, Y, eq/ne, trueV, falseV)
This pattern occurs when lowering for overflow operations
introduce an xor after select_cc has already been formed.

I had to rework another combine that looked for select_cc of an xor
with 1. That xor will now get combined away so we just need to
look for the RHS of the select_cc being 1.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D98130
2021-03-07 09:29:55 -08:00
Fraser Cormack 8e7ceffd0b [RISCV] Fix crash when inserting large fixed-length subvectors
This patch addresses a compiler crash resulting from passing a
fixed-length type to one that expects scalable vector types. An
assertion was added to prevent this regressing in the future.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97868
2021-03-04 09:27:16 +00:00
Fraser Cormack d8e1d2ebf4 [RISCV] Preserve fixed-length VL on insert_vector_elt in more cases
This patch fixes up one case where the fixed-length-vector VL was
dropped (falling back to VLMAX) when inserting vector elements, as the
code would lower via ISD::INSERT_VECTOR_ELT (at index 0) which loses the
fixed-length vector information.

To this end, a custom node, VMV_S_XF_VL, was introduced to carry the VL
operand through to the final instruction. This node wraps the RVV
vmv.s.x and vmv.s.f instructions, which were being selected by
insert_vector_elt anyway.

There should be no observable difference in scalable-vector codegen.

There is still one outstanding drop from fixed-length VL to VLMAX, when
an i64 element is inserted into a vector on RV32; the splat (which is
custom legalized) has no notion of the original fixed-length vector
type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97842
2021-03-04 09:21:10 +00:00
Fraser Cormack c1695ddf7d [RISCV] Support fixed-length INSERT_VECTOR_ELT
This patch enables support for lowering INSERT_VECTOR_ELT on
fixed-length vector types. The strategy follows that for scalable vector
types.

This patch also includes a quick fix to prevent the compiler infinitely
looping between lowering BUILD_VECTOR as VECTOR_SHUFFLE and back again.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97698
2021-03-02 16:48:38 +00:00
Fraser Cormack de2b70010a [RISCV] Lower CONCAT_VECTORS to INSERT_SUBVECTOR nodes
The default expansion of CONCAT_VECTORS goes through the stack. This
patch avoids that penalty by custom-lowering CONCAT_VECTORS to a series
of INSERT_SUBVECTOR nodes. Futher optimizations are possible, but this
is a good start.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97692
2021-03-02 11:13:59 +00:00
Fraser Cormack 3fea9226ee [RISCV] Support INSERT_SUBVECTOR on vector masks
Like with EXTRACT_SUBVECTOR, INSERT_SUBVECTOR poses a problem
for vector masks as RVV isn't able to slide mask types around. We choose
instead to bitcast to equivalently-sized i8 types where we can, else we
zero-extend, perform the operation, and truncate back down.

One test was left disabled due to a crash in the legalizer.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97559
2021-03-01 12:04:11 +00:00
Fraser Cormack e80ca3af82 [RISCV] Fix INSERT/EXTRACT_SUBVECTOR on fractional LMUL types
This patch fixes a bug where the lowering for INSERT_SUBVECTOR and
EXTRACT_SUBVECTOR would insist on first extracting a register-aligned
LMUL1 vector type before perfoming the slide up/down. This was even if
the vector was a fractional LMUL type, in which case the aligned
EXTRACT_SUBVECTOR was invalid.

This issue only occurred for scalable vector types, but a variety of
tests for both scalable and fixed-length vectors have been added to
ensure this does not regress in the future.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97556
2021-03-01 11:51:05 +00:00
Fraser Cormack 4ea734e6ec [RISCV] Unify scalable- and fixed-vector INSERT_SUBVECTOR lowering
This patch unifies the two disparate paths for lowering INSERT_SUBVECTOR
operations under one roof. Consequently, with this patch it is possible to
support any fixed-length subvector insertion, not just "cast-like" ones.

As before, support for the insertion of mask vectors will come in a
separate patch.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97543
2021-03-01 11:38:47 +00:00
Fraser Cormack bd4d421688 [RISCV] Support EXTRACT_SUBVECTOR on vector masks
This patch adds support for extracting subvectors from vector masks.
This can be either extracting a scalable vector from another, or a fixed-length
vector from a fixed-length or scalable vector.

Since RVV lacks a way to slide vector masks down on an element-wise
basis and we don't know the true length of the vector registers, in many
cases we must resort to using equivalently-sized i8 vectors to perform
the operation. When this is not possible we fall back and extend to a
suitable i8 vector.

Support was also added for fixed-length truncation to mask types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97475
2021-03-01 11:20:09 +00:00
Fraser Cormack 37014db013 [RISCV] Use existing method for the LMUL1 type. NFCI.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97467
2021-02-26 09:44:05 +00:00
Craig Topper d7fca3f0bf [RISCV] Support fixed vector extract_element for FP types. 2021-02-25 16:30:28 -08:00
Fraser Cormack 02f435db0b [RISCV] Support fixed-length vector i2fp/fp2i conversions
This patch extends the support for scalable-vector int->fp and fp->int
conversions by additionally handling fixed-length vectors.

The existing scalable-vector lowering re-expresses widening/narrowing by
x4+ conversions as standard nodes. The fixed-length vector support slots
in at "the end" of this process by lowering the now equally-sized and
widening/narrowing by x2 nodes to our custom VL versions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97374
2021-02-25 13:47:58 +00:00
Fraser Cormack 9620ce90d7 [RISCV] Support fixed-length vector FP_ROUND & FP_EXTEND
This patch extends the support for vector FP_ROUND and FP_EXTEND by
including support for fixed-length vector types. Since fixed-length
vectors use "VL" nodes and scalable vectors can use the standard nodes,
there is slightly more to do in the fixed-length case. A helper function
was introduced to try and reduce the divergent paths. It is expected
that this function will similarly come in useful for lowering the
int-to-fp and fp-to-int operations for fixed-length vectors.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97301
2021-02-25 12:16:06 +00:00
Fraser Cormack 84413e1947 [RISCV] Support fixed-length vector truncates
This patch extends support for our custom-lowering of scalable-vector
truncates to include those of fixed-length vectors. It does this by
co-opting the custom RISCVISD::TRUNCATE_VECTOR node and adding mask and
VL operands. This avoids unnecessary duplication of patterns and
inflation of the ISel table.

Some truncates go through CONCAT_VECTORS which currently isn't
efficiently handled, as it goes through the stack. This can be improved
upon in the future.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97202
2021-02-25 12:11:34 +00:00
Fraser Cormack 3bc5ed3875 [RISCV] Support fixed-length vector sign/zero extension
This patch adds support for the custom lowering sign- and zero-extension
of fixed-length vector types. It does so through custom nodes. Since the
source and destination types are (necessarily) of different sizes, it is
possible that the source type is legal whilst the larger destination
type isn't. In this case the legalization makes heavy use of
EXTRACT_SUBVECTOR.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97194
2021-02-25 12:05:17 +00:00
Fraser Cormack 821f8bb29a [RISCV] Unify scalable- and fixed-vector EXTRACT_SUBVECTOR lowering
This patch unifies the two disparate paths for lowering
EXTRACT_SUBVECTOR operations under one roof. Consequently, with this
patch it is possible to support any fixed-length subvector extraction,
not just "cast-like" ones.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97192
2021-02-25 11:46:57 +00:00
Craig Topper efcdd598b7 [RISCV] Teach VSETVLI inserter to use VSETIVLI when possible.
We always create the VL operand using a register, but if we can
determine that it came from an ADDI X0, imm with a sufficiently
small immediate, we can use VSETIVLI.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D97332
2021-02-24 16:07:33 -08:00
Craig Topper 086670d367 [RISCV] Support fixed vector extract element. Use VL=1 for scalable vector extract element.
I've changed to use VL=1 for slidedown and shifts to avoid extra
element processing that we don't need.

The i64 fixed vector handling on i32 isn't great if the vector type
isn't legal due to an ordering issue in type legalization. If the
vector type isn't legal, we fall back to default legalization
which will bitcast the vector to vXi32 and use two independent extracts.
Doing better will require handling several different cases by
manually inserting insert_subvector/extract_subvector to adjust the type
to a legal vector before emitting custom nodes.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D97319
2021-02-24 10:17:00 -08:00
Fraser Cormack dd68f3cf28 [RISCV] Support insertion of misaligned subvectors
This patch extends the support for RVV INSERT_SUBVECTOR to cover those
which don't align to a vector register boundary. Like the support for
EXTRACT_SUBVECTOR in D96959, it accomplishes this by extracting the
nearest register-sized subvector (a subregister operation), then sliding
the vector down with VSLIDEDOWN, inserting the subvector to the first
position, and sliding the vector back up again afterwards.

Unlike subvector extraction, for vectors that occupy less than a full
vector register we must preserve the untouched elements. We do this by
lowering to an LMUL=1 INSERT_SUBVECTOR using the above method and
lowering that to a VSLIDEUP with a zero offset. This uses a
tail-undisturbed policy and so has the effect of "sliding in" the
subvector elements while preserving the surrounding ones.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96972
2021-02-23 10:31:06 +00:00
Craig Topper 1cd2a5a7da [RISCV] Add isel support for bitcasts between fixed vector types.
This should fix the issue reported in D96972.

I don't have a good test case for this without those changes.

Differential Revision: https://reviews.llvm.org/D97082
2021-02-22 12:05:46 -08:00
Craig Topper 1aeb927fed [RISCV] Custom isel the rest of the vector load/store intrinsics.
A previous patch moved the index versions. This moves the rest.
I also removed the custom lowering for VLEFF since we can now
do everything directly in the isel handling.

I had to update getLMUL to handle mask registers to index the
pseudo table correctly for VLE1/VSE1.

This is good for another 15K reduction in llc size.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D97097
2021-02-22 09:53:46 -08:00
Fraser Cormack 3e1317fd32 [RISCV] Support extraction of misaligned subvectors
This patch extends the support for RVV EXTRACT_SUBVECTOR to cover those
which don't align to a vector register boundary. It accomplishes this by
extracting the nearest register-sized subvector (a subregister
operation), then sliding the vector down with VSLIDEDOWN and extracting
the subvector from the first position (a COPY operation).

Since this procedure involves the use of VSCALE and multiplication, the
handling of such operations is done during lowering to simplify the
implementation and make use of DAG combining. This necessitated moving
some helper functions from RISCVISelDAGToDAG to RISCVTargetLowering.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96959
2021-02-20 15:43:54 +00:00
Craig Topper 98dff5e804 [RISCV] Move SHFLI matching to DAG combine. Add 32-bit support for RV64
We previously used isel patterns for this, but that used quite
a bit of space in the isel table due to OR being associative
and commutative. It also wouldn't handle shifts/ands being in
reversed order.

This generalizes the shift/and matching from GREVI to
take the expected mask table as input so we can reuse it for
SHFLI.

There is no SHFLIW instruction, but we can promote a 32-bit
SHFLI to i64 on RV64. As long as bit 4 of the control bit isn't
set, a 64-bit SHFLI will preserve 33 sign bits if the input had
at least 33 sign bits. ComputeNumSignBits has been updated to
account for that to avoid sext.w in the tests.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96661
2021-02-19 10:07:12 -08:00
Craig Topper 156fc07e19 [RISCV] Add support for fixed vector MULHU/MULHS.
This uses to division by constant optimization to use MULHU/MULHS.

Reviewed By: frasercrmck, arcbbb

Differential Revision: https://reviews.llvm.org/D96934
2021-02-18 09:15:08 -08:00
Craig Topper 792627be35 [RISCV] Add support for fixed vector sign/zero extend from mask types.
Due to vXi64 on RV32, I've directly emitted this using _VL ISD
opcodes. If it wasn't for that we could just use fixed vector
BUILD_VECTOR and VSELECT and let those each be legalized.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96910
2021-02-18 09:08:10 -08:00
Craig Topper 016eca8f90 [RISCV] Guard LowerINSERT_VECTOR_ELT against fixed vectors.
The type legalizer can call this code based on the scalar type so
we need to verify the vector type is a scalable vector.

I think due to how type legalization visits nodes, the vector type
will have already been legalized so we don't have an issue with
using MVT here like we did for EXTRACT_VECTOR_ELT.
I've added a test just in case.
2021-02-17 19:27:08 -08:00
Craig Topper 00c4e0a8f6 [RISCV] Guard the ISD::EXTRACT_VECTOR_ELT handling in ReplaceNodeResults against fixed vectors and non-MVT types.
The type legalizer is calling this code based on the scalar type so
we need to verify the input type is a scalable vector.

The vector type has also not been legalized yet when this is called
so we need to use EVT for it.
2021-02-17 18:25:38 -08:00
Craig Topper 3bdd02735b [RISCV] Localize RISCVZvlssegTable to RISCVISelDAGToDAG.cpp, the only place it is used. 2021-02-17 11:37:28 -08:00
Fraser Cormack d81161646a [RISCV] Add support for fixed vector vselect
This patch adds support for fixed-length vector vselect. It does so by
lowering them to a custom unmasked VSELECT_VL node with a vector length
operand.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96768
2021-02-17 10:59:00 +00:00
Craig Topper 07ca13fe07 [RISCV] Add support for fixed vector mask logic operations.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96741
2021-02-16 09:34:00 -08:00
Fraser Cormack 04977ce5ce [RISCV] Fix a crash in fixed-length build_vector lowering
Non-splatted non-integer build_vector nodes were mistakenly being
lowered as VID expressions, which should not happen. VID can only be
used to select integer build_vector nodes.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96718
2021-02-16 10:25:15 +00:00
Fraser Cormack b870199020 [RISCV] Add patterns for scalable-vector fabs & fcopysign
The patterns mostly follow the scalar counterparts, save for some extra
optimizations to match the vector/scalar forms.

The patch adds a DAGCombine for ISD::FCOPYSIGN to try and reorder
ISD::FNEG around any ISD::FP_EXTEND or ISD::FP_TRUNC of the second
operand. This helps us achieve better codegen to match vfsgnjn.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96028
2021-02-16 10:21:09 +00:00
Craig Topper 7ba2e1c601 [RISCV] Add support for fixed vector floating point setcc.
This is annoying because the condition code legalization belongs
to LegalizeDAG, but our custom handler runs in Legalize vector ops
which occurs earlier.

This adds some of the mask binary operations so that we can combine
multiple compares that we need for expansion.

I've also fixed up RISCVISelDAGToDAG.cpp to handle copies of masks.

This patch contains a subset of the integer setcc patch as well.
That patch is dependent on the integer binary ops patch. I'll rebase
based on what order the patches go in.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96567
2021-02-15 12:52:25 -08:00
Fraser Cormack 4bd5bd4009 [RISCV] Convert VSLIDE(UP|DOWN) nodes to "VL" versions (NFC)
This patch prepares the RISCV VSLIDEUP and VSLIDEDOWN custom nodes to
ones carrying additional mask and vector-length operands. This is
primarily so they can be used by both systems.

This also takes the opportunity to create some helper functions to deal
with the common task of getting the default (unmasked) VL operands.

Reviewed By: craig.topper, arcbbb

Differential Revision: https://reviews.llvm.org/D96505
2021-02-15 10:32:56 +00:00
Craig Topper 4220a81c84 [RISCV] Add support for fixed vector fabs 2021-02-12 15:33:36 -08:00
Craig Topper 36658376d5 [RISCV] Add support for fixed vector sqrt. 2021-02-12 15:33:29 -08:00
Craig Topper 1697cc78b1 [RISCV] Add support for integer fixed vector setcc
I believe I've covered all orderings of splat operands here. Better
canonicalization in lowering might help reduce this. I did not handle
the immediate adjustments needed for set(u)gt/set(u)lt.

Testing here is limited to byte types because the scalable vector
type used for masks for the store is calculated assuming 8 byte
elements. But for the setcc its based on the element count of the
container type for the setcc input. So they don't agree. We'll need
to enhanced D96352 to handle this I think.

Differential Revision: https://reviews.llvm.org/D96443
2021-02-12 09:29:41 -08:00
Fraser Cormack e88da1d677 [RISCV] Add support for integer fixed min/max
This patch extends the initial fixed-length vector support to include
smin, smax, umin, and umax.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96491
2021-02-12 09:19:45 +00:00
Craig Topper 033b1bd185 [RISCV] Add support loads, stores, and splats of vXi1 fixed vectors.
This refines how we determine which masks types are legal and adds
support for loads, stores, and all ones/zeros splats.

I left a fixme in store handling where I think we need to zero
extra bits if the type isn't a multiple of a byte. If I remember
right from X86 there was some case we could have a store of a
1, 2, or 4 bit mask and have a scalar zextload that then expected the
bits to be 0. Its tricky to zero the bits with RVV. We need to do
something like round VL up, zero a register, lower the VL back down,
then do a tail undisturbed move into the zero register. Another
option might be to generate a mask of 1/2/4 bits set with a VL of 8
and use that to mask off the bits.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96468
2021-02-11 09:13:16 -08:00
Craig Topper 0c254b4a69 [RISCV] Add support for selecting vrgather.vx/vi for fixed vector splat shuffles.
The test cases extract a fixed element from a vector and splat it
into a vector. This gets DAG combined into a splat shuffle.

I've used some very wide vectors in the test to make sure we have
at least a couple tests where the element doesn't fit into the
uimm5 immediate of vrgather.vi so we fall back to vrgather.vx.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96186
2021-02-10 10:01:56 -08:00
Fraser Cormack a3c74d6d53 [RISCV] Add support for selecting vid.v from build_vector
This patch optimizes a build_vector "index sequence" and lowers it to
the existing custom RISCVISD::VID node. This pattern is common in
autovectorized code.

The custom node was updated to allow it to be used by both scalable and
fixed-length vectors, thus avoiding pattern duplication.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96332
2021-02-10 10:58:40 +00:00
Hsiangkai Wang a5b07a221a [RISCV] Initial support of LoopVectorizer for RISC-V Vector.
Define an option -riscv-vector-bits-max to specify the maximum vector
bits for vectorizer. Loop vectorizer will use the value to check if it
is safe to use the whole vector registers to vectorize the loop.

It is not the optimum solution for loop vectorizing for scalable vector.
It assumed the whole vector registers will be used to vectorize the code.
If it is possible, we should configure vl to do vectorize instead of
using whole vector registers.

We only consider LMUL = 1 in this patch.

This patch just an initial work for loop vectorizer for RISC-V Vector.

Differential Revision: https://reviews.llvm.org/D95659
2021-02-09 06:32:18 +08:00
Craig Topper 8d8cafa32e [RISCV] Add support for splat fixed length build_vectors using RVV.
Building on the fixed vector support from D95705

I've added ISD nodes for vmv.v.x and vfmv.v.f and switched to
lowering the intrinsics to it. This allows us to share the same
isel patterns for both.

This doesn't handle splats of i64 on RV32 yet. The build_vector
gets converted to a vXi32 build_vector+bitcast during type
legalization. Not sure the best way to handle this at the moment.

Differential Revision: https://reviews.llvm.org/D96108
2021-02-08 11:12:56 -08:00
Craig Topper b8d719fbe8 [RISCV] Add support for fixed vector FMA.
Follow up to D95705. Does not include the commuting support from D95800.

Differential Revision: https://reviews.llvm.org/D96103
2021-02-08 11:12:56 -08:00
Craig Topper a719b667a9 [RISCV] Add initial support for converting fixed vectors to scalable vectors during lowering to use RVV instructions.
This is an alternative to D95563.

This is modeled after a similar feature for AArch64's SVE that uses
predicated scalable vector instructions.a

Rather than use predication, this patch uses an explicit VL operand.
I've limited it to always use LMUL=1 for now, but we can improve this
in the future.

This requires a bunch of new ISD opcodes to carry the VL operand.
I think we can probably lower intrinsics to these ISD opcodes to
cut down on the size of the isel table. Which is why I've added
patterns for all integer/float types and not just LMUL=1.

I'm only testing one vector width right now, but the width is
programmable via the command line.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D95705
2021-02-08 10:41:30 -08:00
Craig Topper b7b4f4cbc3 [RISCV] Make scalable vector FMA commutable for register allocation.
This adds support for commuting operands and converting between
vfmadd and vfmacc to avoid register copies.

To avoid messing up intrinsic behavior, I've added new pseudo
instructions that have the isCommutable flag set. These pseudos also
force a tail agnostic policy. The intrinsic version still use
the tail undisturbed policy.

For best results it looks like we need to start with fmadd and only
pick fmacc if its beneficial. MachineCSE commutes without contraining
the operands and then commutes back if it didn't help with CSE. So
I've made sure that when the operand choice isn't constrained, we
will keep fmadd for MachineCSE and when it does the second commute,
we get back the original instruction.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D95800
2021-02-08 10:05:33 -08:00
Mikael Holmen eb8c27c60c [RISCV] Use std::make_tuple to make some toolchains happy again
My toolchain (LLVM 8.0, libstdc++ 5.4.0) complained with:

12:38:19 ../lib/Target/RISCV/RISCVISelLowering.cpp:1717:12: error: chosen constructor is explicit in copy-initialization
12:38:19     return {RISCVISD::VECREDUCE_FADD, Op.getOperand(0),
12:38:19            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
12:38:19 /proj/flexasic/app/llvm/8.0/bin/../lib/gcc/x86_64-unknown-linux-gnu/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here
12:38:19         constexpr tuple(_UElements&&... __elements)
12:38:19                   ^
12:38:19 ../lib/Target/RISCV/RISCVISelLowering.cpp:1720:12: error: chosen constructor is explicit in copy-initialization
12:38:19     return {RISCVISD::VECREDUCE_SEQ_FADD, Op.getOperand(1), Op.getOperand(0)};
12:38:19            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
12:38:19 /proj/flexasic/app/llvm/8.0/bin/../lib/gcc/x86_64-unknown-linux-gnu/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here
12:38:19         constexpr tuple(_UElements&&... __elements)
12:38:19                   ^
12:38:19 2 errors generated.

This commit adds explicit calls to std::make_tuple to work around
the problem.
2021-02-08 14:37:25 +01:00
Fraser Cormack b46aac125d [RISCV] Support the scalable-vector fadd reduction intrinsic
This patch adds support for both the fadd reduction intrinsic, in both
the ordered and unordered modes.

The fmin and fmax intrinsics are not currently supported due to a
discrepancy between the LLVM semantics and the RVV ISA behaviour with
regards to signaling NaNs. This behaviour is likely fixed in version 2.3
of the RISC-V F/D/Q extension, but until then the intrinsics can be left
unsupported.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D95870
2021-02-08 09:52:27 +00:00
Fraser Cormack e046c0c28b [RISCV] Support scalable-vector integer reduction intrinsics
This patch adds support for the integer reduction intrinsics supported
by RVV. This excludes "mul" which has no corresponding instruction.

The reduction instructions in RVV have slightly complicated type
constraints given they always produce a single "M1" vector register.

They are lowered to custom nodes including the second "scalar" reduction
operand to simplify the patterns and in the hope that they can be useful
for future DAG combines.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D95620
2021-02-05 10:10:08 +00:00
Fraser Cormack c3eb2da6c4 [RISCV] Optimize sign-extended EXTRACT_VECTOR_ELT nodes
This patch custom-legalizes all integer EXTRACT_VECTOR_ELT nodes where
SEW < XLEN to VMV_S_X nodes to help the compiler infer sign bits from
the result. This allows us to eliminate redundant sign extensions.

For parity, all integer EXTRACT_VECTOR_ELT nodes are legalized this way
so that we don't need TableGen patterns for some and not others.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D95741
2021-02-05 10:05:22 +00:00
Craig Topper 44cc5abbf9 [RISCV] Custom lower fshl/fshr with Zbt extension.
We need to add a mask to the shift amount for these operations
to use the FSR/FSL instructions. We were previously doing this
in isel patterns, but custom lowering will make the mask
visible to optimizations earlier.
2021-01-31 17:49:15 -08:00
Fraser Cormack fc2f27ccf3 [RISCV] Add support for RVV int<->fp & fp<->fp conversions
This patch adds support for the full range of vector int-to-float,
float-to-int, and float-to-float conversions on legal types.

Many conversions are supported natively in RVV so are lowered with
patterns. These include conversions between (element) types of the same
size, and those that are half/double the size of the input. When
conversions take place between types that are less than half or more
than double the size we must lower them using sequences of instructions
which go via intermediate types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D95447
2021-01-28 09:50:32 +00:00
Craig Topper a40e01e442 [RISCV] Rework fault first only load isel.
-Remove the ISD opcode for READ_VL. Just emit the MachineSDNode directly.
-Move segmented fault first only load intrinsic handling completely to
 RISCVISelDAGToDAG.cpp and emit the ReadVL MachineSDNode there
 instead of lowering to ISD opcodes first.
2021-01-27 11:51:41 -08:00
Craig Topper 04570e98c8 [RISCV] Group the legal vector types into lists we can iterator over in the RISCVISelLowering constructor
Remove the RISCVVMVTs namespace because I don't think it provides
a lot of value. If we change the mappings we'd likely have to add
or remove things from the list anyway.

Add a wrapper around addRegisterClass that can determine the
register class from the fixed size of the type.

Reviewed By: frasercrmck, rogfer01

Differential Revision: https://reviews.llvm.org/D95491
2021-01-27 10:20:12 -08:00
Fraser Cormack 9a75a808c2 [RISCV] Fix a codegen crash in getSetCCResultType
This patch fixes some crashes coming from
`RISCVISelLowering::getSetCCResultType`, which would occasionally return
an EVT constructed from an invalid MVT, which has a null Type pointer.

The attached test shows this happening currently for some fixed-length
vectors, which hit this issue when the V extension was enabled, even
though they're not legal types under the V extension. The fix was also
pre-emptively extended to scalable vectors which can't be represented as
an MVT, even though a test case couldn't be found for them.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D95434
2021-01-27 10:22:54 +00:00
Craig Topper f9d7f77267 [RISCV] Have customLegalizeToWOp truncate to the original type instead of i32 now that we use it for i8/i16 as well.
239cfbccb0 add support for legalizing
i8/i16 UDIV/UREM/SDIV to use *W instructions. So we need to truncate
to i8/i16 if we're legalizing one of those.
2021-01-26 10:50:03 -08:00
Hsiangkai Wang b69932b550 [RISCV] Implement vlsegff intrinsics.
Differential Revision: https://reviews.llvm.org/D95303
2021-01-26 12:02:43 +08:00
Fraser Cormack 15141cd115 [RISCV] Add RVV insertelt/extractelt scalable-vector patterns
Original patch by @rogfer01.

This patch adds support for insertelt and extractelt operations on
scalable vectors.

Special care must be taken on RV32 when dealing with i64 vectors as
there are no straightforward ways to insert a 64-bit element without a
register of that size. To that end, both are custom-lowered to different
sequences.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94615
2021-01-25 22:03:52 +00:00
Craig Topper 239cfbccb0 [RISCV] Custom type legalize i8/i16 UDIV/UREM/SDIV on RV64 so we can use divuw/remuw/divw.
This makes our i8/i16 codegen more similar to the i32 codegen.

I've also added computeKnownBits support for DIVUW/REMUW so
that we can remove zero extending ANDs from the output. Without
this we end up turning DIVUW/REMUW back into DIVU/REMU via some
isel patterns.

Reviewed By: frasercrmck, luismarques

Differential Revision: https://reviews.llvm.org/D95322
2021-01-25 10:47:22 -08:00
Craig Topper 4eb4f8963f [RISCV] Use sign extend for i32 arguments and returns in makeLibCall on RV64.
As far as I know 32 bits arguments and returns on RV64 are always
sign extended to i64. So I think we should be taking this into
account around libcalls.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D95285
2021-01-25 09:33:48 -08:00
Fraser Cormack fde2466171 [SelectionDAG] Support scalable-vector splats in more cases
This patch adds support for scalable-vector splats in DAGCombiner's
`isConstantOrConstantVector` and `ISD::matchUnaryPredicate` functions,
which enable the SelectionDAG div/rem-by-constant optimizations for
scalable vector types.

It also fixes up one case where the UDIV optimization was generating a
SETCC without first consulting the target for its preferred SETCC result
type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94501
2021-01-25 10:58:15 +00:00
Craig Topper 4d5aa760a7 [RISCV] Add support for rev8 and orc.b to Zbb.
These instructions use a portion of the encodings for grevi and
gorci. The full encodings are only supported with Zbp. Note,
rev8 has a different encoding between rv32 and rv64.

Zbb is closer to being finalized that Zbp which has motivated
some decisions in this patch.

I'm treating rev8 and orc.b as separate instructions when
either Zbb or Zbp is enabled. This allows us to print to suggest
that either feature needs to be enabled to support these mnemonics.
I had tried to put HasStdExtZbbAndNotZbp on the Zbb instructions,
but that caused a diagnostic that said Zbp is required if neither
feature is enabled. We should really mention Zbb since its closer
to final.

This does require extra isel patterns for the different cases so
that bswap will always print as rev8 in assembly listing since
we can't use an InstAlias.

llvm-objdump disassembling should always pick the rev8 or orc.b
instructions. llvm-mc parsing and printing text will not convert
the grevi/gorci spellings to rev8/gorc.b. We could probably fix
this with a special case in processInstruction in the assembly
parser if it its important.

Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D94944
2021-01-22 12:49:10 -08:00
Craig Topper 3b5430eb0d [RISCV] Add a VL output to vleff intrinsics.
The fault-only-first-load instructions can reduce VL if an element
other than element 0 triggers a memory fault. This can be used to
vectorize loops with data dependent exit conditions like strcmp or
strlen.

This patch adds a VL output to these intrinsics so that the new
VL value can be captured by software. This will be expanded to
'csrr gpr, vl' after the vleff instruction during SelectionDAG.

By doing this with one intrinsic we are able to guarantee that the
csrr reads the VL value produced by the vleff instruction. Having
it as a separate intrinsic would make it impossible to guarantee
ordering without making every other vector intrinsic have side
effects.

The intrinsics are expanded during lowering into two ISD nodes
that are glued together. These ISD nodes will go
through isel separately, but should maintain the glue so that they
get emitted adjacently by InstrEmitter.

I've only ran the chain through the vleff instruction, allowing
the READ_VL to be deleted if it is unused.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D94286
2021-01-21 17:19:58 -08:00
Hsiangkai Wang 6e360460f1 [RISCV] Use v8-v23 as argument registers to conform to the proposal.
The maximum LMUL is 8. We need 16 vector registers for two LMUL-8
arguments. The modification follows the proposal of psABI in
https://github.com/riscv/riscv-elf-psabi-doc/pull/171

Differential Revision: https://reviews.llvm.org/D95134
2021-01-22 07:55:24 +08:00
Michael Munday 4ab0f51a75 Recommit "[RISCV] Legalize select when Zbt extension available"
This recommits 71ed4b6ce5 with
the polarity of some of the pattern corrected.

Original commit message:
The custom expansion of select operations in the RISC-V backend
interferes with the matching of cmov instructions. Legalizing
select when the Zbt extension is available solves that problem.

Reviewed By: luismarques, craig.topper

Differential Revision: https://reviews.llvm.org/D93767
2021-01-21 12:07:44 -08:00
Craig Topper 9d792fef57 [RISCV] Remove unnecessary APInt copy. NFC
getAPIntValue returns a const APInt& so keep it as a reference.
2021-01-20 10:33:09 -08:00
Hsiangkai Wang 8ca4b174d7 [RISCV] Implement vlseg intrinsics.
For Zvlsseg, we need continuous vector registers for the values. We need
to define new register classes for the different combinations of (number
of fields and LMUL). For example,

when the number of fields(NF) = 3, LMUL = 2, the values will be assigned
to (V0M2, V2M2, V4M2), (V2M2, V4M2, V6M2), (V4M2, V6M2, V8M2), ...

We define the vlseg intrinsics with multiple outputs. There is no way to
describe the codegen patterns with multiple outputs in the tablegen
files. We do the codegen in RISCVISelDAGToDAG and use EXTRACT_SUBREG to
extract the values of output.

The multiple scalable vector values will be put into a struct. This
patch is depended on the support for scalable vector struct.

Differential Revision: https://reviews.llvm.org/D94229
2021-01-20 14:26:04 +08:00
Craig Topper ce8b3937dd [RISCV] Add DAG combine to turn (setcc X, 1, setne) -> (setcc X, 0, seteq) if we can prove X is 0/1.
If we are able to compare with 0 instead of 1, we might be able
to fold the setcc into a beqz/bnez.

Often these setccs start life as an xor that gets converted to
a setcc by DAG combiner's rebuildSetcc. I looked into a detecting
(xor X, 1) and converting to (seteq X, 0) based on boolean contents
being 0/1 in rebuildSetcc instead of using computeKnownBits. It was
very perturbing to AMDGPU tests which I didn't look closely at.
It had a few changes on a couple other targets, but didn't seem
to be much if any improvement.

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D94730
2021-01-19 11:21:48 -08:00
Fraser Cormack 9c6a00fe99 [RISCV] Add ISel patterns for scalable mask exts & truncs
Original patch by @rogfer01.

This patch adds support for sign-, zero-, and any-extension from
scalable mask vector types to integer vector types, as well as
truncation in the opposite direction.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94590
2021-01-19 18:13:15 +00:00
Fraser Cormack ac603c8d38 [RISCV] Add scalable vector truncate patterns
Original patch by @rogfer01.

This patch supports vector truncates, which on RVV must be done in a
series of instructions truncating by one power-of-two at a time. This is
done through custom-lowering and a custom node to avoid LLVM
re-combining the split TRUNCATE nodes.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94796
2021-01-18 10:18:43 +00:00
Craig Topper 383b6501ff [RISCV] Use tail agnostic policy for instructions with tied defs if the use operand is IMPLICIT_DEF.
The vcompress intrinsic is defined such that it requires a tail
undisturbed policy. This patch makes it so we can use the tail
agnostic policy if the user has passed vundefined to the dest
operand.

We need to do something similar for masked policy, but we need
annotation of which instructions use the mask policy first.

Not sure if this is sufficient for scheduling or if we'll need to
select different pseudos that don't have a tied def.

Reviewed By: evandro

Differential Revision: https://reviews.llvm.org/D94566
2021-01-17 23:47:58 -08:00
Craig Topper 86e604c4d6 [RISCV] Add implementation of targetShrinkDemandedConstant to optimize AND immediates.
SimplifyDemandedBits can remove set bits from immediates from instructions
like AND/OR/XOR. This can prevent them from being efficiently
codegened on RISCV.

This adds an initial version that tries to keep or form 12 bit
sign extended immediates for AND operations to enable use of ANDI.
If that doesn't work we'll try to create a 32 bit sign extended immediate
to use LUI+ADDIW.

More optimizations are possible for different size immediates or
different operations. But this is a good starting point that already
has test coverage.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D94628
2021-01-15 11:14:14 -08:00
Craig Topper b894a9fb23 [RISCV] Optimize select_cc after fp compare expansion
Some FP compares expand to a sequence ending with (xor X, 1) to invert the result. If
the consumer is a select_cc we can likely get rid of this xor by fixing
up the select_cc condition.

This patch combines (select_cc (xor X, 1), 0, setne, trueV, falseV) -
(select_cc X, 0, seteq, trueV, falseV) if we can prove X is 0/1.

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D94546
2021-01-14 13:41:40 -08:00
Craig Topper 387d3c2479 [RISCV] Merge Utils library into MCTargetDesc
MCTargetDesc includes headers from Utils and Utils includes headers
from MCTargetDesc. So from a library layering perspective it makes sense
for them to be in the same library. I guess the other option might be to
move the tablegen includes from RISCVMCTargetDesc.h to RISCVBaseInfo.h
so that RISCVBaseInfo.h didn't need to include RISCVMCTargetDesc.h.
Everything else that depends on Utils also depends on MCTargetDesc so
having one library seemed simpler.

Differential Revision: https://reviews.llvm.org/D93168
2021-01-14 11:47:30 -08:00
Sam Elliott 7c9c2a2ea5 Revert "[RISCV] Legalize select when Zbt extension available"
We found issues with this patch in additional testing. Backing out while
we work on a fix.

This reverts commit 71ed4b6ce5.
2021-01-14 16:44:34 +00:00
Craig Topper dfc1901d51 [RISCV] Custom lower ISD::VSCALE.
This patch custom lowers ISD::VSCALE into a csrr vlenb followed
by a shift right by 3 followed by a multiply by the scale amount.

I've added computeKnownBits support to indicate that the csrr vlenb
always produces 3 trailng bits of 0s so the shift right is "exact".
This allows the shift and multiply sequence to be nicely optimized
into a single shift or removed completely when the scale amount is
a power of 2.

The non power of 2 case multiplying by 24 is still producing
suboptimal code. We could remove the right shift and use a
multiply by 3. Hopefully we can improve DAG combine to fix that
since it's not unique to this sequence.

This replaces D94144.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D94249
2021-01-13 17:14:49 -08:00
Michael Munday 71ed4b6ce5 [RISCV] Legalize select when Zbt extension available
The custom expansion of select operations in the RISC-V backend
interferes with the matching of cmov instructions. Legalizing
select when the Zbt extension is available solves that problem.

Reviewed By: lenary, craig.topper

Differential Revision: https://reviews.llvm.org/D93767
2021-01-12 21:24:38 +00:00
Fraser Cormack 37b41bd087 [RISCV] Add scalable vector fcmp ISel patterns
Original patch by @rogfer01.

All ordered comparisons except ONE are supported natively, and all
unordered comparisons except UNE are expanded into sequences involving
explicit NaN checks and mask arithmetic.

Additionally, we expand GT,OGT,GE,OGE to their swapped-operand versions, and
pattern-match those back to the "original", swapping operands once more. This
way we catch both operations and both "vf" and "fv" forms with fewer patterns.

Also add support for floating-point splat_vector, with an optimization for
splatting fpimm0.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94242
2021-01-11 19:38:56 +00:00
Craig Topper 5cf73dca77 [RISCV] Convert most of the information about RVV Pseudos into bits in TSFlags.
This patch moves all but the BaseInstr to bits in TSFlags.

For the index fields, we can just use a bit to indicate their presence.
The locations of the operands are well defined.

This reduces the llc binary by about 32K on my build. It also
removes the binary search of the table from the custom inserter.
Instead we just check that the SEW op is present.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D94375
2021-01-10 19:15:45 -08:00
Ben Shi 55f0a1b066 [RISCV] Optimize multiplication with constant
1. Break MUL with specific constant to a SLLI and an ADD/SUB on riscv32
   with the M extension.
2. Break MUL with specific constant to two SLLI and an ADD/SUB, if the
   constant needs a pair of LUI/ADDI to construct.

Reviewed by: craig.topper

Differential Revision: https://reviews.llvm.org/D93619
2021-01-09 10:37:21 +08:00
Craig Topper c68faed041 [RISCV] Return a vXi1 vector type from getSetCCResultType if V extension is enabled.
nvxXi1 types are legal with V extension and that's the result
vmseq/vmsne/vmslt/etc instructions return.

No test cases yet because the setcc isel patterns aren't in
and we'll need more than basic tests to observe this. I locally
tested that this plus D947078, D94168, D94142, and D94149
was enough to be able to handle the overflow result from
llvm.sadd.overflow.
2021-01-06 11:50:15 -08:00
Fraser Cormack 1d4411e9ea [RISCV] Add vector integer min/max ISel patterns
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94012
2021-01-05 09:15:50 +00:00
Craig Topper 79cbb003c5 [RISCV] Don't use tail agnostic policy on instructions where destination is tied to source
If the destination is tied, then user has some control of the
register used for input. They would have the ability to control
the value of any tail elements. By using tail agnostic we take
this option away from them.

Its not clear that the intrinsics are defined such that this isn't
supposed to work. And undisturbed is a valid implementation for agnostic
so code wouldn't even fail to work on all systems if we always used
agnostic.

The vcompress intrinsic is defined to require tail undisturbed so
at minimum we need this for that instruction or need to redefine
the intrinsic.

I've made an exception here for vmv.s.x/fmv.s.f and reduction
instructions which only write to element 0 regardless of the tail
policy. This allows us to keep the agnostic policy on those which
should allow better redundant vsetvli removal.

An enhancement would be to check for undef input and keep the
agnostic policy, but we don't have good test coverage for that yet.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D93878
2020-12-29 10:37:58 -08:00
Fraser Cormack aebb4a6052 [RISCV] Rewrite and simplify helper function. NFC.
Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D93851
2020-12-29 11:29:44 +00:00
Kazu Hirata d6ff5cf995 [Target] Use llvm::any_of (NFC) 2020-12-24 19:43:26 -08:00
Fraser Cormack 1a7ac29a89 [RISCV] Add ISel support for RVV vector/scalar forms
This patch extends the SDNode ISel support for RVV from only the
vector/vector instructions to include the vector/scalar and
vector/immediate forms.

It uses splat_vector to carry the scalar in each case, except when
XLEN<SEW (RV32 SEW=64) when a custom node `SPLAT_VECTOR_I64` is used for
type-legalization and to encode the fact that the value is sign-extended
to SEW. When the scalar is a full 64-bit value we use a sequence to
materialize the constant into the vector register.

The non-intrinsic ISel patterns have also been split into their own
file.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D93312
2020-12-23 20:16:18 +00:00
Nandor Licker 0586f048d7 [RISCV] Basic jump table lowering
This patch enables jump table lowering in the RISC-V backend.

In addition to the test case included, the new lowering was
tested by compiling the OCaml runtime and running it under qemu.

Differential Revision: https://reviews.llvm.org/D92097
2020-12-22 15:05:54 +00:00
Craig Topper 09468a9148 [RISCV] Sign extend constant arguments to V intrinsics when promoting to XLen.
The default behavior for any_extend of a constant is to zero extend.
This occurs inside of getNode rather than allowing type legalization
to promote the constant which would sign extend. By using sign extend
with getNode the constant will be sign extended. This gives a better
chance for isel to find a simm5 immediate since all xlen bits are
examined there.

For instructions that use a uimm5 immediate, this change only affects
constants >= 128 for i8 or >= 32768 for i16. Constants that large
already wouldn't have been eligible for uimm5 and would need to use a
scalar register.

If the instruction isn't able to use simm5 or the immediate is
too large, we'll need to materialize the immediate in a register.
As far as I know constants with all 1s in the upper bits should
materialize as well or better than all 0s.

Longer term we should probably have a SEW aware PatFrag to ignore
the bits above SEW before checking simm5.

I updated about half the test cases in some tests to use a negative
constant to get coverage for this.

Reviewed By: evandro

Differential Revision: https://reviews.llvm.org/D93487
2020-12-18 11:43:38 -08:00
Craig Topper 86d282baed [RISCV] Add intrinsics for vmv.x.s and vmv.s.x
This adds intrinsics for vmv.x.s and vmv.s.x.

I've used stricter type constraints on these intrinsics than what we've been doing on the arithmetic intrinsics so far. This will allow us to not need to pass the scalar type to the Intrinsic::getDeclaration call when creating these intrinsics.

A custom ISD is used for vmv.x.s in order to implement the change in computeNumSignBitsForTargetNode which can remove sign extends on the result.

I also modified the MC layer description of these instructions to show the tied source/dest operand. This is different than what we do for masked instructions where we drop the tied source operand when converting to MC. But it is a more accurate description of the instruction. We can't do this for masked instructions since we use the same MC instruction for masked and unmasked. Tools like llvm-mca operate in the MC layer and rely on ins/outs and Uses/Defs for analysis so I don't know if we'll be able to maintain the current behavior for masked instructions. So I went with the accurate description here since it was easy.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D93365
2020-12-18 10:30:48 -08:00
Monk Chiang ee2cb90e3b [RISCV] Define vsadd/vsaddu/vssub/vssubu intrinsics.
We work with @rogfer01 from BSC to come out this patch.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: ShihPo Hung <shihpo.hung@sifive.com>
Co-Authored-by: Monk Chiang <monk.chiang@sifive.com>

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D93366
2020-12-18 10:24:24 +08:00
Hsiangkai Wang f03609b5c7 [RISCV] V does not imply F.
If users want to use vector floating point instructions, they need to
specify 'F' extension additionally.

Differential Revision: https://reviews.llvm.org/D93282
2020-12-17 10:57:36 +08:00
Craig Topper 028efac2d7 [RISCV] Only custom legalize i32 arguments to vector intrinsics on RV64. 2020-12-15 13:54:41 -08:00
Hsiangkai Wang 14a91d676b [RISCV][NFC] Define scalable vectors for half types.
This is a preperation work for vfadd intrinsics.

Differential Revision: https://reviews.llvm.org/D93275
2020-12-15 16:23:22 +08:00
Hsiangkai Wang a6805a0e02 [RISCV] Define vadd/vsub/vrsub intrinsics and lower to V instructions.
This patch is based on the proposal from Roger Ferrer Ibanez.
http://lists.llvm.org/pipermail/llvm-dev/2020-October/145850.html

Differential Revision: https://reviews.llvm.org/D93013
2020-12-15 12:56:49 +08:00
Craig Topper b90e2d850e [RISCV] Use tail agnostic policy for vsetvli instruction emitted in the custom inserter
The compiler is making no effort to preserve upper elements. To do so would require another source operand tied with the destination and a different intrinsic interface to give control of this source to the programmer.

This patch changes the tail policy to agnostic so that the CPU doesn't need to make an effort to preserve them.

This is consistent with the RVV intrinsic spec here https://github.com/riscv/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#configuration-setting

Differential Revision: https://reviews.llvm.org/D93080
2020-12-10 19:48:03 -08:00
Craig Topper a1ae3c6ac9 [RISCV][LegalizeDAG] Expand SETO and SETUO comparisons. Teach LegalizeDAG to expand SETUO expansion when UNE isn't legal.
If SETUNE isn't legal, UO can use the NOT of the SETO expansion.

Removes some complex isel patterns. Most of the test changes are
from using XORI instead of SEQZ.

Differential Revision: https://reviews.llvm.org/D92008
2020-12-10 09:15:52 -08:00
Fraser Cormack af5fd65895 [RISCV] Fix missing def operand when creating VSETVLI pseudos
The register operand was not being marked as a def when it should be. No tests
for this in the main branch as there are not yet any pseudos without a
non-negative VLIndex.

Also change the type of a virtual register operand from unsigned to Register
and adjust formatting.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D92823
2020-12-09 09:35:28 +00:00
Craig Topper 846f576bea [RISCV] Add a table showing the layout of the fields in VTYPE. Rename MaskedOffAgnostic->MaskAgnostic. NFC 2020-12-08 20:41:57 -08:00
Craig Topper a64998be99 [RISCV] Share VTYPE encoding code between the assembler and the CustomInserter for adding VSETVLI before vector instructions
This merges the SEW and LMUL enums that each used into singles enums in RISCVBaseInfo.h. The patch also adds a new encoding helper to take SEW, LMUL, tail agnostic, mask agnostic and turn it into a vtype immediate.

I also stopped storing the Encoding in the VTYPE operand in the assembler. It is easy to calculate when adding the operand which should only happen once per instruction.

Differential Revision: https://reviews.llvm.org/D92813
2020-12-08 16:04:20 -08:00
Craig Topper 5c819eb389 [RISCV] Form GORCI from (or (rotl/rotr X, Bitwidth/2), X).
A rotate by half the bitwidth swaps the bottom and top half which is the same as one of the MSB GREVI stage.

We have to do this as a special combine because we prefer to keep (rotl/rotr X, BitWidth/2) as a rotate rather than a single stage GREVI.

Differential Revision: https://reviews.llvm.org/D92286
2020-12-07 10:28:04 -08:00
Craig Topper 5baef6353e [RISCV] Initial infrastructure for code generation of the RISC-V V-extension
The companion RFC (http://lists.llvm.org/pipermail/llvm-dev/2020-October/145850.html) gives lots of details on the overall strategy, but we summarize it here:

LLVM IR involving vector types is going to be selected using pseudo instructions (only MachineInstr). These pseudo instructions contain dummy operands to represent the vector type being operated and the vector length for the operation.
These two dummy operands, as set by instruction selection, will be used by the custom inserter to prepend every operation with an appropriate vsetvli instruction that ensures the vector architecture is properly configured for the operation. Not in this patch: later passes will remove the redundant vsetvli instructions.
Register classes of tuples of vector registers are used to represent vector register groups (LMUL > 1).
Those pseudos are eventually lowered into the actual instructions when emitting the MCInsts.
About the patch:

Because there is a bit of initial infrastructure required, this is the minimal patch that allows us to select instructions for 3 LLVM IR instructions: load, add and store vectors of integers. LLVM IR operations have "whole-vector" semantics (as in they generate values for all the elements).

Later patches will extend the information represented in TableGen.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Evandro Menezes <evandro.menezes@sifive.com>
Co-Authored-by: Craig Topper <craig.topper@sifive.com>

Differential Revision: https://reviews.llvm.org/D89449
2020-12-04 11:39:30 -08:00
Craig Topper 3fcdf9ca78 [RISCV] Rename FPCCToExtend->FPOpToExpand and FPOpToExtend->FPOpToExpand. NFC
These are used to call setOperationAction/setCondCodeAction with
the Expand action so it seems that Expand is a better name than
Extend.
2020-12-03 16:00:49 -08:00
Craig Topper a18d5e3e9f [RISCV] Merge FMV_H_X_RV32/FMV_H_X_RV64 into a single opcode. Same with FMV_X_ANYEXTH_RV32/RV64
Rather than having a different opcode for RV32 and RV64. Let's just say the integer type is XLenVT and use a single opcode for both modes.

Differential Revision: https://reviews.llvm.org/D92538
2020-12-03 11:12:40 -08:00
Craig Topper e52a91e156 [RISCV] Add f16 to isFMAFasterThanFMulAndFAdd now that the Zfh extension is supported 2020-12-02 20:31:43 -08:00
Hsiangkai Wang f7bc7c2981 [RISCV] Support Zfh half-precision floating-point extension.
Support "Zfh" extension according to
https://github.com/riscv/riscv-isa-manual/blob/zfh/src/zfh.tex

Differential Revision: https://reviews.llvm.org/D90738
2020-12-03 09:16:33 +08:00
Craig Topper bfc4f29f46 [RISCV] Combine (GORCI (GORCI x, C2), C1) -> (GORCI x, C1|C2).
Unlike GREVI, GORCI stages can't be undone, but they are
redundant if done more than once.

Differential Revision: https://reviews.llvm.org/D92295
2020-11-30 08:42:46 -08:00
Craig Topper 76d1026b59 [RISCV] Custom legalize bswap/bitreverse to GREVI with Zbp extension to enable them to combine with other GREVI instructions
This enables bswap/bitreverse to combine with other GREVI patterns or each other without needing to add more special cases to the DAG combine or new DAG combines.

I've also enabled the existing GREVI combine for GREVIW so that it can pick up the i32 bswap/bitreverse on RV64 after they've been type legalized to GREVIW.

Differential Revision: https://reviews.llvm.org/D92253
2020-11-30 08:30:40 -08:00
Craig Topper cbbd7021f1 [RISCV] Only combine (or (GREVI x, shamt), x) -> GORCI if shamt is a power of 2.
GORCI performs an OR between each stage. So we need to ensure only
one stage is active before doing this combine.

Initial attempts at finding a test case for this failed due to
the order things get combined. It's most likely that we'll form
one stage of GREVI then combine to GORCI before the two stages of
GREVI are able to be formed and combined with each other to form
a multi stage GREVI.

Differential Revision: https://reviews.llvm.org/D92289
2020-11-30 08:10:39 -08:00
Craig Topper 8709d9d872 [RISCV] Replace getSimpleValueType() with getValueType() in DAG combines to prevent asserts with weird types. 2020-11-27 12:49:12 -08:00
Craig Topper ed95cafbc5 [RISCV] Add an implementation of isFMAFasterThanFMulAndFAdd
Start with an assumption that FMA is faster than Fmul+FAdd. If thats not true
on some particular implementation we can add a tuning parameter in the future.

I've update the fmuladd test cases and added new test cases for fast math flag
based contraction.

Differential Revision: https://reviews.llvm.org/D91987
2020-11-25 15:07:34 -08:00
Craig Topper 751b0d970e [RISCV] Make SMIN/SMAX/UMIN/UMAX legal with Zbb extension.
This is the logically correct thing to do. But it generates worse
code for i32 umin/umax on the rv64 due to type legalize requesting
zext even though the arguments are sext. Maybe we can teach type
legalizer to use sext for umin/umax for RISCV.

It's also producing possibly worse code on i64 on RV32 since we
still end up with selects that become branches. But this seems
like something we could improve in type legalization or DAG combine.

Hopefully this makes D92095 work for RISCV with Zbb.
2020-11-25 12:48:43 -08:00
Craig Topper c26e8697d7 [RISCV] Custom type legalize i32 fshl/fshr on RV64 with Zbt.
This adds custom opcodes for FSLW/FSRW so we can type legalize
fshl/fshr without needing to match a sign_extend_inreg.

I've used the operand order from fshl/fshr to make the isel
pattern similar to the non-W form. It was also hard to decide
another order since the register instruction has the shift amount
as the second operand, but the immediate instruction has it as
the third operand.

Differential Revision: https://reviews.llvm.org/D91479
2020-11-25 10:01:47 -08:00
Luís Marques a8dc2110cd [RISCV] Add GHC calling convention
This is a special calling convention to be used by the GHC compiler.

Patch by Andreas Schwab (schwab)

Differential Revision: https://reviews.llvm.org/D89788
2020-11-24 22:35:23 +00:00
Luís Marques e4d9380245 Revert "[RISCV] Add GHC calling convention"
This reverts commit f8317bb256 due to lack of
proper attribution.
2020-11-24 22:34:20 +00:00
Luís Marques f8317bb256 [RISCV] Add GHC calling convention
This is a special calling convention to be used by the GHC compiler.

Differential Revision: https://reviews.llvm.org/D89788
2020-11-24 21:56:28 +00:00
Fraser Cormack ca1f2f2716 [RISCV] Combine GREVI sequences
This combine step performs the following type of transformation:

    rev.p a0, a0   # grevi a0, a0, 0b01
    rev2.n a0, a0  # grevi a0, a0, 0b10
    -->
    rev.n a0, a0   # grevi a0, a0, 0b11

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D91877
2020-11-24 12:07:13 +00:00
Craig Topper 84b8222705 [RISCV] Use separate Lo and Hi MemOperands when expanding BuildPairF64Pseudo and SplitF64Pseudo.
We generate two 4 byte loads or two stores as part of the expansion.
Previously the MemOperand was set the same for both to cover the
full 8 bytes. Now we set a separate 4 byte mem operand for each
with a 4 byte offset for the high part.
2020-11-22 00:46:12 -08:00
Craig Topper 6a1d8b91ed [RISCV] Custom type legalize i32 bswap/bitreverse to GREVIW on RV64 with Zbp extension
Previously we required a sra to pattern match these properly in isel. If the consumer didn't need the result sign extended we'll have an srl instead of sra and fail to match.

This patch switches to custom legalizing to GREVIW using portions of D91259.

Differential Revision: https://reviews.llvm.org/D91457
2020-11-20 10:41:01 -08:00
Craig Topper 78767b7f8e [RISCV] Add RISCVISD::ROLW/RORW use those for custom legalizing i32 rotl/rotr on RV64IZbb.
This should result in better utilization of RORIW since we
don't need to look for a SIGN_EXTEND_INREG that may not exist.

Also remove rotl/rotr isel matching to GREVI and just prefer RORI.
This is to keep consistency so we don't have to match ROLW/RORW
to GREVIW as well. I imagine RORI/RORIW performance will be the
same or better than GREVI.

Differential Revision: https://reviews.llvm.org/D91449
2020-11-20 10:25:47 -08:00
Fraser Cormack 1ac9b54831 [RISCV] Lower GREVI and GORCI as custom nodes
This moves the recognition of GREVI and GORCI from TableGen patterns
into a DAGCombine. This is done primarily to match "deeper" patterns in
the future, like (grevi (grevi x, 1) 2) -> (grevi x, 3).

TableGen is not best suited to matching patterns such as these as the compile
time of the DAG matchers quickly gets out of hand due to the expansion of
commutative permutations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D91259
2020-11-19 18:11:42 +00:00
Fraser Cormack fe9dc2e54a [RISCV] Use a macro to simplify getTargetNodeName
Similar to the X86 and AMDGPU targets, this uses a macro to cut down on
repetitive and error-prone code when converting RISCVISD node names to
strings in getTargetNodeName.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D91414
2020-11-16 09:33:47 +00:00
Craig Topper 637f19c36b [RISCV] Remove traces of Glue from RISCVISD::SELECT_CC
We were creating RISCVISD::SELECT_CC nodes with Glue output that was never being used, and the tablegen SDNode had the SDNPInGlue flag instead of the SDNPOutGlue flag.

Since we don't seem to need the Glue just get rid of it from both places.

Differential Revision: https://reviews.llvm.org/D91199
2020-11-11 09:30:48 -08:00
Craig Topper 5d3fd3df94 [RISCV] Make ctlz/cttz cheap to speculatively execute so CodeGenPrepare won't insert a zero check.
Add additional isel patterns for ctzw/clzw instructions.

Differential Revision: https://reviews.llvm.org/D91040
2020-11-09 10:13:45 -08:00
Craig Topper 4265cbaa34 [RISCV] Make SIGN_EXTEND_INREG from i8/i16 legal when Zbb extension is enabled.
This produces better code for sign extend to i64 on RV32 target.

Differential Revision: https://reviews.llvm.org/D91023
2020-11-09 10:13:45 -08:00
Craig Topper ce5f4f22e9 [RISCV] Use the 'si' lib call for (double (fp_to_sint/uint i32 X)) when F extension is enabled.
D80526 added custom lowering to pick the si lib call on RV64, but this custom handling is only enabled when the F and D extension are both disabled. This prevents the si library call from being used for double when F is enabled but D is not.

This patch changes the behavior so we always enable the Custom hook on RV64 and decide in ReplaceNodeResults if we should emit a libcall based on whether the FP type should be softened or not.

Differential Revision: https://reviews.llvm.org/D90817
2020-11-05 10:46:45 -08:00
Craig Topper ce1270fc7e [RISCV] Remove shadow register list passed to AllocateReg when allocating FP registers for calling convention
The _F and _D registers are already sub/super registers. When one gets allocated all its aliases are already marked as allocated. We don't need to explicitly shadow it too.

I believe shadow is for calling conventions like 64-bit Windows on X86 where have rules like this

CCIfType<[i32], CCAssignToRegWithShadow<[ECX , EDX , R8D , R9D ],
                                         [XMM0, XMM1, XMM2, XMM3]>>

For that calling convention the argument number determines which register is used regardless of how many scalars or vectors came before it.

Removing this removes a question I had in D90738.

Differential Revision: https://reviews.llvm.org/D90801
2020-11-05 09:49:42 -08:00
Simon Pilgrim 36920d5f9d [RISCV] Avoid std::pair<> in FPReg StringSwitch to avoid MSVC compile failures. NFCI.
As discussed on D90322, some MSVC builds are failing with is_trivially_copyable static asserts (see D86126) - we can avoid this by not using the std::pair<unsigned,unsigned> which held both the FP+DP Registers, just handle the FP register and convert to DP on the fly.
2020-11-02 11:30:57 +00:00
Craig Topper a76cd10fcd [RISCV] Use 'unsigned' instead of Register in getRegForInlineAsmConstraint. NFC
The return value of this interface still uses an 'unsigned' on all
targets. So we convert Register back to unsigned at the end.

I'm hoping this will prevent the issue that caused the revert of
D90322.
2020-11-01 10:16:52 -08:00
Craig Topper 6915c76e10 [RISCV] Don't use DCI.CombineTo to replace a single result. NFCI
Just return the new node, which is the standard practice.

I also noticed what appeared to be an unnecessary attempt at
creating an ANY_EXTEND where the type should already be correct.
I replace with an assert to verify the type.

Differential Revision: https://reviews.llvm.org/D90444
2020-10-30 10:46:32 -07:00
Craig Topper 74b078294f [RISCV] Improve worklist management in the DAG combine for SLLW/SRLW/SRAW
This combine makes two calls to SimplifyDemandedBits, one for the LHS and one
for the RHS. If the LHS call returns true, we don't make the RHS call. When
SimplifyDemandedBits makes a change, it will add the nodes around the change to
the DAG combiner worklist. If the simplification happens on the first recursion
step, the N will get added to the worklist. But if the simplification happens
deeper in the recursion, then N will not be revisited until the next time the
DAG combiner runs.

This patch explicitly addes N to the worklist anytime a Simplification is made.
Without this we might miss additional simplifications on the LHS or never
simplify the RHS. Special care also needs to be taken to not add N if it has
been CSEd by the simplification. There are similar examples in DAGCombiner and
the X86 target, but I don't have a test for it for RISC-V. I've also returned
SDValue(N, 0) instead of SDValue() so DAGCombiner knows a change was made and
will update its Statistic variable.

The test here was constructed so that 2 simplifications happen to the LHS.
Without this fix one happens in the post type legalization DAG combine and the
other happens after LegalizeDAG. This prevents the RHS from ever being
simplified causing the left and right shift to clear the upper 32 bits of the
RHS to be left behind.

Differential Revision: https://reviews.llvm.org/D90339
2020-10-29 14:52:53 -07:00
StephenFan a96921afa7 [RISCV] eliminate the repetition declare of SDLoc DL
Differential revision: https://reviews.llvm.org/D85002
2020-08-03 10:24:30 +08:00
Yuanfang Chen ca1e69a675 [NFC] remove unused includes of SelectionDAGISel.h 2020-07-20 10:43:29 -07:00
lewis-revill c9c955ada8 [RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbt asm instructions
This patch provides optimization of bit manipulation operations by
enabling the +experimental-b target feature.
It adds matching of single block patterns of instructions to specific
bit-manip instructions from the ternary subset (zbt subextension) of the
experimental B extension of RISC-V.
It adds also the correspondent codegen tests.

This patch is based on Claire Wolf's proposal for the bit manipulation
extension of RISCV:
https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf

Differential Revision: https://reviews.llvm.org/D79875
2020-07-15 12:19:34 +01:00
lewis-revill 6144f0a1e5 [RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbbp asm instructions
This patch provides optimization of bit manipulation operations by
enabling the +experimental-b target feature.
It adds matching of single block patterns of instructions to specific
bit-manip instructions belonging to both the permutation and the base
subsets of the experimental B extension of RISC-V.
It adds also the correspondent codegen tests.

This patch is based on Claire Wolf's proposal for the bit manipulation
extension of RISCV:
https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf

Differential Revision: https://reviews.llvm.org/D79873
2020-07-15 12:19:34 +01:00
lewis-revill 31b52b4345 [RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbp asm instructions
This patch provides optimization of bit manipulation operations by
enabling the +experimental-b target feature.
It adds matching of single block patterns of instructions to specific
bit-manip instructions from the permutation subset (zbp subextension) of
the experimental B extension of RISC-V.
It adds also the correspondent codegen tests.

This patch is based on Claire Wolf's proposal for the bit manipulation
extension of RISCV:
https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf

Differential Revision: https://reviews.llvm.org/D79871
2020-07-15 12:19:34 +01:00
lewis-revill e2692f0ee7 [RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbb asm instructions
This patch provides optimization of bit manipulation operations by
enabling the +experimental-b target feature.
It adds matching of single block patterns of instructions to specific
bit-manip instructions from the base subset (zbb subextension) of the
experimental B extension of RISC-V.
It adds also the correspondent codegen tests.

This patch is based on Claire Wolf's proposal for the bit manipulation
extension of RISCV:
https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf

Differential Revision: https://reviews.llvm.org/D79870
2020-07-15 12:19:34 +01:00
Ben Shi cb82de2960 [RISCV] Optimize multiplication by constant
... to shift/add or shift/sub.

Do not enable it on riscv32 with the M extension where decomposeMulByConstant
may not be an optimization.

Reviewed By: luismarques, MaskRay

Differential Revision: https://reviews.llvm.org/D82660
2020-07-07 18:50:24 -07:00
Sam Elliott 7dc892661e [RISCV] Implement Hooks to avoid chaining SELECT
Summary:
This implements two hooks that attempt to avoid control flow for RISC-V. RISC-V
will lower SELECTs into control flow, which is not a great idea.

The hook `hasMultipleConditionRegisters()` turns off the following
DAGCombiner folds:
    select(C0|C1, x, y) <=> select(C0, x, select(C1, x, y))
    select(C0&C1, x, y) <=> select(C0, select(C1, x, y), y)

The second hook `setJumpIsExpensive` controls a flag that has a similar purpose
and is used in CodeGenPrepare and the SelectionDAGBuilder.

Both of these have the effect of ensuring more logic is done before fewer jumps.

Note: with the `B` extension, we may be able to lower select into a conditional
move instruction, so at some point these hooks will need to be guarded based on
enabled extensions.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D79268
2020-07-01 11:56:31 +01:00
Matt Arsenault 08649f0a9d RISCV: Don't store function in RISCVMachineFunctionInfo
Targets should not depend on the MachineFunction state during the
MachineFunctionInfo construction.
2020-06-30 16:08:51 -04:00
Guillaume Chatelet 2e7bba693e [Alignment][NFC] Use Align for TargetCallingConv::OrigAlign
This patch replaces D69249.

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82307
2020-06-25 13:21:22 +00:00
Kamlesh Kumar 7622ea5835 [RISCV64] Emit correct lib call for fp(float/double) to ui/si
Since i32 is not legal in riscv64,
it always promoted to i64 before emitting lib call and
for conversions like float/double to int and float/double to unsigned int
wrong lib call was emitted. This commit fix it using custom lowering.

Differential Revision: https://reviews.llvm.org/D80526
2020-06-18 19:34:16 +05:30
Guillaume Chatelet 1778564f91 [Alignment][NFC] Migrate the rest of backends
Summary: This is a followup on D81196

Reviewers: courbet

Subscribers: arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81278
2020-06-08 07:17:20 +00:00
Ben Shi 4b6f0ea66c [RISCV] Fix a typo in RISCVISelLowering.cpp
The 9th parameter of "static bool CC_RISCV(...)" is isFixed, not isRet.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D81333
2020-06-06 18:41:00 -07:00
Zequan Wu 80e107ccd0 Add NoMerge MIFlag to avoid MIR branch folding
Let the codegen recognized the nomerge attribute and disable branch folding when the attribute is given

Differential Revision: https://reviews.llvm.org/D79537
2020-05-29 12:31:06 -07:00
Craig Topper d1119980e5 [SelectionDAG] Use Align/MaybeAlign for ConstantPoolSDNode.
This patch stores the alignment for ConstantPoolSDNode as an
Align and updates the getConstantPool interface to take a MaybeAlign.

Removing getAlignment() will be done as a follow up.

Differential Revision: https://reviews.llvm.org/D79436
2020-05-08 16:04:11 -07:00
Craig Topper 113f37a1f9 [CallSite removal][TargetLowering] Replace ImmutableCallSite with CallBase
Differential Revision: https://reviews.llvm.org/D77995
2020-04-13 13:50:15 -07:00
Matt Arsenault 84aa58cbe2 CodeGen: Use Register in TargetLowering 2020-04-08 12:10:58 -04:00
Guillaume Chatelet bdf77209b9 [Alignment][NFC] Use Align version of getMachineMemOperand
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jyknight, sdardis, nemanjai, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, jfb, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77059
2020-03-30 15:46:27 +00:00
Kamlesh Kumar aabc24acf0 [RISCV] Support llvm.thread.pointer
Fixes https://bugs.llvm.org/show_bug.cgi?id=45303 (clang crashed on __builtin_thread_pointer)

Reviewed By: lenary, MaskRay, luismarques

Differential Revision: https://reviews.llvm.org/D76828
2020-03-27 17:30:12 -07:00
Roger Ferrer Ibanez 3c24aee7ee [RISCV] Select +0.0 immediate using fmv.{w,d}.x / fcvt.d.w
Floating point positive zero can be selected using fmv.w.x / fmv.d.x /
fcvt.d.w and the zero source register.

Differential Revision: https://reviews.llvm.org/D75729
2020-03-20 09:42:24 +00:00
Andrew Wei 4ca753f4e3 [RISCV] Implement mayBeEmittedAsTailCall for tail call optimization
Implement TargetLowering callback mayBeEmittedAsTailCall for riscv in CodeGenPrepare,
which will duplicate return instructions to enable tailcall optimization.

Differential Revision: https://reviews.llvm.org/D73699
2020-02-18 23:56:42 +08:00
Craig Topper eeb63944e4 [LegalizeTypes][ARM][AArch64][PowerPC][RISCV][X86] Use BUILD_PAIR to return expanded integer results from ReplaceNodeResults instead of just returning two results.
Remove code from LegalizeTypes that allowed this to work.

We were already using BUILD_PAIR for this in some places so this
standardizes on a single way to do this.
2020-02-08 09:52:31 -08:00
Guillaume Chatelet 333f2ad8b8 [Alignment][NFC] Use Align for getMemcpy/Memmove/Memset
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73885
2020-02-03 17:13:19 +01:00
Zakk Chen 0cb274de39 [RISCV] Support ABI checking with per function target-features
1. if users don't specific -mattr, the default target-feature come
from IR attribute.
2. fixed bug and re-land this patch

Reviewers: lenary, asb

Reviewed By: lenary

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70837
2020-01-22 08:12:28 -08:00
Zakk Chen cef838e65f Revert "[RISCV] Support ABI checking with per function target-features"
This reverts commit 7bc58a779a.
It breaks EXPENSIVE_CHECKS on Windows
2020-01-16 18:01:07 -08:00
Zakk Chen 7bc58a779a [RISCV] Support ABI checking with per function target-features
if users don't specific -mattr, the default target-feature come
from IR attribute.

Reviewers: lenary, asb

Reviewed By: lenary, asb

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70837
2020-01-15 04:35:01 -08:00
Zakk Chen 3bc2860e92 Revert "[RISCV] Support ABI checking with per function target-features"
This reverts commit 109e4d12ed.
2020-01-15 04:32:57 -08:00
Zakk Chen 109e4d12ed [RISCV] Support ABI checking with per function target-features
if users don't specific -mattr, the default target-feature come
from IR attribute.
2020-01-15 02:30:43 -08:00
Matt Arsenault 255cc5a760 CodeGen: Use LLT instead of EVT in getRegisterByName
Only PPC seems to be using it, and only checks some simple cases and
doesn't distinguish between FP. Just switch to using LLT to simplify
use from GlobalISel.
2020-01-09 17:37:52 -05:00
Reid Kleckner 9c2b72821b Move tail call disabling code to target independent code
When the "disable-tail-calls" attribute was added, checks were added for
it in various backends. Now this code has proliferated, and it is
something the target is responsible for checking. Move that
responsibility back to the ISels (fast, global, and SD).

There's no major functionality change, except for targets that never
implemented this check.

This LLVM attribute was originally added in
d9699bc7bd (2015).

Reviewers: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D72118
2020-01-03 11:27:41 -08:00
Reid Kleckner 5d986953c8 [IR] Split out target specific intrinsic enums into separate headers
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
  Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
  object file size.
- Incremental step towards decoupling target intrinsics.

The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.

Part of PR34259

Reviewers: efriedma, echristo, MaskRay

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D71320
2019-12-11 18:02:14 -08:00
James Clarke da7b129b1b [RISCV] Don't force Local Exec TLS for non-PIC
Summary:
Forcing Local Exec TLS requires the use of copy relocations. Copy
relocations need special handling in the runtime linker when being used
against TLS symbols, which is present in glibc, but not in FreeBSD nor
musl, and so cannot be relied upon. Moreover, copy relocations are a
hack that embed the size of an object in the ABI when it otherwise
wouldn't be, and break protected symbols (which are expected to be DSO
local), whilst also wasting space, thus they should be avoided whenever
possible. As discussed in D70398, RISC-V should move away from forcing
Local Exec, and instead use Initial Exec like other targets, with
possible linker relaxation to follow. The RISC-V GCC maintainers also
intend to adopt this more-conventional behaviour (see
https://github.com/riscv/riscv-elf-psabi-doc/issues/122).

Reviewers: asb, MaskRay

Reviewed By: MaskRay

Subscribers: emaste, krytarowski, hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, llvm-commits, bsdjhb

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70649
2019-12-03 22:04:54 +00:00
Luís Marques 51b4b17eb7 [RISCV] Implement the TargetLowering::getRegisterByName hook
Summary: The hook should work for any RISC-V register. Non-allocatable registers
do not need to be reserved, for the remaining the hook will only succeed
if you pass clang the -ffixed-xX flag. This builds upon D67185, which
currently only allows reserving GPRs.

Reviewers: asb, lenary

Reviewed By: lenary

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69130
2019-11-04 11:23:54 +00:00
Sam Elliott 7214f7a79f [RISCV] Lower llvm.trap and llvm.debugtrap
Summary:
Until this commit, these have lowered to a call to abort().

`llvm.trap()` now lowers to `unimp`, which should trap on all systems.

`llvm.debugtrap()` now lowers to `ebreak`, which is exactly what this
instruction is for.

Reviewers: asb, luismarques

Reviewed By: asb

Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69390
2019-10-28 09:54:33 +00:00
Luís Marques 1baa50396d [RISCV] Add support for half-precision floats
Complete fp16 support by ensuring that load extension / truncate store
operations are properly expanded.

Reviewers: asb, lenary
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D69246
2019-10-25 14:02:02 +01:00
Simon Cook aed9d6d64a [RISCV] Add support for -ffixed-xX flags
This adds support for reserving GPRs such that the compiler will not
choose a register for register allocation. The implementation follows
the same design as for AArch64; each reserved register becomes a target
feature and used for getting the reserved registers for a given
MachineFunction. The backend checks that it does not need to write to
any reserved register; if it does a relevant error is generated.

Differential Revision: https://reviews.llvm.org/D67185
2019-10-22 21:25:01 +01:00
Shiva Chen 078bec6c48 [RISCV] Support fast calling convention
LLVM may annotate the function with fastcc if there has only one caller
and there're no other caller out of the module and the function is not
naked or contain variable arguments.

The fastcc functions could pass the arguments by the caller saved registers.

Differential Revision: https://reviews.llvm.org/D68559

llvm-svn: 374857
2019-10-15 02:04:29 +00:00
Luis Marques aae97bfd0c [RISCV] Rename FPRs and use Register arithmetic
The new names for FPRs ensure that the Register values within the same class are
enumerated consecutively (the order is determined by the `LessRecordRegister`
function object). Where there were tables mapping between 32- and 64-bit FPRs
(and vice versa) this patch replaces them with Register arithmetic. The
enumeration order between different register classes is expected to continue to
be arbitrary, although it does impact the conversion from the (overloaded) asm
FPR names to Register values, and therefore might require updates to the target
if the sorting algorithm is changed. Static asserts were added to ensure that
changes to the ordering that would impact the current implementation are
detected.

Differential Revision: https://reviews.llvm.org/D67423

llvm-svn: 373096
2019-09-27 15:49:10 +00:00
Guillaume Chatelet 18f805a7ea [Alignment][NFC] Remove unneeded llvm:: scoping on Align types
llvm-svn: 373081
2019-09-27 12:54:21 +00:00
Luis Marques 2d0cd6cac8 [RISCV] Fix static analysis issues
Unlikely to be problematic but still worth fixing.

Differential Revision: https://reviews.llvm.org/D67640

llvm-svn: 372391
2019-09-20 13:48:02 +00:00
Guillaume Chatelet ad1cea0dda [Alignment][NFC] Use Align with TargetLowering::setPrefFunctionAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: nemanjai, javed.absar, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, ychen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67267

llvm-svn: 371212
2019-09-06 15:03:49 +00:00
Guillaume Chatelet 4fc3ad9e13 [Alignment][NFC] Use Align with TargetLowering::setMinFunctionAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jyknight, sdardis, nemanjai, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67229

llvm-svn: 371200
2019-09-06 12:48:34 +00:00
Guillaume Chatelet aff45e4b23 [LLVM][Alignment] Make functions using log of alignment explicit
Summary:
This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align.
The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment.
A few renames uncovered dubious assignments:

 - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation.
 - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation,
 - `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation,

Reviewers: lattner, thegameg, courbet

Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65945

llvm-svn: 371045
2019-09-05 10:00:22 +00:00
Jim Lin b77aa1d248 [RISCV] Enable tail call opt for variadic function
Summary: Tail call opt can treat variadic function call the same as normal function call

Reviewers: mgrang, asb, lenary, lewis-revill

Reviewed By: lenary

Subscribers: luismarques, pzheng, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66278

llvm-svn: 370835
2019-09-04 02:03:36 +00:00
Shiva Chen b39876d8cd [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall
The patch fixed the issue that RV64 didn't clear the upper bits
when return complex floating value with lp64 ABI.

float _Complex
complex_add(float _Complex a, float _Complex b)
{
   return a + b;
}

RealResult = zero_extend(RealA + RealB)
ImageResult = ImageA + ImageB
Return (RealResult | (ImageResult << 32))

The patch introduces shouldExtendTypeInLibCall target hook to suppress
the AssertZext generation when lowering floating LibCall.

Thanks to Eli's comments from the Bugzilla
https://bugs.llvm.org/show_bug.cgi?id=42820

Differential Revision: https://reviews.llvm.org/D65497

llvm-svn: 370275
2019-08-28 23:40:37 +00:00
Benjamin Kramer dc5f805d31 Do a sweep of symbol internalization. NFC.
llvm-svn: 369803
2019-08-23 19:59:23 +00:00
Luis Marques fa06e95898 [RISCV] Convert registers from unsigned to Register
Only in public interfaces that have not yet been converted should there remain
registers with unsigned type.

Differential Revision: https://reviews.llvm.org/D66252

llvm-svn: 369114
2019-08-16 14:27:50 +00:00
Lewis Revill 7abf863f76 [RISCV] Lower inline asm constraint A for RISC-V
This allows arguments with the constraint A to be lowered to input nodes
for RISC-V, which implies a memory address stored in a register.

This patch adds the minimal amount of code required to get operands with
the right constraints to compile.

https://reviews.llvm.org/D54296

llvm-svn: 369095
2019-08-16 10:28:34 +00:00
Daniel Sanders 3836874dbb [risc-v] Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Depends on D65919

Reviewers: lenary

Subscribers: jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision for full review was: https://reviews.llvm.org/D65962

llvm-svn: 368629
2019-08-12 22:41:02 +00:00
Sam Elliott fee242aed4 [RISCV] Fix ICE in isDesirableToCommuteWithShift
Summary:
Ana Pazos reported a bug where we were not checking that an APInt would
fit into 64-bits before calling `getSExtValue()`. This caused asserts when
compiling large constants, such as i128s, as happens when compiling compiler-rt.

This patch adds a testcase and makes the callback less error-prone.

Reviewers: apazos, asb, luismarques

Reviewed By: luismarques

Subscribers: hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66081

llvm-svn: 368572
2019-08-12 13:51:00 +00:00
Sam Elliott 856d5c5817 [RISCV] Allow ABI Names in Inline Assembly Constraints
Summary:
Clang will replace references to registers using ABI names in inline
assembly constraints with references to architecture names, but other
frontends do not. LLVM uses the regular assembly parser to parse inline asm,
so inline assembly strings can contain references to registers using their
ABI names.

This patch adds support for parsing constraints using either the ABI name or
the architectural register name. This means we do not need to implement the
ABI name replacement code in every single frontend, especially those like
Rust which are a very thin shim on top of LLVM IR's inline asm, and that
constraints can more closely match the assembly strings they refer to.

Reviewers: asb, simoncook

Reviewed By: simoncook

Subscribers: hiraditya, rbar, johnrusso, JDevlieghere, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65947

llvm-svn: 368303
2019-08-08 14:59:16 +00:00
Shiva Chen b12056bd33 [RISCV] Custom legalize i32 operations for RV64 to reduce signed extensions
Differential Revision: https://reviews.llvm.org/D65434

llvm-svn: 367960
2019-08-06 00:24:00 +00:00
Guillaume Chatelet c97a3d15d2 [LLVM][Alignment] Introduce Alignment Type
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, jfb, jakehehrlich

Reviewed By: jfb

Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65514

llvm-svn: 367828
2019-08-05 11:02:05 +00:00
Bill Wendling 41a2847a9a Emit diagnostic if an inline asm constraint requires an immediate
Summary:
An inline asm call can result in an immediate after inlining. Therefore emit a
diagnostic here if constraint requires an immediate but one isn't supplied.

Reviewers: joerg, mgorny, efriedma, rsmith

Reviewed By: joerg

Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60942

llvm-svn: 367750
2019-08-03 05:52:47 +00:00
Sam Elliott 9e6b2e1605 [RISCV] Support 'f' Inline Assembly Constraint
Summary:
This adds the 'f' inline assembly constraint, as supported by GCC. An
'f'-constrained operand is passed in a floating point register. Exactly
which kind of floating-point register (32-bit or 64-bit) is decided
based on the operand type and the available standard extensions (-f and
-d, respectively).

This patch adds support in both the clang frontend, and LLVM itself.

Reviewers: asb, lewis-revill

Reviewed By: asb

Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65500

llvm-svn: 367403
2019-07-31 09:45:55 +00:00
Simon Cook 8d7ec4d644 [RISCV] Add support for lowering floating point inlineasm clobbers
This adds the required extension to RISC-V's getRegForInlineAsmConstraint
in order to be able to correctly distringuish between the 32 and 64-bit
floating point registers when the generic fX name appears in inlineasm
clobber contraints. It also adds a check to validate that callee saved
floating point registers are only saved in this case when a hard-float
ABI is selected.

Differential Revision: https://reviews.llvm.org/D64751

llvm-svn: 367397
2019-07-31 09:07:21 +00:00
Alex Bradbury b8d352a08b [RISCV] Reset NoPHIS MachineFunctionProperty in emitSelectPseudo
We insered PHIS were there were none before, so the property must be
reset. This error was found on an EXPENSIVE_CHECKS build.

llvm-svn: 366412
2019-07-18 07:52:41 +00:00
Sam Elliott 114d2db49b [RISCV] Fix ICE in isDesirableToCommuteWithShift
Summary:
There was an error being thrown from isDesirableToCommuteWithShift in
some tests. This was tracked down to the method being called before
legalisation, with an extended value type, not a machine value type.

In the case I diagnosed, the error was only hit with an instruction sequence
involving `i24`s in the add and shift. `i24` is not a Machine ValueType, it is
instead an Extended ValueType which was causing the issue.

I have added a test to cover this case, and fixed the error in the callback.

Reviewers: asb, luismarques

Reviewed By: asb

Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64425

llvm-svn: 365511
2019-07-09 16:24:16 +00:00
Alex Bradbury 0b9addb8c0 [RISCV] Specify registers used in DWARF exception handling
Defines RISCV registers for getExceptionPointerRegister() and
getExceptionSelectorRegister().

Differential Revision: https://reviews.llvm.org/D63411
Patch by Edward Jones.
Modified by Alex Bradbury to add CHECK lines to exception-pointer-register.ll.

llvm-svn: 365301
2019-07-08 09:16:47 +00:00
Sam Elliott b2c9eed0d7 [RISCV] Support @llvm.readcyclecounter() Intrinsic
On RISC-V, the `cycle` CSR holds a 64-bit count of the number of clock
cycles executed by the core, from an arbitrary point in the past. This
matches the intended semantics of `@llvm.readcyclecounter()`, which we
currently leave to the default lowering (to the constant 0).

With this patch, we will now correctly lower this intrinsic to the
intended semantics, using the user-space instruction `rdcycle`. On
64-bit targets, we can directly lower to this instruction.

On 32-bit targets, we need to do more, as `rdcycle` only returns the low
32-bits of the `cycle` CSR. In this case, we perform a custom lowering,
based on the PowerPC lowering, using `rdcycleh` to obtain the high
32-bits of the `cycle` CSR. This custom lowering inserts a new basic
block which detects overflow in the high 32-bits of the `cycle` CSR
during reading (because multiple instructions are required to read). The
emitted assembly matches the suggested assembly in the RISC-V
specification.

Differential Revision: https://reviews.llvm.org/D64125

llvm-svn: 365201
2019-07-05 12:35:21 +00:00
Lewis Revill 39263ac5d1 [RISCV] Add lowering of global TLS addresses
This patch adds lowering for global TLS addresses for the TLS models of
InitialExec, GlobalDynamic, LocalExec and LocalDynamic.

LocalExec support required using a 4-operand add instruction, which uses
the fourth operand to express a relocation on the symbol. The necessary
fixup is emitted when the instruction is emitted.

Differential Revision: https://reviews.llvm.org/D55305

llvm-svn: 363771
2019-06-19 08:40:59 +00:00
Sam Elliott 9f155bc6e5 [RISCV] Prevent re-ordering some adds after shifts
Summary:
DAGCombine will normally turn a `(shl (add x, c1), c2)` into `(add (shl x, c2), c1 << c2)`, where `c1` and `c2` are constants. This can be prevented by a callback in TargetLowering.

On RISC-V, materialising the constant `c1 << c2` can be more expensive than materialising `c1`, because materialising the former may take more instructions, and may use a register, where materialising the latter would not.

This patch implements the hook in RISCVTargetLowering to prevent this transform, in the cases where:
- `c1` fits into the immediate field in an `addi` instruction.
- `c1` takes fewer instructions to materialise than `c1 << c2`.

In future, DAGCombine could do the check to see whether `c1` fits into an add immediate, which might simplify more targets hooks than just RISC-V.

Reviewers: asb, luismarques, efriedma

Reviewed By: asb

Subscribers: xbolva00, lebedev.ri, craig.topper, lewis-revill, Jim, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62857

llvm-svn: 363736
2019-06-18 20:38:08 +00:00
Lewis Revill 74c8364954 [RISCV] Lower calls through PLT
This patch adds support for generating calls through the procedure
linkage table where required for a given ExternalSymbol or GlobalAddress
callee.

Differential Revision: https://reviews.llvm.org/D55304

llvm-svn: 363686
2019-06-18 14:29:45 +00:00
Lewis Revill a5240361dd [RISCV] Add lowering of addressing sequences for PIC
This patch allows lowering of PIC addresses by using PC-relative
addressing for DSO-local symbols and accessing the address through the
global offset table for non-DSO-local symbols.

Differential Revision: https://reviews.llvm.org/D55303

llvm-svn: 363058
2019-06-11 12:57:47 +00:00
Lewis Revill 28a5cadb3a [RISCV] Lower inline asm constraints I, J & K for RISC-V
This validates and lowers arguments to inline asm nodes which have the
constraints I, J & K, with the following semantics (equivalent to GCC):

I: Any 12-bit signed immediate.
J: Immediate integer zero only.
K: Any 5-bit unsigned immediate.

Differential Revision: https://reviews.llvm.org/D54093

llvm-svn: 363054
2019-06-11 12:42:13 +00:00
Sam Elliott f720647ddd [RISCV] Support Bit-Preserving FP in F/D Extensions
Summary:
This allows some integer bitwise operations to instead be performed by
hardware fp instructions. This is correct because the RISC-V spec
requires the F and D extensions to use the IEEE-754 standard
representation, and fp register loads and stores to be bit-preserving.

This is tested against the soft-float ABI, but with hardware float
extensions enabled, so that the tests also ensure the optimisation also
fires in this case.

Reviewers: asb, luismarques

Reviewed By: asb

Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62900

llvm-svn: 362790
2019-06-07 12:20:14 +00:00
Luis Marques 20d2424016 [RISCV] Custom lower SHL_PARTS, SRA_PARTS, SRL_PARTS
When not optimizing for minimum size (-Oz) we custom lower wide shifts
(SHL_PARTS, SRA_PARTS, SRL_PARTS) instead of expanding to a libcall.

Differential Revision: https://reviews.llvm.org/D59477

llvm-svn: 358498
2019-04-16 14:38:32 +00:00
Lewis Revill 24a74096a4 Test commit: Remove double variable assignment
llvm-svn: 357601
2019-04-03 15:54:30 +00:00
Alex Bradbury 44668ae7c7 [RISCV] Attach VK_RISCV_CALL to symbols upon creation
This patch replaces the addition of VK_RISCV_CALL in RISCVMCCodeEmitter by
creating the RISCVMCExpr when tail/call are parsed, or in the codegen case
when the callee symbols are created.

This required adding a new CallSymbol operand to allow only adding
VK_RISCV_CALL to tail/call instructions.

This patch will allow further expansion of parsing and codegen to easily
include PLT symbols which must generate the R_RISCV_CALL_PLT relocation.

Differential Revision: https://reviews.llvm.org/D55560
Patch by Lewis Revill.

llvm-svn: 357396
2019-04-01 14:53:17 +00:00
Alex Bradbury da20f5ca74 [RISCV] Generate address sequences suitable for mcmodel=medium
This patch adds an implementation of a PC-relative addressing sequence to be
used when -mcmodel=medium is specified. With absolute addressing, a 'medium'
codemodel may cause addresses to be out of range. This is because while
'medium' implies a 2 GiB addressing range, this 2 GiB can be at any offset as
opposed to 'small', which implies the first 2 GiB only.

Note that LLVM/Clang currently specifies code models differently to GCC, where
small and medium imply the same functionality as GCC's medlow and medany
respectively.

Differential Revision: https://reviews.llvm.org/D54143
Patch by Lewis Revill.

llvm-svn: 357393
2019-04-01 14:42:56 +00:00
Luis Marques 3091884e25 [RISCV] Add seto pattern expansion
Adds a `seto` pattern expansion. Without it the lowerings of `fcmp one` and 
`fcmp ord` would be inefficient due to an unoptimized double negation.

Differential Revision: https://reviews.llvm.org/D59699

llvm-svn: 357378
2019-04-01 09:54:14 +00:00
Alex Bradbury 0b2803ee65 [RISCV] Add codegen support for ilp32f, ilp32d, lp64f, and lp64d ("hard float") ABIs
This patch adds support for the RISC-V hard float ABIs, building on top of
rL355771, which added basic target-abi parsing and MC layer support. It also
builds on some re-organisations and expansion of the upstream ABI and calling
convention tests which were recently committed directly upstream.

A number of aspects of the RISC-V float hard float ABIs require frontend
support (e.g. flattening of structs and passing int+fp for fp+fp structs in a
pair of registers), and will be addressed in a Clang patch.

As can be seen from the tests, it would be worthwhile extending
RISCVMergeBaseOffsets to handle constant pool as well as global accesses.

Differential Revision: https://reviews.llvm.org/D59357

llvm-svn: 357352
2019-03-30 17:59:30 +00:00
Alex Bradbury 9681b01c21 [RISCV] Add DAGCombine for (SplitF64 (ConstantFP x))
The SplitF64 node is used on RV32D to convert an f64 directly to a pair of i32
(necessary as bitcasting to i64 isn't legal). When performed on a ConstantFP,
this will result in a FP load from the constant pool followed by a store to
the stack and two integer loads from the stack (necessary as there is no way
to directly move between f64 FPRs and i32 GPRs on RV32D). It's always cheaper
to just materialise integers for the lo and hi parts of the FP constant, so do
that instead.

llvm-svn: 357341
2019-03-30 09:15:47 +00:00
Alex Bradbury dab1f6fc4e [RISCV] Add basic RV32E definitions and MC layer support
The RISC-V ISA defines RV32E as an alternative "base" instruction set
encoding, that differs from RV32I by having only 16 rather than 32 registers.
This patch adds basic definitions for RV32E as well as MC layer support
(assembling, disassembling) and tests. The only supported ABI on RV32E is
ILP32E.

Add a new RISCVFeatures::validate() helper to RISCVUtils which can be called
from codegen or MC layer libraries to validate the combination of TargetTriple
and FeatureBitSet. Other targets have similar checks (e.g. erroring if SPE is
enabled on PPC64 or oddspreg + o32 ABI on Mips), but they either duplicate the
checks (Mips), or fail to check for both codegen and MC codepaths (PPC).

Codegen for the ILP32E ABI support and RV32E codegen are left for a future
patch/patches.

Differential Revision: https://reviews.llvm.org/D59470

llvm-svn: 356744
2019-03-22 11:21:40 +00:00
Alex Bradbury b9e78c3994 [RISCV] Optimize emission of SELECT sequences
This patch optimizes the emission of a sequence of SELECTs with the same
condition, avoiding the insertion of unnecessary control flow. Such a sequence
often occurs when a SELECT of values wider than XLEN is legalized into two
SELECTs with legal types. We have identified several use cases where the
SELECTs could be interleaved with other instructions. Therefore, we extend the
sequence to include non-SELECT instructions if we are able to detect that the
non-SELECT instructions do not impact the optimization.

This patch supersedes https://reviews.llvm.org/D59096, which attempted to
address this issue by introducing a new SelectionDAG node. Hat tip to Eli
Friedman for his feedback on how to best handle this issue.

Differential Revision: https://reviews.llvm.org/D59355
Patch by Luís Marques.

llvm-svn: 356741
2019-03-22 10:45:03 +00:00
Alex Bradbury 2c6c84e52c [RISCV][NFC] Convert some MachineBaiscBlock::iterator(MI) to MI.getIterator()
llvm-svn: 355864
2019-03-11 20:43:29 +00:00
Alex Bradbury 62c8a57a74 [RISCV][NFC] Minor refactoring of CC_RISCV
Immediately check if we need to early-exit as we have a return value that
can't be returned directly. Also tweak following if/else.

llvm-svn: 355773
2019-03-09 11:16:27 +00:00
Alex Bradbury bd0eff316a [RISCV][NFC] Split out emitSelectPseudo from EmitInstrWithCustomInserter
It's cleaner and more consistent to have a separate helper function here.

llvm-svn: 355772
2019-03-09 09:30:14 +00:00
Alex Bradbury fea4957177 [RISCV] Support -target-abi at the MC layer and for codegen
This patch adds proper handling of -target-abi, as accepted by llvm-mc and
llc. Lowering (codegen) for the hard-float ABIs will follow in a subsequent
patch. However, this patch does add MC layer support for the hard float and
RVE ABIs (emission of the appropriate ELF flags
https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#-file-header).

ABI parsing must be shared between codegen and the MC layer, so we add
computeTargetABI to RISCVUtils. A warning will be printed if an invalid or
unrecognized ABI is given.

Differential Revision: https://reviews.llvm.org/D59023

llvm-svn: 355771
2019-03-09 09:28:06 +00:00
Alex Bradbury db67be889d [RISCV][NFC] IsEligibleForTailCallOptimization -> isEligibleForTailCallOptimization
Also clang-format the modified hunks.

llvm-svn: 354584
2019-02-21 14:31:41 +00:00
Alex Bradbury 7539fa2c2d [RISCV] Implement RV64D codegen
This patch:
* Adds necessary RV64D codegen patterns
* Modifies CC_RISCV so it will properly handle f64 types (with soft float ABI)

Note that in general there is no reason to try to select fcvt.w[u].d rather than fcvt.l[u].d for i32 conversions because fptosi/fptoui produce poison if the input won't fit into the target type.

Differential Revision: https://reviews.llvm.org/D53237

llvm-svn: 352833
2019-02-01 03:53:30 +00:00
Alex Bradbury d834d8301d [RISCV] Add RV64F codegen support
This requires a little extra work due tothe fact i32 is not a legal type. When
call lowering happens post-legalisation (e.g. when an intrinsic was inserted
during legalisation). A bitcast from f32 to i32 can't be introduced. This is
similar to the challenges with RV32D. To handle this, we introduce
target-specific DAG nodes that perform bitcast+anyext for f32->i64 and
trunc+bitcast for i64->f32.

Differential Revision: https://reviews.llvm.org/D53235

llvm-svn: 352807
2019-01-31 22:48:38 +00:00
Alex Bradbury 0092df0669 [RISCV] Add target DAG combine for bitcast fabs/fneg on RV32FD
DAGCombiner::visitBITCAST will perform:
 fold (bitconvert (fneg x)) -> (xor (bitconvert x), signbit)
 fold (bitconvert (fabs x)) -> (and (bitconvert x), (not signbit))

As shown in double-bitmanip-dagcombines.ll, this can be advantageous. But
RV32FD doesn't use bitcast directly (as i64 isn't a legal type), and instead
uses RISCVISD::SplitF64. This patch adds an equivalent DAG combine for
SplitF64.

llvm-svn: 352247
2019-01-25 21:55:48 +00:00
Alex Bradbury 456d3798d6 [RISCV] Custom-legalise i32 SDIV/UDIV/UREM on RV64M
Follow the same custom legalisation strategy as used in D57085 for
variable-length shifts (see that patch summary for more discussion). Although
we may lose out on some late-stage DAG combines, I think this custom
legalisation strategy is ultimately easier to reason about.

There are some codegen changes in rv64m-exhaustive-w-insts.ll but they are all
neutral in terms of the number of instructions.

Differential Revision: https://reviews.llvm.org/D57096

llvm-svn: 352171
2019-01-25 05:11:34 +00:00
Alex Bradbury 299d690a50 [RISCV] Custom-legalise 32-bit variable shifts on RV64
The previous DAG combiner-based approach had an issue with infinite loops
between the target-dependent and target-independent combiner logic (see
PR40333). Although this was worked around in rL351806, the combiner-based
approach is still potentially brittle and can fail to select the 32-bit shift
variant when profitable to do so, as demonstrated in the pr40333.ll test case.

This patch instead introduces target-specific SelectionDAG nodes for
SHLW/SRLW/SRAW and custom-lowers variable i32 shifts to them. pr40333.ll is a
good example of how this approach can improve codegen.

This adds DAG combine that does SimplifyDemandedBits on the operands (only
lower 32-bits of first operand and lower 5 bits of second operand are read).
This seems better than implementing SimplifyDemandedBitsForTargetNode as there
is no guarantee that would be called (and it's not for e.g. the anyext return
test cases). Also implements ComputeNumSignBitsForTargetNode.

There are codegen changes in atomic-rmw.ll and atomic-cmpxchg.ll but the new
instruction sequences are semantically equivalent.

Differential Revision: https://reviews.llvm.org/D57085

llvm-svn: 352169
2019-01-25 05:04:00 +00:00
Matt Arsenault 39508331ef Reapply "IR: Add fp operations to atomicrmw"
This reapplies commits r351778 and r351782 with
RISCV test fixes.

llvm-svn: 351850
2019-01-22 18:18:02 +00:00