Commit Graph

1083 Commits

Author SHA1 Message Date
Craig Topper a21c557955 [RISCV] Remove Zbproposedc extension
This consists of 3 compressed instructions: c.not, c.neg, and c.zext.w.
I believe these have been picked up by the Zce effort using different
encodings. I don't think it makes sense to keep them in bitmanip. It
will eventually cause a conflict if/when Zce is implemented in llvm.

Differential Revision: https://reviews.llvm.org/D110871
2021-09-30 14:23:05 -07:00
Sander de Smalen 6709b193ea [SelectionDAG] Make WidenVecRes_EXTRACT_SUBVECTOR work for scalable vectors.
The legalizer handles this by breaking up an EXTRACT_SUBVECTOR into
smaller parts and combining those together, padding the result with
UNDEF vectors, e.g.

  nxv6i64 extract_subvector(nxv12i64, 6)
  <->
  nxv8i64 concat(
    nxv2i64 extract_subvector(nxv12i64, 6)
    nxv2i64 extract_subvector(nxv12i64, 8)
    nxv2i64 extract_subvector(nxv12i64, 10)
    nxv2i64 undef)

Reviewed By: frasercrmck, david-arm

Differential Revision: https://reviews.llvm.org/D110253
2021-09-29 11:33:45 +01:00
Craig Topper a2a07e8db3 [RISCV] Fold store of vmv.x.s to a vse with VL=1.
This can avoid a loss of decoupling with the scalar unit on cores
with decoupled scalar and vector units.

We should support FP too, but those use extract_element and not a
custom ISD node so it is a little different. I also left a FIXME
in the test for i64 extract and store on RV32.
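
For illustration, a minimal IR-level sketch of the pattern (function
name and types are my own assumptions, not taken from the tests):

```
define void @store_lane0(<vscale x 2 x i32> %v, ptr %p) {
  ; element 0 is read via vmv.x.s during lowering; with this fold the
  ; store can instead become a vse with VL=1, keeping the value in the
  ; vector unit
  %e = extractelement <vscale x 2 x i32> %v, i32 0
  store i32 %e, ptr %p
  ret void
}
```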

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109482
2021-09-27 09:54:46 -07:00
Craig Topper 933182e948 [RISCV] Improve support for forming widening multiplies when one input is a scalar splat.
If one input of a fixed vector multiply is a sign/zero extend and
the other operand is a splat of a scalar, we can use a widening
multiply if the scalar value has sufficient sign/zero bits.
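
A rough IR sketch of the pattern (types and the expected vwmul.vx
selection are my assumptions):

```
define <4 x i32> @vwmul_vx(<4 x i16> %a, i16 %b) {
  ; one operand is a sign-extended vector, the other a splat of a
  ; sign-extended scalar, so a widening multiply can be used
  %ext = sext <4 x i16> %a to <4 x i32>
  %bw = sext i16 %b to i32
  %head = insertelement <4 x i32> undef, i32 %bw, i32 0
  %splat = shufflevector <4 x i32> %head, <4 x i32> undef, <4 x i32> zeroinitializer
  %mul = mul <4 x i32> %ext, %splat
  ret <4 x i32> %mul
}
```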

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D110028
2021-09-27 09:37:07 -07:00
Fraser Cormack e2b46e336b [DAGCombiner][VP] Fold zero-length or false-masked VP ops
This patch adds a generic DAGCombine for vector-predicated (VP) nodes.
Those for which we can determine that no vector element is active can be
replaced by either undef or, for reductions, the start value.
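
At the IR level the no-active-lanes case looks like this (a sketch;
values are illustrative):

```
; EVL of zero means no element is active, so the result can be
; replaced with undef
%r = call <4 x i32> @llvm.vp.add.v4i32(<4 x i32> %x, <4 x i32> %y, <4 x i1> %m, i32 0)
```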

This is tested rather trivially at the IR level, where it's possible
that we will also want to teach instcombine to perform this optimization.

However, we can also see the zero-evl case arise during SelectionDAG
legalization, when wide VP operations can be split into two and the
upper operation emerges as trivially false.

It's possible that we could perform this optimization "proactively"
(both on legal vectors and before splitting), reducing the width of an
operation and inserting it into a larger undef vector:

```
v8i32 vp_add x, y, mask, 4
->
v8i32 insert_subvector (v8i32 undef), (v4i32 vp_add xsub, ysub, mask, 4), i32 0
```

This is somewhat analogous to similar vector narrowing/widening
optimizations, but it's unclear at this point whether doing this for
VP ops would be beneficial for any/all targets.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D109148
2021-09-27 11:30:09 +01:00
Craig Topper 715cf6ffb9 [RISCV] Add another isel optimization for (and (shl X, c2), c1).
This applies where c1 is a shifted mask with 32-c2 leading zeros and c3
trailing zeros, where c3>c2. We can select it as (slli (srliw X, c3-c2), c3).
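
A worked instance with constants I chose for illustration, taking c2=2
and c3=4 (so c1 has 30 leading zeros and 4 trailing zeros):

```
define i64 @shl_and_srliw(i64 %x) {
  ; (x << 2) & 0x3FFFFFFF0 keeps x[31:2] in bits [33:4];
  ; expected selection: srliw a0, a0, 2 ; slli a0, a0, 4
  %s = shl i64 %x, 2
  %a = and i64 %s, 17179869168
  ret i64 %a
}
```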
2021-09-24 15:10:25 -07:00
Stanislav Mekhanoshin 08d7eec06e Revert "Allow rematerialization of virtual reg uses"
Reverted due to two distinct performance regression reports.

This reverts commit 92c1fd19ab.
2021-09-24 10:26:11 -07:00
Hsiangkai Wang 7d39a8a921 [RISCV] (1/2) Add the tail policy argument to builtins/intrinsics.
Add the tail policy argument to LLVM IR intrinsics. There are two policies for tail elements. Tail agnostic means users do not care about the values in the tail elements and tail undisturbed means the values in the tail elements need to be kept after the operation. In order to let users control the tail policy, we add an additional argument at the end of the argument list.

For unmasked operations, there is no maskedoff operand and the tail policy is always tail agnostic. If users want to keep tail elements under unmasked operations, they can use an all-ones mask with the masked operations instead. So, we only add the additional argument for masked operations in most cases. The exceptions are listed below.
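
As a hedged sketch, a masked intrinsic gains a trailing policy operand
along these lines (the intrinsic name, type suffixes, and policy
encoding here are assumptions, not copied from the patch):

```
; assumed encoding: 0 = tail undisturbed, 1 = tail agnostic
%r = call <vscale x 2 x i32> @llvm.riscv.vadd.mask.nxv2i32.nxv2i32.i64(
         <vscale x 2 x i32> %maskedoff, <vscale x 2 x i32> %a,
         <vscale x 2 x i32> %b, <vscale x 2 x i1> %mask, i64 %vl, i64 1)
```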

In this patch, we do not handle the following cases to reduce the complexity of the patch. There could be two separate patches for them.

* Use dest argument to control tail policy
vmerge.vvm/vmerge.vxm/vmerge.vim (add _t builtins with additional dest argument)
vfmerge.vfm (add _t builtins with additional dest argument)
vmv.v.v (add _t builtins with additional dest argument)
vmv.v.x (add _t builtins with additional dest argument)
vmv.v.i (add _t builtins with additional dest argument)
vfmv.v.f (add _t builtins with additional dest argument)
vadc.vvm/vadc.vxm/vadc.vim (add _t builtins with additional dest argument)
vsbc.vvm/vsbc.vxm (add _t builtins with additional dest argument)

* Always has tail argument for masked/unmasked intrinsics
Vector Single-Width Integer Multiply-Add Instructions (add _t and _mt builtins)
Vector Widening Integer Multiply-Add Instructions (add _t and _mt builtins)
Vector Single-Width Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins)
Vector Widening Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins)
Vector Reduction Operations (add _t and _mt builtins)
Vector Slideup Instructions (add _t and _mt builtins)
Vector Slidedown Instructions (add _t and _mt builtins)

Discussion: https://github.com/riscv/rvv-intrinsic-doc/pull/101

Differential Revision: https://reviews.llvm.org/D105092
2021-09-24 17:09:50 +08:00
Craig Topper 40b230f685 [RISCV] Limit transformAddImmMulImm to prevent an infinite loop.
This fixes an issue reported in D108607.
2021-09-23 15:53:11 -07:00
Craig Topper 70f50114f3 [RISCV] Add another isel optimization for (and (shl x, c2), c1)
Turn (and (shl x, c2), c1) -> (slli (srli x, c3-c2), c3) if c1 is a
shifted mask with no leading zeros and c3 trailing zeros where c3
is greater than c2.
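
A concrete instance (constants are my own), with c2=1 and c3=4:

```
define i64 @shl_and(i64 %x) {
  ; (x << 1) & 0xFFFFFFFFFFFFFFF0: no leading zeros, 4 trailing zeros;
  ; expected selection: srli a0, a0, 3 ; slli a0, a0, 4
  %s = shl i64 %x, 1
  %a = and i64 %s, -16
  ret i64 %a
}
```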
2021-09-23 14:18:07 -07:00
Craig Topper 8811227a0c [RISCV] Add more tests for (and (shl x, C2), C1) that can be improved by using a pair of shifts. NFC
These tests have C1 as a shifted mask having no leading zeros and
C3 trailing zeros. If C3 is more than C2, we can select this as
(slli (srli x, C3-C2), C3).
2021-09-23 14:18:07 -07:00
Craig Topper 4a69551d66 [RISCV] Add more isel optimizations for (and (shr x, c2), c1).
Turn (and (shr x, c2), c1) -> (slli (srli x, c2+c3), c3) if c1 is a
shifted mask with c2 leading zeros and c3 trailing zeros.

When the number of leading zeros is c2+32, we can use SRLIW in place of SRLI.
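
A concrete instance (constants are my own), with c2=4 and c3=8:

```
define i64 @shr_and(i64 %x) {
  ; (x >> 4) & 0x0FFFFFFFFFFFFF00: 4 leading zeros, 8 trailing zeros;
  ; expected selection: srli a0, a0, 12 ; slli a0, a0, 8
  %s = lshr i64 %x, 4
  %a = and i64 %s, 1152921504606846720
  ret i64 %a
}
```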
2021-09-23 11:29:04 -07:00
Craig Topper 19734ae6f0 [RISCV] Add more tests for (and (srl x, C2), C1) that can be improved by using a pair of shifts. NFC
These tests have C1 as a shifted mask having C2 leading zeros and some
number of trailing zeros, C3. We can select this as
(slli (srli x, C2+C3), C3) or (slli (srliw x, C2+C3), C3).
2021-09-23 11:29:04 -07:00
Fraser Cormack e7c879a69d [RISCV][VP] Add support for VP_REDUCE_* operations
This patch adds codegen support for lowering the vector-predicated
reduction intrinsics to RVV instructions. The process is similar to that
of the other reduction intrinsics, save for the fact that every VP
reduction has a start value. We reuse the existing custom "VL" nodes,
adding extra patterns where required to handle non-true masks.

To support these nodes, the `RISCVISD::VECREDUCE_*_VL` nodes have been
given an explicit "merge" operand. This is to facilitate the VP
reductions, where we must be careful to ensure that even if no operation
is performed (when VL=0) we still produce the start value. The RVV
reductions don't update the destination register under these conditions,
so we tie the splatted start value to the output register.
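
In IR terms, the contract being preserved is (a sketch):

```
; even when %evl is 0 and no reduction is performed, the result must
; still be the start value %start
%r = call i32 @llvm.vp.reduce.add.v4i32(i32 %start, <4 x i32> %v, <4 x i1> %m, i32 %evl)
```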

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107657
2021-09-23 11:11:05 +01:00
Hsiangkai Wang ebc5feb4ed [RISCV] Update mir tests. 2021-09-23 09:42:16 +08:00
Craig Topper 16ba77d19c [RISCV] Remove stale FIXMEs from float-convert.ll and double-convert.ll. NFC 2021-09-22 14:25:40 -07:00
Craig Topper f0a422f935 [RISCV] Add fcvt.s.w(u)/fcvt.d.w(u)/fcvt.h.w(u) to hasAllNBitUsers
These instructions only read the lower 32 bits of their input.
2021-09-22 14:24:26 -07:00
Craig Topper c7e78150f7 [RISCV] Add test cases showing failure to use ADDIW before fcvt.s.w/fcvt.d.w/fcvt.h.w. NFC
By not using ADDIW we can cause both an ADDIW and ADDI to be emitted
when the add has multiple users.

These instructions needed to be added to the list of instructions that
only use the lower 32 bits of their input.

I've also added tests for the wu versions, but I'm having trouble
showing bad codegen from it.
2021-09-22 14:24:26 -07:00
Craig Topper b33a1cc05b [RISCV] Optimize vp.store with an all ones mask to avoid a vmset.
We can use the riscv_vse intrinsic instead of riscv_vse_mask. The code here
is based on similar code for handling masked.scatter and vp.scatter.
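
A sketch of the input pattern (type suffixes are assumptions):

```
; an all-ones mask stores every lane up to %evl, so this can lower to
; an unmasked vse intrinsic and skip materializing the mask with vmset
call void @llvm.vp.store.v4i32.p0(<4 x i32> %v, ptr %p,
          <4 x i1> <i1 true, i1 true, i1 true, i1 true>, i32 %evl)
```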

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D110206
2021-09-22 09:12:47 -07:00
Craig Topper aeb63d464f [RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for and/or/xor.
This requires a minor change to CodeGenPrepare to ensure that
shouldSinkOperands will be called for And.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D110106
2021-09-21 10:07:29 -07:00
Ben Shi b3052013b4 [RISCV] Optimize (add (mul x, c0), c1)
Optimize (add (mul x, c0), c1) -> (ADDI (MUL (ADDI x, c1/c0), c0), c1%c0),
if c1/c0 and c1%c0 are simm12 while c1 is not.

Optimize (add (mul x, c0), c1) -> (MUL (ADDI x, c1/c0), c0),
if c1%c0 is zero and c1/c0 is simm12 while c1 is not.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108607
2021-09-21 14:13:14 +00:00
Craig Topper c6e52b1e85 [RISCV] Add test cases for missed opportunities to use vand/vor/vxor.vx. NFC
These are cases where the splat is in another basic block. CGP
needs to sink it to expose the opportunity to SelectionDAG.
2021-09-20 13:45:40 -07:00
Craig Topper a95ba81073 [RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for FMA.
If either of the multiplicands is a splat, we can sink it to use
vfmacc.vf or similar.
2021-09-20 11:49:50 -07:00
Craig Topper 792101fff7 [RISCV] Add test cases for missed opportunity to use vfmacc.vf. NFC
This is another case of a splat being in another basic block
preventing SelectionDAG from optimizing it.
2021-09-20 11:49:50 -07:00
Craig Topper 04ab6c85ef [RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for FAdd/FSub/FMul/FDiv. 2021-09-20 10:25:46 -07:00
Craig Topper 890027b314 [RISCV] Add test cases showing failure to use .vf vector operations when splat is in another basic block. NFC
We should have CGP copy the splats into the same basic block as the
FP operation so that SelectionDAG can fold them.
2021-09-20 10:25:38 -07:00
Craig Topper d85e347a28 [RISCV] Add a pass to recognize VLS strided loads/stores from gather/scatter.
For strided accesses the loop vectorizer seems to prefer creating a
vector induction variable with a start value of the form
<i32 0, i32 1, i32 2, ...>. This value will be incremented each
loop iteration by a splat constant equal to the length of the vector.
Within the loop, arithmetic using splat values will be done on this
vector induction variable to produce indices for a vector GEP.

This pass attempts to dig through the arithmetic back to the phi
to create a new scalar induction variable and a stride. We push
all of the arithmetic out of the loop by folding it into the start,
step, and stride values. Then we create a scalar GEP to use as the
base pointer for a strided load or store using the computed stride.
Loop strength reduce will run after this pass and can do some
cleanups to the scalar GEP and induction variable.
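
Schematically, at the IR level (a hand-written sketch, not one of the
patch's tests):

```
; with %idx built from a vector induction variable such as <0, 2, 4, 6>
; stepped by splat(8), the pass recovers a scalar base GEP and a
; constant byte stride for a strided load
%ptrs = getelementptr inbounds i32, ptr %p, <4 x i64> %idx
%v = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 4,
          <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef)
```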

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D107790
2021-09-20 09:39:44 -07:00
Ben Shi dee5a8ca32 [RISCV] Optimize (add (shl x, c0), (shl y, c1)) with SH*ADD
Optimize (add (shl x, c0), (shl y, c1)) ->
         (SLLI (SH*ADD x, y), c1), if c0-c1 == 1/2/3.
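
A worked instance (constants are my own), with c0=3 and c1=2:

```
define i64 @sh1add_then_slli(i64 %x, i64 %y) {
  ; (x << 3) + (y << 2) == ((x << 1) + y) << 2
  ; expected selection: sh1add a0, a0, a1 ; slli a0, a0, 2
  %sx = shl i64 %x, 3
  %sy = shl i64 %y, 2
  %r = add i64 %sx, %sy
  ret i64 %r
}
```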

Reviewed By: craig.topper, luismarques

Differential Revision: https://reviews.llvm.org/D108916
2021-09-19 16:35:12 +08:00
Craig Topper 73e5b9ea90 [RISCV] Select (srl (sext_inreg X, i32), uimm5) to SRAIW if only lower 32 bits are used.
SimplifyDemandedBits can turn srl into sra if the bits being shifted
in aren't demanded. This patch can recover the original sra in some cases.

I've renamed the tablegen class for detecting W users since the "overflowing operator"
term I originally borrowed from Operator.h does not include srl.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D109162
2021-09-16 11:03:35 -07:00
Matt Arsenault 4a36e96c3f RegAllocGreedy: Account for reserved registers in num regs heuristic
This simple heuristic uses the estimated live range length combined
with the number of registers in the class to switch which heuristic to
use. This was taking the raw number of registers in the class, even
though not all of them may be available. AMDGPU heavily relies on
dynamically reserved numbers of registers based on user attributes to
satisfy occupancy constraints, so the raw number is highly misleading.

There are still a few problems here. In the original testcase that
made me notice this, the live range size is incorrect after the
scheduler rearranges instructions, since the instructions don't have
the original InstrDist offsets. Additionally, I think it would be more
appropriate to use the number of disjointly allocatable registers in
the class. For the AMDGPU register tuples, there are a large number of
registers in each tuple class, but only a small fraction can actually
be allocated at the same time since they all overlap with each
other. It seems we do not have a query that corresponds to the number
of independently allocatable registers. Relatedly, I'm still debugging
some allocation failures where overlapping tuples seem to not be
handled correctly.

The test changes are mostly noise. There are a handful of x86 tests
that look like regressions with an additional spill, and a handful
that now avoid a spill. The worst looking regression is likely
test/Thumb2/mve-vld4.ll which introduces a few additional
spills. test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll
shows a massive improvement by completely eliminating a large number
of spills inside a loop.
2021-09-14 21:00:29 -04:00
Craig Topper 1b736bda3b [RISCV] Enable CGP to sink splat operands of Add/Sub/Mul/Shl/LShr/AShr
LICM may have pulled out a splat, but with .vx instructions we
can fold it into an operation.

This patch enables CGP to reverse the LICM transform and move the
splat back into the loop.

I've started with the commutable integer operations and shifts, but we can
extend this with more operations in future patches.
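
A sketch of the shape being targeted (loop structure and names are my
own illustration):

```
define void @sink_splat(ptr %p, i32 %x, i64 %n) {
entry:
  ; LICM hoisted this splat out of the loop
  %head = insertelement <4 x i32> undef, i32 %x, i32 0
  %splat = shufflevector <4 x i32> %head, <4 x i32> undef, <4 x i32> zeroinitializer
  br label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %q = getelementptr inbounds <4 x i32>, ptr %p, i64 %i
  %v = load <4 x i32>, ptr %q
  ; CGP moves %splat back down here so isel can form vadd.vx
  %a = add <4 x i32> %v, %splat
  store <4 x i32> %a, ptr %q
  %i.next = add nuw i64 %i, 1
  %done = icmp eq i64 %i.next, %n
  br i1 %done, label %exit, label %loop

exit:
  ret void
}
```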

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109394
2021-09-10 09:04:01 -07:00
Craig Topper 6c7cadb8c1 [RISCV] Teach vsetvli insertion that stores don't use the policy bits in vtype.
This can avoid a vsetvl after a tail undisturbed operation.

Differential Revision: https://reviews.llvm.org/D109549
2021-09-10 09:03:20 -07:00
Craig Topper 8c4803dc93 [RISCV] Add test cases showing failure to fold splatted shift amounts across basic blocks.
We should have CGP copy the splats into the same basic block as the
shift so that SelectionDAG can fold them.
2021-09-09 12:45:30 -07:00
Yvan Roux 261cbe98c3 [RISCV] Fix Machine Outliner jump table handling.
Don't outline machine instructions which are using jump table indexes
since they are materialized as local labels (like the already handled
case of constant pools).

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D109436
2021-09-09 07:32:30 +02:00
Craig Topper a574f0e0c3 [RISCV] Disable use of i128 shift libcalls on RV32.
Since i128 isn't a legal C type on RV32, I don't believe
libgcc implements these functions for RV32. compiler-rt
does implement them because i128 support is enabled
in order to handle long double.

This is consistent with 32-bit X86 and ARM.
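
For example (a sketch; __ashlti3 is the usual compiler-rt name for the
128-bit shift-left libcall):

```
define i128 @shl128(i128 %x, i128 %amt) {
  ; on RV32 this previously emitted a call to __ashlti3, which libgcc
  ; does not provide for 32-bit RISC-V; it is now expanded inline
  %r = shl i128 %x, %amt
  ret i128 %r
}
```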

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D109383
2021-09-08 14:26:07 -07:00
Craig Topper c00cb52854 [RISCV] Pre-commit tests for D109394. NFC 2021-09-08 10:25:38 -07:00
Craig Topper b04c09c07c [RISCV] Use V0 instead of VMV0 for mask vectors in isel patterns.
This is consistent with the RVV intrinsic patterns. This has been
shown to prevent some "ran out of registers" errors in our internal
testing.

Unfortunately, there are some regressions on LMUL=8 tests in here.
I think the lack of registers with LMUL=8 just makes it very hard
to schedule correctly.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109245
2021-09-08 09:46:21 -07:00
Craig Topper 1f16191906 [RISCV] Add a GPR def to the Zvlseg SPILL/RELOAD pseudos
The expansion of these pseudos creates ADD instructions. Those
ADDs modify a GPR so that it no longer contains the same value
as the input base pointer. Therefore, I believe we should have a
GPR as a Def on these instructions and expansion should get the
destination register for the ADDs from that operand.

At least in our tests here this works out so that register
scavenging picks the same register as the base pointer.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109405
2021-09-08 09:23:33 -07:00
David Green d8d24c64fe [DAG] Fix GT -> GE condition when creating SetCC
79845ed6df folded some setcc(ashr) conditions to setcc, but got
the condition for NE incorrect, using GT where it should be using GE.
2021-09-08 12:41:51 +01:00
Fraser Cormack 2c5568a6a9 [LegalizeTypes][VP] Add promotion support for binary VP ops
This patch extends the preliminary support for vector-predicated (VP)
operation legalization to include promotion of illegal integer vector
types.

Integer promotion of binary VP operations is relatively simple and
piggy-backs on the non-VP logic, passing the two extra mask and VP
operands through to the promoted operation.

Tests have been added to the RISC-V target to cover the basic scenarios
for integer promotion for both fixed- and scalable-vector types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108288
2021-09-08 10:22:57 +01:00
Fraser Cormack a823bdf3ab [RISCV][VP] Custom lower VP_STORE and VP_LOAD
This patch adds support for the vector-predicated `VP_STORE` and
`VP_LOAD` nodes. We do this in the same way we lower `MSTORE` and
`MLOAD`: to regular load/store instructions via intrinsics.

One necessary change was made to `SelectionDAGLegalize` so that
`VP_STORE` nodes' operation actions are taken from the stored "value"
operands, in the same vein as `STORE` or `MSTORE`.

Reviewed By: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D108999
2021-09-07 10:53:25 +01:00
Fraser Cormack f4dee8cb82 [RISCV][VP] Custom lower VP_SCATTER and VP_GATHER
This patch adds support for the `VP_SCATTER` and `VP_GATHER` nodes by
lowering them to RVV's `vsox`/`vlux` instructions, respectively. This
process is almost identical to the existing `MSCATTER`/`MGATHER` support.

One extra change was made to `SelectionDAGLegalize` so that
`VP_SCATTER`'s operation action is derived from its stored "value"
operand rather than its return type (which is always the chain).

Reviewed By: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D108987
2021-09-07 10:43:07 +01:00
David Green 8523fb96a6 [DAG] Fold select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C
Given a select_cc producing a constant and an inversion of the constant
for a comparison greater than zero, we can produce an xor with ashr
instead, which produces smaller code. The ashr either sets all bits or
clears all bits depending on whether the value is negative. This is then
xor'd with the constant to optionally negate the value.
https://alive2.llvm.org/ce/z/DTFaBZ
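
A concrete instance in IR (constants are my own):

```
define i32 @selcc(i32 %x) {
  ; x > -1 selects 5, else -6 (~5); folds to (ashr x, 31) ^ 5
  %c = icmp sgt i32 %x, -1
  %r = select i1 %c, i32 5, i32 -6
  ret i32 %r
}
```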

This includes a one-use check on the Cmp, which seems to make things a
little worse and will be removed in a follow-up.

Differential Revision: https://reviews.llvm.org/D109149
2021-09-05 16:04:01 +01:00
David Green 79845ed6df [DAG] Fold setcc eq with ashr to compare to zero.
Pulled out of D109149, this folds set_cc seteq (ashr X, BW-1), -1 ->
set_cc setlt X, 0 to prevent some regressions later on when folding
select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C
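
In IR terms the fold is (a sketch):

```
%s = ashr i32 %x, 31
%c = icmp eq i32 %s, -1
; becomes:
%c2 = icmp slt i32 %x, 0
```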

Differential Revision: https://reviews.llvm.org/D109214
2021-09-05 14:06:47 +01:00
David Green 7801d7963d [DAG] Add tests for select_cc and setcc with constant patterns. 2021-09-05 10:17:21 +01:00
Craig Topper 75620fadf5 [RISCV] Change how we encode AVL operands in vector pseudoinstructions to use GPRNoX0.
This patch changes the register class to avoid accidentally setting
the AVL operand to X0 through MachineIR optimizations.

There are cases where we really want to use X0, but we can't get that
past the MachineVerifier with the register class as GPRNoX0. So I've
used a 64-bit -1 as a sentinel for X0. All other immediate values should
be uimm5. I convert it to X0 at the earliest possible point in the VSETVLI
insertion pass to avoid touching the rest of the algorithm. In
SelectionDAG lowering I'm using a -1 TargetConstant to hide it from
instruction selection and treat it differently than if the user
used -1. A user -1 should be selected to a register since it doesn't
fit in uimm5.

This is the rest of the changes started in D109110. As mentioned there,
I don't have a failing test from MachineIR optimizations anymore.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109116
2021-09-03 09:19:25 -07:00
Evandro Menezes cd6064bb9e [RISCV] Improve shrink wrap test (NFC)
Restore test for shrink wrapping disabled.
2021-09-02 12:14:04 -05:00
Craig Topper 498e8ae412 [RISCV] Add Zba command line to rv64i-exhaustive-w-insts.ll
Zba adds a zext.w pseudoinstruction using ADDUW. This can simplify
the generated code for many of these tests.

There are at least 2 suboptimal cases in this config that I've marked
with TODOs.
2021-09-02 08:36:27 -07:00
Craig Topper eaa560582a [RISCV] Remove stale TODOs from test. NFC
These were fixed by D106230.
2021-09-02 08:36:27 -07:00
Craig Topper b5fd6b46f5 [RISCV] Teach instruction selection to elide sext.w in some cases.
If a sext_inreg is up for isel, and all its users are W instructions,
we can skip emitting the sext_inreg. This is helpful if the producing
instruction can't become a W instruction.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D108966
2021-09-02 07:54:34 -07:00