Commit Graph

464 Commits

Author SHA1 Message Date
Craig Topper 173332d175 [RISCV] Manually emit the best shift for VSCALE lowering to improve codegen.
We assume VLENB is a multiple of 8 and previously relied on shift
pairs being optimized to an AND+SHL/SHR and computeKnownBits
removing the AND. This doesn't happen if (vlenb >> 3) gets CSEd
to have multiple uses. This patch manually emits the best shift
to workaround this.
2021-07-17 00:52:07 -07:00
Craig Topper 4dbb788068 [RISCV] Teach constant materialization that it can use zext.w at the end with Zba to reduce number of instructions.
If the upper 32 bits are zero and bit 31 is set, we might be able to
use zext.w to fill in the zeros after using an lui and/or addi.

Most of this patch is plumbing the subtarget features into the constant
materialization.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D105509
2021-07-16 09:35:56 -07:00
Craig Topper 0ce13f92b7 [RISCV] Add curly braces around a case body that declares variables. NFC
This is at the end of the switch so doesn't cause any issues now,
but if a new case is added it will break.
2021-07-16 09:35:56 -07:00
Fraser Cormack e3fa2b1eab Revert "[RISCV] Lower more BUILD_VECTOR sequences to RVV's VID"
This reverts commit a6ca88e908.

More caution is required to avoid overflow/underflow. Thanks to the
santizers for catching this.
2021-07-16 15:00:20 +01:00
Fraser Cormack a6ca88e908 [RISCV] Lower more BUILD_VECTOR sequences to RVV's VID
This patch teaches the compiler to identify a wider variety of
`BUILD_VECTOR`s which form integer arithmetic sequences, and to lower
them to `vid.v` with modifications for non-unit steps and non-zero
addends.

The sequences handled by this optimization must either be monotonically
increasing or decreasing. Consecutive elements holding the same value
indicate a fractional step which, while simple mathematically,
becomes more complex to handle both in the realm of lossy integer
division and in the presence of `undef`s.

For example, a common "interleaving" shuffle index will be lowered by
LLVM to both `<0,u,1,u,2,...>` and `<u,0,u,1,u,...>` `BUILD_VECTOR`
nodes. Either of these would ideally be lowered to `vid.v` shifted right
by 1. Detection of this sequence in presence of general `undef` values
is more complicated, however: `<0,u,u,1,>` could match either
`<0,0,0,1,>` or `<0,0,1,1,>` depending on later values in the sequence.
Both are possible, so backtracking or multiple passes is inevitable.

Sticking to monotonic sequences keeps the logic simpler as it can be
done in one pass. Fractional steps will likely be a separate
optimization in a future patch.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104921
2021-07-16 10:35:13 +01:00
Fraser Cormack 03a4702c88 [RISCV] Fix the neutral element in vector 'fadd' reductions
Using positive zero as the neutral element in 'fadd' reductions, while
it generates better code, is incorrect. The correct neutral element is
negative zero: 0.0 + -0.0 = 0.0, whereas -0.0 + -0.0 = -0.0.

There are perhaps more optimal lowerings of negative zero avoiding
constant-pool loads which could be left as future work.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D105902
2021-07-14 10:18:38 +01:00
Craig Topper 1e670dc7d7 [RISCV] Use DIVUW/REMUW/DIVW instructions for i8/i16/i32 udiv/urem/sdiv when LHS is constant.
We don't really have optimizations for division with a constant
LHS. If we don't use a W instruction we end up needing to sign
or zero extend the RHS to use the 64-bit instruction.

I had to sign_extend i32 constants on the LHS instead of using
any_extend which becomes zero_extend. If we don't do this, constants
that were originally negative become harder to materialize. I think
this problem exists for more of our W instruction cases. For example
(i32 (shl -1, X)), but we don't have lit tests. I'll work on that
as a follow up.

I also left a FIXME for enabling W instruction for RHS constants
under -Oz.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D105769
2021-07-13 10:33:57 -07:00
Fangrui Song 3d89fb4d13 [RISCV] Support machine constraint "S"
Similar to D46745, "S" represents an absolute symbolic operand, which
can be used to specify the access models, e.g.

  extern int var;
  void *addr_via_asm() {
    void *ret;
    asm("lui %0, %%hi(%1)\naddi %0,%0,%%lo(%1)" : "=r"(ret) : "S"(&var));
    return ret;
  }

'S' is documented in trunk GCC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101275

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D105254
2021-07-13 09:30:09 -07:00
Fraser Cormack d991b7212b [RISCV] Pass undef VECTOR_SHUFFLE indices on to BUILD_VECTOR
Often when lowering vector shuffles, we split the shuffle into two
LHS/RHS shuffles which are then blended together. To do so we split the
original indices into two, indexed into each respective vector. These
two index vectors are then separately lowered as BUILD_VECTORs.

This patch forwards on any undef indices to the BUILD_VECTOR, rather
than having the VECTOR_SHUFFLE lowering decide on an optimal concrete
index. The motiviation for ths change is so that we don't duplicate
optimization logic between the two lowering methods and let BUILD_VECTOR
do what it does best.

Propagating undef in this way allows us, for example, to generate
`vid.v` to produce the LHS indices of commonly-used interleave-type
shuffles. I have designs on further optimizing interleave-type and other
common shuffle patterns in the near future.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104789
2021-07-13 10:41:54 +01:00
Craig Topper 12d51f95fe [RISCV] Implement lround*/llround*/lrint*/llrint* with fcvt instruction with -fno-math-errno
These are fp->int conversions using either RMM or dynamic rounding modes.

The lround and lrint opcodes have a return type of either i32 or
i64 depending on sizeof(long) in the frontend which should follow
xlen. llround/llrint should always return i64 so we'll need a libcall
for those on rv32.

The frontend will only emit the intrinsics if -fno-math-errno is in
effect otherwise a libcall will be emitted which will not use
these ISD opcodes.

gcc also does this optimization.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D105206
2021-07-06 11:43:22 -07:00
Craig Topper 2b5e53111a [RISCV] Add support for matching vwmul(u) and vwmacc(u) from fixed vectors.
This adds a DAG combine to detect sext/zext inputs and emit a
new ISD opcode. The extends will either be removed or replaced
with narrower extends.

Isel patterns are used to match add and widening mul to vwmacc
similar to the recently added vmacc patterns.

There's still some work to be to match vmulsu.
We should also rewrite splats that were extended as scalars and
then splatted.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D104802
2021-07-06 10:24:31 -07:00
Craig Topper 3b6dfa381e [RISCV] Protect the SHL/SRA/SRL handlers in LowerOperation against being called for an illegal i32 shift amount.
It seems it is possible for DAG combine to create a shl with an
i64 result type and an i32 shift amount. This is ok before type
legalization since the type don't need to match in SelectionDAG.
This results in type legalization calling LowerOperation to
legalize just the amount. We weren't expecting this so we
asserted for not finding a fixed vector shift.

To fix this, I've added a check for the fixed vector case and
returned SDValue() to get the default type legalizer. I've
factored all shifts together and added a fixed vector specific
handler to avoid repeating similar code for each in
LowerOperation.

The particular case I found was exposed by D104581, but the bad
shift is created after that patch triggers.
2021-06-29 09:45:13 -07:00
Jim Lin 779d2b0a42 [RISCV][NFC] Combine the control flow for different RetOp of interrupt function
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D104838
2021-06-26 17:28:03 +08:00
Craig Topper d4f4a1ba62 [RISCV] Add DAG combine to detect opportunities to replace (i64 (any_extend (i32 X)) with sign_extend.
If type legalization is going to insert a sign_extend for other users
of X and we can fold the sign_extend into ADDW/MULW/SUBW, it is
better to replace the ANY_EXTEND so we don't end up with a separate
ADD/MUL/SUB instruction for the users of the ANY_EXTEND.

I'm only handling setcc uses right now, but there are other
instructions that force sign_extends like ashr.

There are probably other *W instructions we could use in addition
to ADDW/SUBW/MULW.

My motivating case was a loop terminating compare and a phi use
as seen in the new test file.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D104581
2021-06-25 23:16:37 -07:00
Fraser Cormack a4729f7f88 [RISCV] Lower RVV vector SELECTs to VSELECTs
This patch optimizes the code generation of vector-type SELECTs (LLVM
select instructions with scalar conditions) by custom-lowering to
VSELECTs (LLVM select instructions with vector conditions) by splatting
the condition to a vector. This avoids the default expansion path which
would either introduce control flow or fully scalarize.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104772
2021-06-24 10:12:51 +01:00
Fraser Cormack fed1503e85 [RISCV][VP] Lower FP VP ISD nodes to RVV instructions
With the exception of `frem`, this patch supports the current set of VP
floating-point binary intrinsics by lowering them to to RVV instructions. It
does so by using the existing `RISCVISD *_VL` custom nodes as an intermediate
layer. Both scalable and fixed-length vectors are supported by using this
method.

The `frem` node is unsupported due to a lack of available instructions. For
fixed-length vectors we could scalarize but that option is not (currently)
available for scalable-vector types. The support is intentionally left out so
it equivalent for both vector types.

The matching of vector/scalar forms is currently lacking, as scalable vector
types do not lower to the custom `VFMV_V_F_VL` node. We could either make
floating-point scalable vector splats lower to this node, or support the
matching of multiple kinds of splat via a `ComplexPattern`, much like we do for
integer types.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D104237
2021-06-17 10:04:00 +01:00
Fraser Cormack c75e454cb9 [RISCV] Transform unaligned RVV vector loads/stores to aligned ones
This patch adds support for loading and storing unaligned vectors via an
equivalently-sized i8 vector type, which has support in the RVV
specification for byte-aligned access.

This offers a more optimal path for handling of unaligned fixed-length
vector accesses, which are currently scalarized. It also prevents
crashing when `LegalizeDAG` sees an unaligned scalable-vector load/store
operation.

Future work could be to investigate loading/storing via the largest
vector element type for the given alignment, in case that would be more
optimal on hardware. For instance, a 4-byte-aligned nxv2i64 vector load
could loaded as nxv4i32 instead of as nxv16i8.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104032
2021-06-14 18:12:18 +01:00
Fraser Cormack 502edebd9d [ValueTypes][RISCV] Cap RVV fixed-length vectors by size
This patch changes RVV's policy for its supported list of fixed-length
vector types by capping by vector size rather than element count. Now
all 1024-byte vectors (of supported element types) are supported, rather
than all 256-element vectors.

This is a more natural fit for the architecture, and allows us to, for
example, improve the support for vector bitcasts.

This change necessitated the adding of some new simple types to avoid
"regressing" on the number of currently-supported vectors. We round out
the 1024-byte types by adding `v512i8`, `v1024i8`, `v512i16` and
`v512f16`.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103884
2021-06-09 12:15:37 +01:00
Fraser Cormack e8f1f89103 [RISCV] Support CONCAT_VECTORS on scalable masks
This patch is a simple fix which registers CONCAT_VECTORS as
custom-lowered for scalable mask vectors. This follows the pattern of
all other scalable-vector types, as the default expansion of
CONCAT_VECTORS cannot handle scalable types, and even if it did it'd go
through the stack and generate worse code.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103896
2021-06-09 09:07:44 +01:00
Craig Topper f30f8b4f12 [RISCV] Lower i8/i16 bswap/bitreverse to grevi/greviw with Zbp.
Include known bits support so we know we don't need to zext the
output if the input was already zero extended.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D103757
2021-06-07 10:31:51 -07:00
Craig Topper 8bde5f06a1 [RISCV] Replace && with ||. Spotted by coverity.
We should be exiting when the shift amount is greater than
the bit width regardless of whether it is a power of 2.

Reported by Simon Pilgrim here https://reviews.llvm.org/D96661

This requires getting a shift amount that is out of bounds that
wasn't already optimized by SelectionDAG. This would be pretty
trick to construct a test for.

Or it would require a non-power of 2 shift amount and a mask
that has runs of ones and zeros of the next lowest power of 2 from
that shift amount. I tried a little to produce a test for this,
but didn't get it to work.
2021-06-06 13:09:51 -07:00
Nikita Popov 1ffa6499ea [TargetLowering] Use IRBuilderBase instead of IRBuilder<> (NFC)
Don't require a specific kind of IRBuilder for TargetLowering hooks.
This allows us to drop the IRBuilder.h include from TargetLowering.h.

Differential Revision: https://reviews.llvm.org/D103759
2021-06-06 16:29:50 +02:00
Nikita Popov 9914200393 [CodeGen] Add missing includes (NFC)
These currently rely on the IRBuilder.h include in TargetLowering.h.
Make them explicit.
2021-06-06 15:48:27 +02:00
Fraser Cormack 3b0a33d0ad [RISCV] Expand unaligned fixed-length vector memory accesses
RVV vectors must be aligned to their element types, so anything less is
unaligned.

For regular loads and stores, our custom-lowering of fixed-length
vectors meant that we opted out of LegalizeDAG's built-in unaligned
expansion. This patch adds that logic in to our custom lower function.

For masked intrinsics, we declare that anything unaligned is not legal,
leaving the ScalarizeMaskedMemIntrin pass to do the expansion for us.

Note that neither of these methods can handle the expansion of
scalable-vector memory ops, so those cases are left alone by this patch.
Scalable loads and stores already go through expansion by default but
hit an assertion, and scalable masked intrinsics will silently generate
incorrect code. It may be prudent to return an error in both of these
cases.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102493
2021-06-02 09:27:44 +01:00
Fraser Cormack 4f500c402b [RISCV] Support vector types in combination with fastcc
This patch extends the RISC-V lowering of the 'fastcc' calling
convention to vector types, both fixed-length and scalable. Without this
patch, any function passing or returning vector types by value would
throw a compiler error.

Vectors are handled in 'fastcc' much as they are in the default calling
convention, the noticeable difference being the extended set of scalar
GPR registers that can be used to pass vectors indirectly.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102505
2021-06-01 10:31:18 +01:00
Fraser Cormack 2b37c405cc [RISCV] Scale scalably-typed split argument offsets by VSCALE
This patch fixes a bug in lowering scalable-vector types in RISC-V's
main calling convention. When scalable-vector types are split and passed
indirectly, the target is responsible for scaling the offset --
initially set to the known-minimum store size -- by the scalable factor.

Before this we were issuing overlapping loads or stores to the different
parts, leading to incorrect codegen.

Credit to @HsiangKai for spotting this.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D103262
2021-05-31 10:43:13 +01:00
Fraser Cormack eb23936591 [RISCV] Support vector conversions between fp and i1
This patch custom lowers FP_TO_[US]INT and [US]INT_TO_FP conversions
between floating-point and boolean vectors. As the default action is
scalarization, this patch both supports scalable-vector conversions and
improves the code generation for fixed-length vectors.

The lowering for these conversions can piggy-back on the existing
lowering, which lowers the operations to a supported narrowing/widening
conversion and then either an extension or truncation.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103312
2021-05-31 09:55:39 +01:00
Fraser Cormack 5a80dc4988 [VP][SelectionDAG] Add a target-configurable EVL operand type
This patch adds a way for the target to configure the type it uses for
the explicit vector length operands of VP SDNodes. The type must be a
legal integer type (there is still no target-independent legalization of
this operand) and must currently be at least as big as i32, the type
used by the IR intrinsics. An implicit zero-extension takes place on
targets which choose a larger type. All VP nodes should be created with
this type used for the EVL operand.

This allows 64-bit RISC-V to avoid custom legalization of all VP nodes,
keeping them in their target-independent form for that bit longer.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D103027
2021-05-27 15:27:36 +01:00
Fraser Cormack b7101e218c [DAGCombine][RISCV] Don't try to trunc-store combined vector stores
DAGCombine's `mergeStoresOfConstantsOrVecElts` optimization is told
whether it's to use vector types and also whether it's to issue a
truncating store. However, the truncating store code path assumes a
scalar integer `ConstantSDNode`, and when using vector types it creates
either a `BUILD_VECTOR` or `CONCAT_VECTORS` to store: neither of which
is a constant.

The `riscv64` target is able to expose a crash here because it switches
on both code paths at the same time. The `f32` is stored as `i32` which
must be promoted to `i64`, necessitating a truncating store.
It also decides later that it prefers a vector store of `v2f32`.

While vector truncating stores are legal, this combine is not able to
emit them. We also don't have a test case. This patch adds an assert to
catch this case more gracefully, and updates one of the caller functions
to the function to turn off the use of truncating stores when preferring
vectors.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103173
2021-05-27 14:16:32 +01:00
Fraser Cormack 8c73a31c11 [RISCV] Allow passing fixed-length vectors via the stack
The vector calling convention dictates that when the vector argument
registers are exhaused, GPRs are used to pass the address via the stack.
When the GPRs themselves are exhausted, at best we would previously
crash with an assertion, and at worst we'd generate incorrect code.

This patch addresses this issue by passing fixed-length vectors via the
stack with their full fixed-length size and aligned to their element
type size. Since the calling convention lowering can't yet handle
scalable vector types, this patch adds a fatal error to make it clear
that we are lacking in this regard.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102422
2021-05-27 14:14:07 +01:00
Fraser Cormack 772b58a641 [SelectionDAG][RISCV] Don't unroll 0/1-type bool VSELECTs
This patch extends the cases in which the legalizer is able to express
VSELECT in terms of XOR/AND/OR. When dealing with a VSELECT between
boolean vector types, the mask itself is an all-ones or all-ones value
of the operand type, so a 0/1 boolean type behaves identically to a 0/-1
type.

This greatly helps RISC-V which relies on expansion for these nodes. It
also allows scalable-vector bool VSELECTs to use the default expansion,
where before it would crash in SelectionDAG::UnrollVectorOp.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103147
2021-05-27 10:08:57 +01:00
Craig Topper 9065118b64 [RISCV] Optimize SEW=64 shifts by splat on RV32.
SEW=64 shifts only uses the log2(64) bits of shift amount. If we're
splatting a 64 bit value in 2 parts, we can avoid splatting the
upper bits and just let the low bits be sign extended. They won't
be read anyway.

For the purposes of SelectionDAG semantics of the generic ISD opcodes,
if hi was non-zero or bit 31 of the low is 1, the shift was already
undefined so it should be ok to replace high with sign extend of low.

In order do be able to find the split i64 value before it becomes
a stack operation, I added a new ISD opcode that will be expanded
to the stack spill in PreprocessISelDAG. This new node is conceptually
similar to BuildPairF64, but I expanded earlier so that we could
go through regular isel to get the right VLSE opcode for the LMUL.
BuildPairF64 is expanded in a CustomInserter.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D102521
2021-05-26 10:23:32 -07:00
Craig Topper b510e4cf1b [RISCV] Add a vsetvli insert pass that can be extended to be aware of incoming VL/VTYPE from other basic blocks.
This is a replacement for D101938 for inserting vsetvli
instructions where needed. This new version changes how
we track the information in such a way that we can extend
it to be aware of VL/VTYPE changes in other blocks. Given
how much it changes the previous patch, I've decided to
abandon the previous patch and post this from scratch.

For now the pass consists of a single phase that assumes
the incoming state from other basic blocks is unknown. A
follow up patch will extend this with a phase to collect
information about how VL/VTYPE change in each block and
a second phase to propagate this information to the entire
function. This will be used by a third phase to do the
vsetvli insertion.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D102737
2021-05-24 11:47:27 -07:00
Fraser Cormack 7a211ed110 [RISCV] Prevent store combining from infinitely looping
RVV code generation does not successfully custom-lower BUILD_VECTOR in all
cases. When it resorts to default expansion it may, on occasion, be expanded to
scalar stores through the stack. Unfortunately these stores may then be picked
up by the post-legalization DAGCombiner which merges them again. The merged
store uses a BUILD_VECTOR which is then expanded, and so on.

This patch addresses the issue by overriding the `mergeStoresAfterLegalization`
hook. A lack of granularity in this method (being passed the scalar type) means
we opt out in almost all cases when RVV fixed-length vector support is enabled.
The only exception to this rule are mask vectors, which are always either
custom-lowered or are expanded to a load from a constant pool.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102913
2021-05-24 10:19:32 +01:00
Fraser Cormack c74ab891fc [RISCV] Ensure small mask BUILD_VECTORs aren't expanded
The default expansion for BUILD_VECTORs -- save for going through
shuffles -- is to go through the stack. This method only works when the
type is at least byte-sized, so for v2i1 and v4i1 we would crash.

This patch ensures that small mask-type BUILD_VECTORs are always handled
without crashing. We lower to a SETCC of the equivalent i8 type.

This also exposes some pre-existing issues where the lowering when
optimizing for size results in larger code than without. Those will be
tackled in future patches.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102767
2021-05-20 19:12:29 +01:00
Fraser Cormack 26bd2250c1 [RISCV] Ensure shuffle splat operands are type-legal
The use of `SelectionDAG::getSplatValue` isn't guaranteed to return a
type-legal splat value as it may implicitly extract a vector element
from another shuffle. It is not permitted to introduce an illegal type
when lowering shuffles.

This patch addresses the crash by adding a boolean flag to
`getSplatValue`, defaulting to false, which when set will ensure a
type-legal return value. If it is unable to do that it will fail to
return a splat value.

I've been through the existing uses of `getSplatValue` in other targets
and was unable to find a need or test cases showing a need to update
their uses. In some cases, the call is made during `LegalizeVectorOps`
which may still produce illegal scalar types. In other situations, the
illegally-typed splat value may be quickly patched up to a legal type
(such as any-extending the returned `extract_vector_elt` up to a legal
type) before `LegalizeDAG` notices.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102687
2021-05-20 18:00:03 +01:00
Fraser Cormack ca2c245ba4 [RISCV] Support INSERT_VECTOR_ELT into i1 vectors
Like the element extraction of these vectors, we choose to promote up to
an i8 vector type and perform the insertion there.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102697
2021-05-19 09:41:50 +01:00
Craig Topper 9c345407b4 [RISCV] Remove RISCVII:VSEW enum. Make encodeVYPE operate directly on SEW.
The VSEW encoding isn't a useful value to pass around. It's better
to use SEW or log2(SEW) directly. The only real ugliness is that
the vsetvli IR intrinsics use the VSEW encoding, but it's easy
enough to decode that when the intrinsic is processed.
2021-05-12 13:19:08 -07:00
Evandro Menezes 3a64b7080d [RISCV] Move instruction information into the RISCVII namespace (NFC)
Move instruction attributes into the `RISCVII` namespace and add associated helper functions.

Differential Revision: https://reviews.llvm.org/D102268
2021-05-11 16:32:42 -05:00
Craig Topper ce6e4f27dd [RISCV] Use fractional LMULs for fixed length types smaller than riscv-v-vector-bits-min.
My thought process is that if v2i64 is an LMUL=1 type then v2i32
should be an LMUL=1/2 type. We limit the fractional LMUL so that
SEW=64 clips to LMUL=1, SEW=32 clips to LMUL=1/2, etc. This
ensures there's always a fractional LMUL available to truncate a type.
This does reduce the number of vsetvlis in some cases.

Some tests increase vsetvlis because the best container type for a
mask type is dependent on the LMUL+SEW that the mask was produced
from, but you can't tell that from the type. I think this is
something we need to solve this in the machine IR when optimizing
vsetvlis.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D101215
2021-05-11 09:42:48 -07:00
Craig Topper 80b9510806 [RISCV] Correct VL for fixed length masked scatter.
We were incorrectly calling getVectorNumElements on a scalable
vector type. This shouldn't be allowed. This gives a warning on
EVT, but not MVT.
2021-05-10 09:50:08 -07:00
Fraser Cormack 6db0cedd23 [LegalizeVectorOps][RISCV] Add scalable-vector SELECT expansion
This patch extends VectorLegalizer::ExpandSELECT to permit expansion
also for scalable vector types. The only real change is conditionally
checking for BUILD_VECTOR or SPLAT_VECTOR legality depending on the
vector type.

We can use this to fix "cannot select" errors for scalable vector
selects on the RISCV target. Note that in future patches RISCV will
possibly custom-lower vector SELECTs to VSELECTs for branchless codegen.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102063
2021-05-10 08:22:35 +01:00
Craig Topper 6660319cef [RISCV] Remove unused RISCV::VLEFF and VLEFF_MASK. NFC
Looks like these got left behind when vleff isel was moved to
X86ISelDAGToDAG.cpp
2021-05-06 09:41:29 -07:00
Fraser Cormack 6f17613bfb [RISCV][VP] Lower VP ISD nodes to RVV instructions
This patch supports all of the current set of VP integer binary
intrinsics by lowering them to to RVV instructions. It does so by using
the existing RISCVISD *_VL custom nodes as an intermediate layer. Both
scalable and fixed-length vectors are supported by using this method.

One notable change to the existing vector codegen strategy is that
scalable all-ones and all-zeros mask SPLAT_VECTORs are now lowered to
RISCVISD VMSET_VL and VMCLR_VL nodes to match their fixed-length
BUILD_VECTOR counterparts. This allows them to reuse the existing
"all-ones" VL patterns.

To reduce the size of the phabricator diff, some tests are intentionally
left out and will be added later if the patch is accepted.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101826
2021-05-05 12:32:24 +01:00
Fraser Cormack cd6a52fede [RISCV] Cap legal fixed-length vectors to 256-element types
Previously, RISC-V would make legal all fixed-length vectors types whose
size are less than or equal to some function of the minimum value of
VLEN and the maximum-permissible LMUL grouping.

Due to vector legalization issues, this patch instead caps the legal
fixed-length vector types to those with 256 elements. This value was
chosen because it is the longest vector length which has corresponding
MVTs across all supported element types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101839
2021-05-05 09:51:08 +01:00
Fraser Cormack 46fa214a6f [RISCV] Lower splats of non-constant i1s as SETCCs
This patch adds support for splatting i1 types to fixed-length or
scalable vector types. It does so by lowering the operation to a SETCC
of the equivalent i8 type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101465
2021-05-04 09:14:05 +01:00
Fraser Cormack d23e4f6872 [RISCV] Add support for fmin/fmax vector reductions
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101518
2021-05-03 10:33:51 +01:00
Craig Topper ba63cdb8f2 [RISCV] Store SEW in RISCV vector pseudo instructions in log2 form.
This shrinks the immediate that isel table needs to emit for these
instructions. Hoping this allows me to change OPC_EmitInteger to
use a better variable length encoding for representing negative
numbers. Similar to what was done a few months ago for OPC_CheckInteger.

The alternative encoding uses less bytes for negative numbers, but
increases the number of bytes need to encode 64 which was a very
common number in the RISCV table due to SEW=64. By using Log2 this
becomes 6 and is no longer a problem.
2021-05-02 12:09:20 -07:00
Fraser Cormack 791766e6d2 [RISCV] Support STEP_VECTOR with a step greater than one
DAGCombiner was recently taught how to combine STEP_VECTOR nodes,
meaning the step value is no longer guaranteed to be one by the time it
reaches the backend for lowering.

This patch supports such cases on RISC-V by lowering to other step
values to a multiply following the vid.v instruction. It includes a
small optimization for common cases where the multiply can be expressed
as a shift left.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D100856
2021-04-30 09:36:18 +01:00
Craig Topper dcdda2bdf2 [RISCV] Teach DAG combine to fold (and (select_cc lhs, rhs, cc, -1, c), x) -> (select_cc lhs, rhs, cc, x, (and, x, c))
Similar for or/xor with 0 in place of -1.

This is the canonical form produced by InstCombine for something like `c ? x & y : x;` Since we have to use control flow to expand select we'll usually end up with a mv in basic block. By folding this we may be able to pull the and/or/xor into the block instead and avoid a mv instruction.

The code here is based on code from ARM that uses this to create predicated instructions. I'm doing it on SELECT_CC so it happens late, but we could do it on select earlier which is what ARM does. I'm not sure if we lose any combine opportunities if we do it earlier.

I left out add and sub because this can separate sext.w from the add/sub. It also made a conditional i64 addition/subtraction on RV32 worse. I guess both of those would be fixed by doing this earlier on select.

The select-binop-identity.ll test has not been commited yet, but I made the diff show the changes to it.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D101485
2021-04-29 09:43:51 -07:00
Craig Topper 0c330afdfa [RISCV] Enable SPLAT_VECTOR for fixed vXi64 types on RV32.
This replaces D98479.

This allows type legalization to form SPLAT_VECTOR_PARTS so we don't
lose the splattedness when the scalar type is split.

I'm handling SPLAT_VECTOR_PARTS for fixed vectors separately so
we can continue using non-VL nodes for scalable vectors.

I limited to RV32+vXi64 because DAGCombiner::visitBUILD_VECTOR likes
to form SPLAT_VECTOR before seeing if it can replace the BUILD_VECTOR
with other operations. Especially interesting is a splat BUILD_VECTOR of
the extract_vector_elt which can become a splat shuffle, but won't if
we form SPLAT_VECTOR first. We either need to reorder visitBUILD_VECTOR
or add visitSPLAT_VECTOR.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100803
2021-04-29 08:20:09 -07:00
Craig Topper 25391cec3a [RISCV] Teach computeKnownBits that vsetvli returns number less than 2^31.
This seems like a reasonable upper bound on VL. WG discussions for
the V spec would probably allow us to use 2^16 as an upper bound
on VLEN, but this is good enough for now.

This allows us to remove sext and zext if user happens to assign
the size_t result into an int and then uses it as a VL intrinsic
argument which is size_t.

Reviewed By: frasercrmck, rogfer01, arcbbb

Differential Revision: https://reviews.llvm.org/D101472
2021-04-29 08:07:59 -07:00
Fraser Cormack 43ad058a01 [RISCV] Fix stack slot for argument types (Bug 49500)
This is an complementary/alternative fix for D99068. It takes a slightly
different approach by explicitly summing up all of the required split
part type sizes and ensuring we allocate enough space for them. It also
takes the maximum alignment of each part.

Compared with D99068 there are fewer changes to the stack objects in
existing tests. However, @luismarques has shown in that patch that there
are opportunities to reduce our stack usage in the future.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D99087
2021-04-29 09:10:48 +01:00
Craig Topper ce09dd54e6 [RISCV] Select 5 bit immediate for VSETIVLI during isel rather than peepholing in the custom inserter.
This adds a special operand type that is allowed to be either
an immediate or register. By giving it a unique operand type the
machine verifier will ignore it.

This perturbs a lot of tests but mostly it is just slightly different
instruction orders. Something bad did happen to some min/max reduction
tests. We're spilling vector registers when we weren't before.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D101246
2021-04-27 14:38:16 -07:00
Craig Topper 262a72f50f [RISCV] Use stack slot to handle SPLAT_VECTOR_PARTS on RV32.
Reduces the amount of vector ALU operations and reduces vector
register pressure.
2021-04-26 15:43:02 -07:00
Craig Topper e2cd92cb9b [RISCV] Match splatted load to scalar load + splat. Form strided load during isel.
This modifies my previous patch to push the strided load formation
to isel. This gives us opportunity to fold the splat into a .vx
operation first. Using a scalar register and a .vx operation reduces
vector register pressure which can be important for larger LMULs.

If we can't fold the splat into a .vx operation, then it can make
sense to use a strided load to free up the vector arithmetic
ALU to do actual arithmetic rather than tying it up with vmv.v.x.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D101138
2021-04-26 13:32:03 -07:00
Craig Topper 837442de9c [RISCV] Cleanup setOperationAction calls for INTRINSIC_WO_CHAIN/INTRINSIC_W_CHAIN
We have several extensions that need i32 to be Custom for
INTRINSIC_WO_CHAIN with RV64 so enable it for all RV64.

For V extension, make i32 Custom for RV64 and i64 Custom for RV32.
When the i32 or i64 is legal, the operation action doesn't matter.
LegalizeDAG checks MVT::Other rather than the real type.
2021-04-25 23:44:28 -07:00
Craig Topper 8f5cd49405 [RISCV] Teach DAG combine what bits Zbp instructions demanded from their inputs.
This teaches DAG combine that shift amount operands for grev, gorc
shfl, unshfl only read a few bits.

This also teaches DAG combine that grevw, gorcw, shflw, unshflw,
bcompressw, bdecompressw only consume the lower 32 bits of their
inputs.

In the future we can teach SimplifyDemandedBits to also propagate
demanded bits of the output to the inputs in some cases.
2021-04-25 21:54:06 -07:00
Levy Hsu 8cf54c7ff5 [RISCV] [1/2] Add IR intrinsic for Zbe extension
RV32/64:
bcompress
bdecompress

RV64 ONLY:
bcompressw
bdecompressw

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101143
2021-04-25 19:14:34 -07:00
Craig Topper bd28d86119 [RISCV] Removed getLMULForFixedLengthVector.
Use getContainerForFixedLengthVector and getRegClassIDForVecVT to
get the register class to use when making a fixed vector type legal.

Inline it into the other two call sites.

I'm looking into using fractional lmul for fixed length vectors
and getLMULForFixedLengthVector returned an integer making it
unable to express this. I considered returning the LMUL
enum, but that seemed like it would introduce more complexity to
convert it for use.
2021-04-23 16:56:46 -07:00
Craig Topper bcf321015b [RISCV] Move getLMULForFixedLengthVector out of RISCVSubtarget.
Make it a static function RISCVISelLowering, the only place it
is used.

I think I'm going to make this return a fractional LMULs in some
cases so I'm sorting out where it should live before I start
making changes.
2021-04-23 15:06:20 -07:00
Craig Topper baa107f018 [RISCV] Only expose one interface for getContainerForFixedLengthVector in the RISCVTargetLowering class
We can have RISCVISelDAGToDAG.cpp call the VT only version by
finding the RISCVTargetLowering object via the Subtarget.

Make the static versions just global static functions in
RISCVISelLowering that can be called by static functions in that
file.
2021-04-23 15:06:10 -07:00
Fraser Cormack 83b8f8da82 [RISCV] Custom lower vector F(MIN|MAX)NUM to vf(min|max)
This patch adds support for both scalable- and fixed-length vector code
lowering of the llvm.minnum and llvm.maxnum intrinsics to the equivalent
RVV instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101035
2021-04-23 12:22:15 +01:00
Levy Hsu b49337bbb9 [RISCV] [1/2] Add IR intrinsic for Zbp extension
RV32/64:
    grev
    grevi
    gorc
    gorci
    shfl
    shfli
    unshfl
    unshfli

RV64 ONLY:
    grevw
    greviw
    gorcw
    gorciw
    shflw
    shfli     (For non-existing shfliw)
    unshfli   (For non-existing unshfliw)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100830
2021-04-22 16:34:51 -07:00
Craig Topper 70254ccb69 [RISCV] Turn splat shuffles of vector loads into strided load with stride of x0.
Implementations are allowed to optimize an x0 stride to perform
less memory accesses. This is the case in SiFive cores.

No idea if this is the case in other implementations. We might
need a tuning flag for this.

Reviewed By: frasercrmck, arcbbb

Differential Revision: https://reviews.llvm.org/D100815
2021-04-22 10:02:57 -07:00
Craig Topper 77f14c96e5 [RISCV] Use stack temporary to splat two GPRs into SEW=64 vector on RV32.
Rather than doing splatting each separately and doing bit manipulation
to merge them in the vector domain, copy the data to the stack
and splat it using a strided load with x0 stride. At least on
some implementations this vector load is optimized to not do
a load for each element.

This is equivalent to how we move i64 to f64 on RV32.

I've only implemented this for the intrinsic fallbacks in this
patch. I think we do similar splatting/shifting/oring in other
places. If this is approved, I'll refactor the others to share
the code.

Differential Revision: https://reviews.llvm.org/D101002
2021-04-22 09:50:07 -07:00
Serge Pavlov 740962e5d0 [RISCV] Custom lowering of SET_ROUNDING
Differential Revision: https://reviews.llvm.org/D91242
2021-04-22 15:04:55 +07:00
Craig Topper 58c5b4c2c3 [RISCV] Use TargetConstant for condition code of RISCVISD::SELECT_CC.
The value is always an immediate and can never be in a register.
This the kind of thing TargetConstant is for.

Saves a step GenDAGISel to convert a Constant to a TargetConstant.
2021-04-21 23:08:52 -07:00
Serge Pavlov 6e63dfdae2 [RISCV] Custom lowering of FLT_ROUNDS_
Differential Revision: https://reviews.llvm.org/D90854
2021-04-22 11:39:15 +07:00
Craig Topper f6d8cf7798 [RISCV] Teach lowerSPLAT_VECTOR_PARTS to detect cases where Hi is sign extended from Lo.
This recognizes the case when Hi is (sra Lo, 31). We can use
SPLAT_VECTOR_I64 rather than splatting the high bits and
combining them in the vector register.
2021-04-21 20:24:23 -07:00
Craig Topper 87afefcd22 [RISCV] Fix mistake in comment. NFC 2021-04-19 11:15:32 -07:00
Craig Topper 7ed01a420a [RISCV] Pad v4i1/v2i1/v1i1 stores with 0s to make a full byte.
As noted in the FIXME there's a sort of agreement that the any
extra bits stored will be 0.

The generated code is pretty terrible. I was really hoping we
could use a tail undisturbed trick, but tail undisturbed no
longer applies to masked destinations in the current draft
spec.

Fingers crossed that it isn't common to do this. I doubt IR
from clang or the vectorizer would ever create this kind of store.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100618
2021-04-19 11:05:18 -07:00
Fraser Cormack c9a93c3e01 [RISCV] Lower vector shuffles to vrgather operations
This patch extends the lowering of RVV fixed-length vector shuffles to
avoid the default stack expansion and instead lower to vrgather
instructions.

For "permute"-style shuffles where one vector is swizzled, we can lower
to one vrgather. For shuffles involving two vector operands, we lower to
one unmasked vrgather (or splat, where appropriate) followed by a masked
vrgather which blends in the second half.

On occasion, when it's not possible to create a legal BUILD_VECTOR for
the indices, we use vrgatherei16 instructions with 16-bit index types.

For 8-bit element vectors where we may have indices over 255, we have a
fairly blunt fallback to the stack expansion to avoid custom-splitting
of the vector types.

To enable the selection of masked vrgather instructions, this patch
extends the various RISCVISD::VRGATHER nodes to take a passthru operand.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100549
2021-04-19 11:13:13 +01:00
Craig Topper 1afdfc6169 [RISCV] Rename RISCVISD::GREVI(W)/GORCI(W) to RISCVISD::GREV(W)/GORC(W). Don't require second operand to be a constant.
Prep work for adding intrinsics for these instructions in the
future.
2021-04-13 11:04:28 -07:00
Craig Topper 7c9bbbf735 [RISCV] Rename RISCVISD::SHFLI to RISCVISD::SHFL and don't require the second operand to be an immediate.
Prep work for adding intrinsics in the future.

Left an assert that the input is constant in ReplaceNodeResults,
as the intrinsic shouldn't go through that path.
2021-04-12 23:46:50 -07:00
Craig Topper cb4c793e46 [RISCV] Update computeKnownBitsForTargetNode to treat READ_VLENB as being 16 byte aligned.
According to the 0.10 spec, VLEN is at least 128 bits and is a
power of 2.
2021-04-11 17:54:23 -07:00
Craig Topper bc0e052730 [RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffff) when zext.h is supported.
Similar to what we do for zext.w.

Disable the (srl (and X, 0xffff), C) custom isel when zext.h is
available.
2021-04-11 10:03:35 -07:00
Fraser Cormack a5693445ca [RISCV] Support OR/XOR/AND reductions on vector masks
This patch adds RVV codegen support for OR/XOR/AND reductions for both
scalable- and fixed-length vector types. There are a few possible
codegen strategies for each -- vmfirst.m, vmsbf.m, and vmsif.m could be
used to some extent -- but the vpopc.m instruction was chosen since it
produces the scalar result in one instruction, after which scalar
instructions can finish off the computation.

The reductions are lowered identically for both scalable- and
fixed-length vectors, although some alternate strategies may be more
optimal on fixed-length vectors since it's cheaper to get the length of
those types.

Other reduction types were not deemed to be relevant for mask vectors.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100030
2021-04-08 09:46:38 +01:00
Serge Pavlov 65b1103798 [RISCV] DAG nodes and pseudo instructions for CSR access
New custom DAG nodes were added to represent operations on CSR. These
nodes are lowered to corresponding pseudo instruction. Using the pseudo
instructions allows to specify different scheduling information for
operations on different system registers. It also make possible to
specify dependencies of instructions on specific system registers.

Differential Revision: https://reviews.llvm.org/D98936
2021-04-08 10:36:36 +07:00
Craig Topper 56ea2e2fdd [RISCV] Add a special case to lowerSELECT for select of 2 constants with a SETLT condition.
If the constants have a difference of 1 we can convert one to
the other by adding or subtracting the condition.

We have a DAG combine for this, but it only runs before type
legalization. If the select is introduced later during type
legalization or op legalization we will miss it.

We don't need a specific condition, but some conditions are
harder to materialize than others on RISCV. I know that SETLT
will be a single instruction and it is what is used by the
motivating pattern from signed saturating add/sub.

Differential Revision: https://reviews.llvm.org/D99021
2021-04-07 13:47:17 -07:00
Craig Topper f087d7544a [RISCV] Support vslide1up/down intrinsics for SEW=64 on RV32.
This can't use our normal strategy of splatting the scalar and using
a .vv operation instead of .vx.

Instead this patch bitcasts the vector to the equivalent SEW=32
vector and inserts the scalar parts using two vslide1up/down. We
do that unmasked and apply the mask separately at the end with
a vmerge.

For vslide1up there maybe some other options here like getting
i64 into element 0 and using vslideup.vi with this vector as
vd and the original source as vs1. Masking would still need to
be done afterwards.

That idea doesn't work for vslide1down. We need to slidedown and
then insert a single scalar at vl-1 which we could do with a
vslideup, but that assumes vl > 0 which I don't think we can assume.

The i32 double slide1down implemented here is the best I could come
up with and I just made vslide1up consistent.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D99910
2021-04-07 10:44:53 -07:00
Craig Topper 01a23dccb1 [RISCV] Add an assertion to the ReplaceNodeResults handling of bitcasts to make sure the VT is always a scalar integer. 2021-04-06 16:48:40 -07:00
Craig Topper 2641c1f15e [RISCV] Don't custom type legalize fixed vector to scalar integer bitcasts if the fixed vector type isn't legal.
We encountered a hang in our internal code base. I'm having trouble
creating a test case because the test that hit it was testing some
code that is not upstream.
2021-04-06 15:00:33 -07:00
Craig Topper af2837675a [RISCV] Split RISCVISD::VMV_S_XF_VL into separate integer and FP.
It's a bit silly, but it allows us to write stricter type
constraints for isel. There's still some extra type checks in
the generated table due to some type interference limitations
around HWMode.
2021-04-05 12:57:35 -07:00
Fraser Cormack af3a839c70 [RISCV] Add support for bitcasts between scalars and fixed-length vectors
This patch supports bitcasts from scalar types to fixed-length vectors
and vice versa. It custom-lowers and custom-legalizes them to
EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT operations, using a single-element
vectors to hold the scalar where appropriate.

Previously, some of these would fail to select, others would be expanded
through stack loads and stores. Effort was made to ensure the codegen
avoids the stack for both legal and illegal scalar types.

Some of the codegen could be improved, but on first glance it looks like
a general optimization of EXTRACT_VECTOR_ELT when extracting an i64
element on RV32.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99667
2021-04-05 17:21:55 +01:00
Fraser Cormack 3f0df4d7b0 [RISCV] Expand scalable-vector truncstores and extloads
Caught in internal testing, these operations are assumed legal by
default, even for scalable vector types. Expand them back into separate
truncations and stores, or loads and extensions.

Also add explicit fixed-length vector tests for these operations, even
though they should have been correct already.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99654
2021-04-05 17:03:45 +01:00
Craig Topper 4708a05da0 [RISCV] Use gorciw for i32 orc.b intrinsic when Zbp is enabled.
The W version of orc.b does not exist in Zbp so we need to use
gorci encoding. If we have Zbp, we can use gorciw which can avoid a
sext.w in some cases.
2021-04-04 17:14:28 -07:00
Craig Topper 98d5db3e3a [RISCV] Lower orc.b intrinsic to RISCVISD::GORCI.
This will allow us to share any future known bits, demaned bits,
or sign bits improvements.
2021-04-04 12:31:41 -07:00
Craig Topper a2ea003fcb [RISCV] Don't convert fshr/fshl to target specific FSL/FSR node if shift amount is a constant.
As long as it's a constant we can directly pattern match it
without any problems. It's only when it isn't a constant that
we need to add an AND.

In theory this should allow more target independent optimizations
to remain active.
2021-04-03 23:13:30 -07:00
Levy Hsu 944adbf285 Recommit "[RISCV] Add IR intrinsic for Zbb extension"
Forgot to amend the Author.

Original commit message:

Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
orc.b

Differential Revision: https://reviews.llvm.org/D99320
2021-04-02 11:50:19 -07:00
Craig Topper 1f0b309f24 Revert "[RISCV] Add IR intrinsic for Zbb extension"
This reverts commit 1808194590.

I forgot to change the author.
2021-04-02 11:47:02 -07:00
Craig Topper 1808194590 [RISCV] Add IR intrinsic for Zbb extension
Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
orc.b
2021-04-02 11:23:57 -07:00
Craig Topper dbbc95e3e5 [RISCV] Use softPromoteHalf legalization for fp16 without Zfh rather than PromoteFloat.
The default legalization strategy is PromoteFloat which keeps
half in single precision format through multiple floating point
operations. Conversion to/from float is done at loads, stores,
bitcasts, and other places that care about the exact size being 16
bits.

This patches switches to the alternative method softPromoteHalf.
This aims to keep the type in 16-bit format between every operation.
So we promote to float and immediately round for any arithmetic
operation. This should be closer to the IR semantics since we
are rounding after each operation and not accumulating extra
precision across multiple operations. X86 is the only other
target that enables this today. See https://reviews.llvm.org/D73749

I had to update getRegisterTypeForCallingConv to force f16 to
use f32 when the F extension is enabled. This way we can still
pass it in the lower bits of an FPR for ilp32f and lp64f ABIs.
The softPromoteHalf would otherwise always give i16 as the
argument type.

Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D99148
2021-04-01 12:41:57 -07:00
Craig Topper d157e3f387 [RISCV] Fix handling of nxvXi64 vmsgt(u).vx intrinsics on RV32.
We need to splat the scalar separately and use .vv, but there is
no vmsgt(u).vv. So add isel patterns to select vmslt(u).vv with
swapped operands.

We also need to get VT to use for the splat from an operand rather
than the result since the result VT is nxvXi1.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D99704
2021-04-01 10:38:05 -07:00
Craig Topper b7c2e577cc [RISCV] Add custom type legalization to form MULHSU when possible.
There's no target independent ISD opcode for MULHSU, so custom
legalize 2*XLen multiplies ourselves. We have to be a little
careful to prefer MULHU or MULHSU.

I thought about doing this in isel by pattern matching the
(add (mul X, (srai Y, XLen-1)), (mulhu X, Y)) pattern. I decided
against this because the add might become part of a chain of adds.
I don't trust DAG combine not to reassociate with other adds making
it difficult to find both pieces again.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D99479
2021-04-01 10:15:55 -07:00
Craig Topper 2a8b7cab6a [RISCV] Add RISCVISD opcodes for CLZW and CTZW.
Our CLZW isel pattern is quite easily broken by surrounding code
preventing it from matching sometimes. This usually results in
failing to remove the and X, 0xffffffff inserted by type
legalization. The add with -32 that type legalization also inserts
will often gets combined into other add/sub nodes. That doesn't
usually result in extra code when we don't use clzw.

CTTZ seems to be less fragile, but I wanted to keep it consistent
with CTLZ.

Reviewed By: asb, HsiangKai

Differential Revision: https://reviews.llvm.org/D99317
2021-03-31 09:40:07 -07:00
Fraser Cormack 10fc6e4358 [RISCV] Add support for the stepvector intrinsic
This adds almost everything required for supporting the new stepvector
intrinsic on RVV. It is lowered to the existing VID_VL SDNode.

The only exception is a limitation that RV32 cannot yet lower the
intrinsic on i64 vectors. This is because the step operand is
(currently) required to be at least as large as the vector element type.
I will look into patching that out and loosening the requirement to only
an integer pointer type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99594
2021-03-31 11:41:17 +01:00
Craig Topper a33fcafaf0 [RISCV] Pass 'half' in the lower 16 bits of an f32 value when F extension is enabled, but Zfh is not.
Without Zfh the half type isn't legal, but it could still be
used as an argument/return in IR. Clang will not generate this today.

Previously we promoted the half value to float for arguments and
returns if the F extension is enabled but Zfh isn't. Then depending on
which ABI is enabled we would pass it in either an FPR or a GPR in
float format.

If the F extension isn't enabled, it would get passed in the lower
16 bits of a GPR in half format.

With this patch the value will always in half format and will be
in the lower bits of a GPR or FPR. This should be consistent
with where the bits are located when Zfh is enabled.

I've based this implementation off of how this is done on ARM.

I've manually nan-boxed the value to 32 bits using integer ops.
It looks like flw, fsw, fmv.s, fmv.w.x, fmf.x.w won't
canonicalize nans so should leave the value alone. I think those
are the instructions that could get used on this value.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D98670
2021-03-30 09:47:54 -07:00
Craig Topper f069000b43 [RISCV] Remove floating point condition code legalization from lowerFixedLengthVectorSetccToRVV.
After D98939, this is done by LegalizeVectorOps making this code dead.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D99519
2021-03-30 09:11:56 -07:00
Craig Topper c40cea6f08 [RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffffffff).
We look for this pattern frequently in isel patterns so its a
good idea to try to preserve it.

This also let's us remove our special isel handling for srliw
and use a direct pattern match of (srl (and X, 0xffffffff), C)
since no bits will be removed from the and mask.

Differential Revision: https://reviews.llvm.org/D99042
2021-03-25 09:03:25 -07:00