Commit Graph

1015 Commits

Author SHA1 Message Date
Craig Topper ac87133f1d [RISCV] Teach vsetvli insertion to remember when predecessors have same AVL and SEW/LMUL ratio if their VTYPEs otherwise mismatch.
Previously we went directly to unknown state on VTYPE mismatch.
If we instead remember the partial match, we can use this to
still use X0, X0 vsetvli in successors if AVL and needed SEW/LMUL
ratio match.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D104069
2021-06-18 12:16:07 -07:00
Saleem Abdulrasool e70d4994ea test: clean up some of the RISCV tests (NFC)
This addresses some post-commit comments from jrtc27 to make the tests
easier to process.
2021-06-17 09:51:09 -07:00
Saleem Abdulrasool bbea64250f RISCV: adjust handling of relocation emission for RISCV
This re-architects the RISCV relocation handling to bring the
implementation closer in line with the implementation in binutils.  We
would previously aggressively resolve the relocation.  With this
restructuring, we always will emit a paired relocation for any symbolic
difference of the type of S±T[±C] where S and T are labels and C is a
constant.

GAS has a special target hook controlled by `RELOC_EXPANSION_POSSIBLE`
which indicates that a fixup may be expanded into multiple relocations.
This is used by the RISCV backend to always emit a paired relocation -
either ADD[WIDTH] + SUB[WIDTH] for text relocations or SET[WIDTH] +
SUB[WIDTH] for a debug info relocation.  Irrespective of whether linker
relaxation support is enabled, symbolic difference is always emitted as
a paired relocation.

This change also sinks the target specific behaviour down into the
target specific area rather than exposing it to the shared relocation
handling.  In the process, we also sink the "special" handling for debug
information down into the RISCV target.  Although this improves the path
for the other targets, this is not necessarily entirely ideal either.
The changes in the debug info emission could be done through another
type of hook as this functionality would be required by any other target
which wishes to do linker relaxation.  However, as there are no other
targets in LLVM which currently do this, this is a reasonable thing to
do until such time as the code needs to be shared.

Improve the handling of the relocation (and add a reduced test case from
the Linux kernel) to ensure that we handle complex expressions for
symbolic difference.  This ensures that we correct relocate symbols with
the adddends normalized and associated with the addition portion of the
paired relocation.

This change also addresses some review comments from Alex Bradbury about
the relocations meant for use in the DWARF CFA being named incorrectly
(using ADD6 instead of SET6) in the original change which introduced the
relocation type.

This resolves the issues with the symbolic difference emission
sufficiently to enable building the Linux kernel with clang+IAS+lld
(without linker relaxation).

Resolves PR50153, PR50156!
Fixes: ClangBuiltLinux/linux#1023, ClangBuiltLinux/linux#1143

Reviewed By: nickdesaulniers, maskray

Differential Revision: https://reviews.llvm.org/D103539
2021-06-17 08:20:02 -07:00
Fraser Cormack fed1503e85 [RISCV][VP] Lower FP VP ISD nodes to RVV instructions
With the exception of `frem`, this patch supports the current set of VP
floating-point binary intrinsics by lowering them to to RVV instructions. It
does so by using the existing `RISCVISD *_VL` custom nodes as an intermediate
layer. Both scalable and fixed-length vectors are supported by using this
method.

The `frem` node is unsupported due to a lack of available instructions. For
fixed-length vectors we could scalarize but that option is not (currently)
available for scalable-vector types. The support is intentionally left out so
it equivalent for both vector types.

The matching of vector/scalar forms is currently lacking, as scalable vector
types do not lower to the custom `VFMV_V_F_VL` node. We could either make
floating-point scalable vector splats lower to this node, or support the
matching of multiple kinds of splat via a `ComplexPattern`, much like we do for
integer types.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D104237
2021-06-17 10:04:00 +01:00
Bjorn Pettersson 4c7f820b2b Update @llvm.powi to handle different int sizes for the exponent
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.

The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.

One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.

Differential Revision: https://reviews.llvm.org/D99439
2021-06-17 09:38:28 +02:00
Ben Shi 0799057181 [RISCV][test] Add new tests of SH*ADD in the zba extension
These tests will show the following optimization by future patches.

Rx + Ry * 6  => (SH1ADD (SH2ADD Rx, Ry), Ry)
Rx + Ry * 10 => (SH1ADD (SH3ADD Rx, Ry), Ry)
Rx + Ry * 12 => (SH2ADD (SH3ADD Rx, Ry), Ry)

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D104210
2021-06-17 07:02:33 +08:00
Fraser Cormack c75e454cb9 [RISCV] Transform unaligned RVV vector loads/stores to aligned ones
This patch adds support for loading and storing unaligned vectors via an
equivalently-sized i8 vector type, which has support in the RVV
specification for byte-aligned access.

This offers a more optimal path for handling of unaligned fixed-length
vector accesses, which are currently scalarized. It also prevents
crashing when `LegalizeDAG` sees an unaligned scalable-vector load/store
operation.

Future work could be to investigate loading/storing via the largest
vector element type for the given alignment, in case that would be more
optimal on hardware. For instance, a 4-byte-aligned nxv2i64 vector load
could loaded as nxv4i32 instead of as nxv16i8.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104032
2021-06-14 18:12:18 +01:00
Hsiangkai Wang 643b6407fa [RISCV] Avoid scalar outgoing argumetns overwriting vector frame objects.
When using FP to access stack objects, the scalable stack objects will
be put at the lower end of the frame. It looks like

```
|-------------------|  <-- FP
| callee-saved regs |
|-------------------|
| scalar local vars |
|-------------------|
| RVV local vars    |
|-------------------|  <-- SP
```

If there are scalar arguments that need to pass through memory and there
are vector objects on the stack using FP to access. The outgoing scalar
arguments will overwrite the vector objects. It looks like

```
|-------------------|  <-- FP
| callee-saved regs |
|-------------------|
| scalar local vars |
|-------------------|         |-------------------|
| RVV local vars    |         | outgoing args     | <- outgoing arguments
|-------------------|  <-- SP |-------------------|    overwrite from here.
```

In this patch, we reserve the stack for the outgoing arguments before
function calls if using FP to access and there are scalable vector frame
objects. It looks like

```
|-------------------|  <-- FP
| callee-saved regs |
|-------------------|
| scalar local vars |
|-------------------|
| RVV local vars    |
|-------------------|
| outgoing args     |
|-------------------|  <-- SP
```

Differential Revision: https://reviews.llvm.org/D103622
2021-06-11 12:26:29 +08:00
Craig Topper 420bd5ee8e [RISCV] Use ComputeNumSignBits/MaskedValueIsZero in RISCVDAGToDAGISel::selectSExti32/selectZExti32.
This helps us select W instructions in more cases. Most of the
affected tests have had the sign_extend_inreg or AND folded into
sextload/zextload.

Differential Revision: https://reviews.llvm.org/D104079
2021-06-10 19:06:45 -07:00
Craig Topper b35a842581 [RISCV] Add test cases that show failure to use some W instructions if they are proceeded by a load. NFC
The loads end up becoming sextload/zextload which prevent our
isel patterns from finding the sign_extend_inreg or AND instruction
we need.

The easiest way to fix this is to use computeKnownBits or
ComputeNumSignBits in our isel matching to catch this.
2021-06-10 16:55:49 -07:00
Fraser Cormack 502edebd9d [ValueTypes][RISCV] Cap RVV fixed-length vectors by size
This patch changes RVV's policy for its supported list of fixed-length
vector types by capping by vector size rather than element count. Now
all 1024-byte vectors (of supported element types) are supported, rather
than all 256-element vectors.

This is a more natural fit for the architecture, and allows us to, for
example, improve the support for vector bitcasts.

This change necessitated the adding of some new simple types to avoid
"regressing" on the number of currently-supported vectors. We round out
the 1024-byte types by adding `v512i8`, `v1024i8`, `v512i16` and
`v512f16`.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103884
2021-06-09 12:15:37 +01:00
Fraser Cormack e8f1f89103 [RISCV] Support CONCAT_VECTORS on scalable masks
This patch is a simple fix which registers CONCAT_VECTORS as
custom-lowered for scalable mask vectors. This follows the pattern of
all other scalable-vector types, as the default expansion of
CONCAT_VECTORS cannot handle scalable types, and even if it did it'd go
through the stack and generate worse code.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103896
2021-06-09 09:07:44 +01:00
Jim Lin 242ddd5089 [RISCV][NFC] Add a single space after comma for VType
In most of cases, it has a single space after comma in assembly operands.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103790
2021-06-09 11:18:22 +08:00
Craig Topper c09b37553e [RISCV] Remove dead code from fixed-vectors-abs.ll test cases. NFC
We had two pointer arguments and a dead load presumably copied
from a binary operation test and modified into unary abs.
2021-06-08 11:24:23 -07:00
Craig Topper 8b4c80d380 Further improve register allocation for vwadd(u).wv, vwsub(u).wv, vfwadd.wv, and vfwsub.wv.
The first source has the same EEW as the destination, but we're
using earlyclobber which prevents them from ever being the same
register. This patch attempts to work around this.

-For unmasked .wv, add a special TIED pseudo that pretends like
 the first operand and the destination must be the same register. This
 disables the earlyclobber for that source. Mark the instruction
 as convertible to 3 address form which will switch it to the
 original untied pseudo when the TwoAddressInstructionPass decides
 that keeping them tied would require an extra copy. This uses
 code in RISCVInstrInfo.cpp to do the conversion to the untied
 opcode.

The untie test case show that we can generate the untied version.
Not sure it was profitable to do it in this case, but they have
really simple IR.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D103552
2021-06-08 09:43:43 -07:00
Craig Topper c57bce9cc5 [RISCV] Remove ForceTailAgnostic flag from vmv.s.x, vfmv.s.f and reductions.
In 0.9 these were defined to leave elements other than 0 in the
destination unmodified. They were changed to use the tail policy
in 0.10. I missed that update.

I assume no one has noticed because in order cores treat tail
agnostic the same as tail undisturbed. I believe Spike and QEMU do
the same.

Reviewed By: arcbbb, frasercrmck

Differential Revision: https://reviews.llvm.org/D103736
2021-06-08 09:22:40 -07:00
Fraser Cormack ccd1e087f3 [RISCV] Add a test case showing inefficient vector codegen 2021-06-08 11:07:12 +01:00
David Green b889c6ee99 [DAG] Allow isNullOrNullSplat to see truncated zeroes
This sets the AllowTruncation flag on isConstOrConstSplat in
isNullOrNullSplat, allowing it to see truncated constant zeroes on
architectures such as AArch64, where only a i32.i64 are legal. As a
truncation of 0 is always 0, this should always be valid, allowing some
extra folding to happen including some of the cases from D103755.

Differential Revision: https://reviews.llvm.org/D103756
2021-06-08 10:18:58 +01:00
Craig Topper 7c4e9a6826 [RISCV] Use 0 for Log2SEW for vle1/vse1 intrinsics to enable vsetvli optimization.
Missed in D103299.
2021-06-07 22:41:14 -07:00
Craig Topper ae3ab4f0ec [RISCV] Masked compares should use a tail agnostic policy.
Writes of a mask result are always tail agnostic.

Unfortunately, this seems to have made codegen worse. I can only
think this must be because the vsetvli was acting as some sort
of barrier that prevented some code movement in the scheduler.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D103331
2021-06-07 21:43:44 -07:00
Ben Shi c705b7b04d [RISCV] Optimize bitwise and with constant for the Zbs extension
This patch optimizes (and r i) to
(BCLRI (BCLRI r, i0), i1) in which i = ~((1<<i0) | (1<<i1)).
or
(BCLRI (ANDI r, i0), i1) in which i = i0 & ~(1<<i1).

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103743
2021-06-08 07:26:00 +08:00
Craig Topper f30f8b4f12 [RISCV] Lower i8/i16 bswap/bitreverse to grevi/greviw with Zbp.
Include known bits support so we know we don't need to zext the
output if the input was already zero extended.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D103757
2021-06-07 10:31:51 -07:00
Craig Topper c653711fd3 [RISCV] Teach vsetvli insertion pass that operations on masks don't care about SEW/LMUL.
All that really matters is that the VLMAX of the preceding
instructions is the same as the VLMAX required by the mask
operation.

Also update the vmsge(u) handling to use the SEW/LMUL we use for
other mask register operations. We were matching it to the compare
before. Some cases will be improve if we fix masked compares to
use tail agnostic policy. I think they ignore the tail policy
anyway.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D103299
2021-06-04 09:17:46 -07:00
Fraser Cormack aec9cbbeb8 [SelectionDAG] Extend FoldConstantVectorArithmetic to SPLAT_VECTOR
This patch extends the SelectionDAG's ability to constant-fold vector
arithmetic to include support for SPLAT_VECTOR. This is not only for
scalable-vector types but also for fixed-length vector types, which
helps Hexagon in a couple of cases.

The original RISC-V test case was in fact an infinite DAGCombine loop.
The pattern `and (truncate v1), (truncate v2)` can be combined to
`truncate (and v1, v2)` but the truncate can similarly be combined back
to `truncate (and v1, v2)` (but, crucially, only when one of `v1` or
`v2` is a constant vector).

It wasn't exposed in on fixed-length types because a TRUNCATE of a
constant BUILD_VECTOR was folded into the BUILD_VECTOR itself, whereas
this did not happen for the equivalent (scalable-vector) SPLAT_VECTOR.

Reviewed By: RKSimon, craig.topper

Differential Revision: https://reviews.llvm.org/D103246
2021-06-04 09:53:15 +01:00
Eli Friedman 44cdf771fe [AtomicExpand] Merge cmpxchg success and failure ordering when appropriate.
If we're not emitting separate fences for the success/failure cases, we
need to pass the merged ordering to the target so it can emit the
correct instructions.

For the PowerPC testcase, we end up with extra fences, but that seems
like an improvement over missing fences.  If someone wants to improve
that, the PowerPC backed could be taught to emit the fences after isel,
instead of depending on fences emitted by AtomicExpand.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33332 .

Differential Revision: https://reviews.llvm.org/D103342
2021-06-03 11:34:35 -07:00
Hsiangkai Wang 9d4922eab4 [RISCV] Precommit a test case to show overwriting vector frame objects. 2021-06-03 23:25:45 +08:00
Fraser Cormack 8790e85255 [RISCV] Reserve an emergency spill slot for any RVV spills
This patch addresses an issue in which fixed-length (VLS) vector RVV
code could fail to reserve an emergency spill slot for their frame index
elimination. This is because we were previously only reserving a spill
slot when there were `scalable-vector` frame indices being used.
However, fixed-length codegen uses regular-type frame indices if it
needs to spill.

This patch does the fairly brute-force method of checking ahead of time
whether the function contains any RVV spill instructions, in which case
it reserves one slot. Note that the second RVV slot is still only
reserved for `scalable-vector` frame indices.

This unfortunately causes quite a bit of churn in existing tests, where
we chop and change stack offsets for spill slots.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103269
2021-06-03 10:44:34 +01:00
Fraser Cormack 1de1887f5f [CodeGen] Fix a scalable-vector crash in VSELECT legalization
The `DAGTypeLegalizer::WidenVSELECTMask` function is not (yet) ready for
scalable vector types, and has numerous places in which it tries to grab
either the fixed size or number of elements of its types.

I believe that it should be possible to update this method to properly
account for scalable-vector types, but we don't have test cases for
that; RISC-V bails out early on as it has legal i1 vector masks. As
such, this patch just prevents it from crashing.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103536
2021-06-03 10:24:55 +01:00
Fraser Cormack 2dd20a31f2 [ValueTypes] Fix scalable-vector changeExtendedVectorTypeToInteger
The attached tests check for the regression in DAGCombiner's
`visitVSELECT`, which may call this method.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103534
2021-06-03 09:36:56 +01:00
Fraser Cormack 1cea1189c2 [RISCV][NFC] Add '+mattr=+experimental-v' to RVV test 2021-06-02 13:09:13 +01:00
Fraser Cormack 3b0a33d0ad [RISCV] Expand unaligned fixed-length vector memory accesses
RVV vectors must be aligned to their element types, so anything less is
unaligned.

For regular loads and stores, our custom-lowering of fixed-length
vectors meant that we opted out of LegalizeDAG's built-in unaligned
expansion. This patch adds that logic in to our custom lower function.

For masked intrinsics, we declare that anything unaligned is not legal,
leaving the ScalarizeMaskedMemIntrin pass to do the expansion for us.

Note that neither of these methods can handle the expansion of
scalable-vector memory ops, so those cases are left alone by this patch.
Scalable loads and stores already go through expansion by default but
hit an assertion, and scalable masked intrinsics will silently generate
incorrect code. It may be prudent to return an error in both of these
cases.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102493
2021-06-02 09:27:44 +01:00
Craig Topper 41ff1e0e29 [RISCV] Improve register allocation for masked vwadd(u).wv, vwsub(u).wv, vfwadd.wv, and vfwsub.wv.
The first source has the same EEW as the destination, but we're
using earlyclobber which prevents them from ever being the same
register.

To workaround this, add a special TIED pseudo to use whenever the
first source and merge operand are the same value. This allows
us to use a single operand for the merge operand and first source
which we can then tie to the destination. A tied source disables
earlyclobber for that operand.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D103211
2021-06-01 18:59:00 -07:00
Ben Shi 59f44f9ad4 [RISCV][test] Add new tests of bitwise and with constant for the Zbs extension
These tests will show how (and r i) will be optimized to
(BCLRI (BCLRI r, i0), i1) or (BCLRI (ANDI r, i0), i1) by future
commits.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103359
2021-06-02 09:10:21 +08:00
Craig Topper 896f9bc350 [RISCV] Remove earlyclobber from vnsrl/vnsra/vnclip(u) when the source and dest are a single vector register.
This guarantees they meet this overlap exception:

"The destination EEW is smaller than the source EEW and the overlap
is in the lowest-numbered part of the source register group"

Being a single register guarantees the overlap is always in the
lowerst-number part of the group.

Reviewed By: frasercrmck, khchen

Differential Revision: https://reviews.llvm.org/D103351
2021-06-01 09:17:52 -07:00
Craig Topper 5a5219a0f9 [RISCV] Remove earlyclobber from compares with LMUL<=1.
Compares are considered a narrowing operation for register overlap.
I believe for LMUL<=1 they meet this exception to allow overlap

"The destination EEW is smaller than the source EEW and the overlap is in the
lowest-numbered part of the source register group"

Both the result and the sources will occupy a single register for
LMUL<=1 so the overlap would always be in the "lowest-numbered part".

Reviewed By: frasercrmck, HsiangKai

Differential Revision: https://reviews.llvm.org/D103336
2021-06-01 09:08:11 -07:00
Fraser Cormack 4f500c402b [RISCV] Support vector types in combination with fastcc
This patch extends the RISC-V lowering of the 'fastcc' calling
convention to vector types, both fixed-length and scalable. Without this
patch, any function passing or returning vector types by value would
throw a compiler error.

Vectors are handled in 'fastcc' much as they are in the default calling
convention, the noticeable difference being the extended set of scalar
GPR registers that can be used to pass vectors indirectly.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102505
2021-06-01 10:31:18 +01:00
Fraser Cormack 2b37c405cc [RISCV] Scale scalably-typed split argument offsets by VSCALE
This patch fixes a bug in lowering scalable-vector types in RISC-V's
main calling convention. When scalable-vector types are split and passed
indirectly, the target is responsible for scaling the offset --
initially set to the known-minimum store size -- by the scalable factor.

Before this we were issuing overlapping loads or stores to the different
parts, leading to incorrect codegen.

Credit to @HsiangKai for spotting this.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D103262
2021-05-31 10:43:13 +01:00
Fraser Cormack eb23936591 [RISCV] Support vector conversions between fp and i1
This patch custom lowers FP_TO_[US]INT and [US]INT_TO_FP conversions
between floating-point and boolean vectors. As the default action is
scalarization, this patch both supports scalable-vector conversions and
improves the code generation for fixed-length vectors.

The lowering for these conversions can piggy-back on the existing
lowering, which lowers the operations to a supported narrowing/widening
conversion and then either an extension or truncation.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103312
2021-05-31 09:55:39 +01:00
Jessica Clarke 00dfd4f870 Revert "[RISCV] Remove -riscv-no-aliases in favour of new -M no-aliases"
The replacement doesn't work for llc, but it is needed by
patchable-function-entry.ll.

This reverts commit aa9a30b83a.
2021-05-29 15:11:37 +01:00
Jessica Clarke aa9a30b83a [RISCV] Remove -riscv-no-aliases in favour of new -M no-aliases
Whilst here, also remove a couple of unnecessary -o - instances.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D103201
2021-05-29 14:58:28 +01:00
Craig Topper a41309966a [RISCV] Pre-commit test cases for D103211. NFC 2021-05-28 13:21:58 -07:00
Eli Friedman 0b3b0a727a [AArch64][RISCV] Make sure isel correctly honors failure orderings.
If a cmpxchg specifies acquire or seq_cst on failure, make sure we
generate code consistent with that ordering even if the success ordering
is not acquire/seq_cst.

At one point, it was ambiguous whether this sort of construct was valid,
but the C++ standad and LLVM now accept arbitrary combinations of
success/failure orderings.

This doesn't address the corresponding issue in AtomicExpand. (This was
reported as https://bugs.llvm.org/show_bug.cgi?id=33332 .)

Fixes https://bugs.llvm.org/show_bug.cgi?id=50512.

Differential Revision: https://reviews.llvm.org/D103284
2021-05-28 12:47:40 -07:00
Fraser Cormack 3f5ae36833 [RISCV][NFC] Merge identical RV32 and RV64 test checks 2021-05-28 12:31:57 +01:00
Fraser Cormack f3afd0d193 [RISCV] Add tests for fixed vector conversions between fp to/from i1
These fixed-length versions don't crash unlike the corresponding
scalable ones, but the code generation is scalarized. An imminent patch
will support scalable-vector conversions and improve the codegen for
these fixed-length conversions.
2021-05-28 12:31:47 +01:00
Craig Topper 0fa5aac292 [RISCV] Teach VSETVLI insertion to look through PHIs to prove we don't need to insert a vsetvli.
If an instruction's AVL operand is a PHI node in the same block,
we may be able to peek through the PHI to find vsetvli instructions
that produce the AVL in other basic blocks. If we can prove those
vsetvli instructions have the same VTYPE and were the last vsetvli
in their respective blocks, then we don't need to insert a vsetvli
for this pseudo instruction.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D103277
2021-05-27 15:34:08 -07:00
Craig Topper 020df692d8 [RISCV] Fix typo, use addImm instead of addReg. 2021-05-27 14:04:51 -07:00
Craig Topper d7ae2438b9 [RISCV] Add a test showing missed opportunity to avoid a vsetvli in a loop.
This is another case we need to look through a phi to prove.
2021-05-27 11:30:25 -07:00
Craig Topper 527cd01314 [RISCV] Teach vsetvli insertion to use vsetvl x0, x0 form when we can tell that VLMAX and AVL haven't changed.
This can help avoid needing a virtual register for the vsetvl output
when the AVL is X0. For other register AVLs it can shorter the live
range of the AVL register if it isn't needed later.

There's probably no advantage when AVL is a 5 bit immediate that
can use vsetivli. But do it anyway for consistency.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D103215
2021-05-27 10:11:38 -07:00
Fraser Cormack 6f4794feb6 [RISCV] Add a test case showing incorrect call-conv lowering
@HsiangKai helped find a bug in the lowering of indirect split
scalable-vector types in our calling convention. An imminent patch will
fix this.
2021-05-27 16:55:48 +01:00
Fraser Cormack b7101e218c [DAGCombine][RISCV] Don't try to trunc-store combined vector stores
DAGCombine's `mergeStoresOfConstantsOrVecElts` optimization is told
whether it's to use vector types and also whether it's to issue a
truncating store. However, the truncating store code path assumes a
scalar integer `ConstantSDNode`, and when using vector types it creates
either a `BUILD_VECTOR` or `CONCAT_VECTORS` to store: neither of which
is a constant.

The `riscv64` target is able to expose a crash here because it switches
on both code paths at the same time. The `f32` is stored as `i32` which
must be promoted to `i64`, necessitating a truncating store.
It also decides later that it prefers a vector store of `v2f32`.

While vector truncating stores are legal, this combine is not able to
emit them. We also don't have a test case. This patch adds an assert to
catch this case more gracefully, and updates one of the caller functions
to the function to turn off the use of truncating stores when preferring
vectors.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103173
2021-05-27 14:16:32 +01:00
Fraser Cormack 8c73a31c11 [RISCV] Allow passing fixed-length vectors via the stack
The vector calling convention dictates that when the vector argument
registers are exhaused, GPRs are used to pass the address via the stack.
When the GPRs themselves are exhausted, at best we would previously
crash with an assertion, and at worst we'd generate incorrect code.

This patch addresses this issue by passing fixed-length vectors via the
stack with their full fixed-length size and aligned to their element
type size. Since the calling convention lowering can't yet handle
scalable vector types, this patch adds a fatal error to make it clear
that we are lacking in this regard.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102422
2021-05-27 14:14:07 +01:00
Fraser Cormack 772b58a641 [SelectionDAG][RISCV] Don't unroll 0/1-type bool VSELECTs
This patch extends the cases in which the legalizer is able to express
VSELECT in terms of XOR/AND/OR. When dealing with a VSELECT between
boolean vector types, the mask itself is an all-ones or all-ones value
of the operand type, so a 0/1 boolean type behaves identically to a 0/-1
type.

This greatly helps RISC-V which relies on expansion for these nodes. It
also allows scalable-vector bool VSELECTs to use the default expansion,
where before it would crash in SelectionDAG::UnrollVectorOp.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103147
2021-05-27 10:08:57 +01:00
Craig Topper fdf10e6197 [RISCV] Use X0 as destination of inserted vsetvli when possible.
We aren't going to connect the result to anything so we might
as well avoid allocating a register.

Reviewed By: frasercrmck, HsiangKai

Differential Revision: https://reviews.llvm.org/D102031
2021-05-26 13:08:51 -07:00
Craig Topper 9065118b64 [RISCV] Optimize SEW=64 shifts by splat on RV32.
SEW=64 shifts only uses the log2(64) bits of shift amount. If we're
splatting a 64 bit value in 2 parts, we can avoid splatting the
upper bits and just let the low bits be sign extended. They won't
be read anyway.

For the purposes of SelectionDAG semantics of the generic ISD opcodes,
if hi was non-zero or bit 31 of the low is 1, the shift was already
undefined so it should be ok to replace high with sign extend of low.

In order do be able to find the split i64 value before it becomes
a stack operation, I added a new ISD opcode that will be expanded
to the stack spill in PreprocessISelDAG. This new node is conceptually
similar to BuildPairF64, but I expanded earlier so that we could
go through regular isel to get the right VLSE opcode for the LMUL.
BuildPairF64 is expanded in a CustomInserter.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D102521
2021-05-26 10:23:32 -07:00
Jessica Clarke d63d662d3c [RISCV] Remove --riscv-no-aliases from RVV tests
This serves no useful purpose other than to clutter things up. Diff
summary as the real diff is extremely unwieldy:

   24844 -; CHECK-NEXT:    jalr zero, 0(ra)
   24844 +; CHECK-NEXT:    ret
       8 -; CHECK-NEXT:    vl4re8.v v28, (a0)
       8 +; CHECK-NEXT:    vl4r.v v28, (a0)
      64 -; CHECK-NEXT:    vl8re8.v v24, (a0)
      64 +; CHECK-NEXT:    vl8r.v v24, (a0)
     392 -; RUN:   --riscv-no-aliases < %s | FileCheck %s
     392 +; RUN:   < %s | FileCheck %s
       1 -; RUN:   -verify-machineinstrs --riscv-no-aliases < %s \
       1 +; RUN:   -verify-machineinstrs < %s \

As discussed in D103004.
2021-05-26 17:59:38 +01:00
Craig Topper b2c7ac874f [RISCV] Don't propagate VL/VTYPE across inline assembly in the Insert VSETVLI pass.
It's conceivable someone could put a vsetvli in inline assembly
so its safer to consider them as barriers. The alternative would
be to trust that the user marks VL and VTYPE registers as clobbers
of the inline assembly if they do that, but hat seems error prone.

I'm assuming inline assembly in vector code is going to be rare.

Reviewed By: frasercrmck, HsiangKai

Differential Revision: https://reviews.llvm.org/D103126
2021-05-26 09:56:20 -07:00
Craig Topper 1b47a3de48 [RISCV] Enable cross basic block aware vsetvli insertion
This patch extends D102737 to allow VL/VTYPE changes to be taken
into account before adding an explicit vsetvli.

We do this by using a data flow analysis to propagate VL/VTYPE
information from predecessors until we've determined a value for
every value in the function.

We use this information to determine if a vsetvli needs to be
inserted before the first vector instruction the block.

Differential Revision: https://reviews.llvm.org/D102739
2021-05-26 09:25:42 -07:00
Fraser Cormack 7e27e4273d [RISCV] Pre-commit fixed-length mask vselect tests
These are default-expanded but later unrolled due to RISC-V's vector
boolean content policy. A patch to improve this codegen will follow
shortly.
2021-05-26 10:44:45 +01:00
Ben Shi bf77317049 [RISCV] Optimize xor/or with immediate in the zbs extension
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102893
2021-05-25 14:14:09 +08:00
Craig Topper b510e4cf1b [RISCV] Add a vsetvli insert pass that can be extended to be aware of incoming VL/VTYPE from other basic blocks.
This is a replacement for D101938 for inserting vsetvli
instructions where needed. This new version changes how
we track the information in such a way that we can extend
it to be aware of VL/VTYPE changes in other blocks. Given
how much it changes the previous patch, I've decided to
abandon the previous patch and post this from scratch.

For now the pass consists of a single phase that assumes
the incoming state from other basic blocks is unknown. A
follow up patch will extend this with a phase to collect
information about how VL/VTYPE change in each block and
a second phase to propagate this information to the entire
function. This will be used by a third phase to do the
vsetvli insertion.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D102737
2021-05-24 11:47:27 -07:00
serge-sans-paille 4ab3041acb Revert "[NFC] remove explicit default value for strboolattr attribute in tests"
This reverts commit bda6e5bee0.

See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance
2021-05-24 19:43:40 +02:00
serge-sans-paille bda6e5bee0 [NFC] remove explicit default value for strboolattr attribute in tests
Since d6de1e1a71, no attributes is quivalent to
setting attribute to false.

This is a preliminary commit for https://reviews.llvm.org/D99080
2021-05-24 19:31:04 +02:00
luxufan d70e9195a3 [RISCV] Optimize getVLENFactoredAmount function.
If the local variable `NumOfVReg` isPowerOf2_32(NumOfVReg - 1) or isPowerOf2_32(NumOfVReg + 1), the ADDI and MUL instructions can be replaced with SLLI and ADD(or SUB) instructions.

Based on original patch by StephenFan.

Reviewed By: frasercrmck, StephenFan

Differential Revision: https://reviews.llvm.org/D100577
2021-05-24 10:04:37 -07:00
Fraser Cormack 7a211ed110 [RISCV] Prevent store combining from infinitely looping
RVV code generation does not successfully custom-lower BUILD_VECTOR in all
cases. When it resorts to default expansion it may, on occasion, be expanded to
scalar stores through the stack. Unfortunately these stores may then be picked
up by the post-legalization DAGCombiner which merges them again. The merged
store uses a BUILD_VECTOR which is then expanded, and so on.

This patch addresses the issue by overriding the `mergeStoresAfterLegalization`
hook. A lack of granularity in this method (being passed the scalar type) means
we opt out in almost all cases when RVV fixed-length vector support is enabled.
The only exception to this rule are mask vectors, which are always either
custom-lowered or are expanded to a load from a constant pool.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102913
2021-05-24 10:19:32 +01:00
Jessica Clarke e10958c807 [SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics
Unlike normal loads these don't have an extension field, but we know
from TargetLowering whether these are sign-extending or zero-extending,
and so can optimise away unnecessary extensions.

This was noticed on RISC-V, where sign extensions in the calling
convention would result in unnecessary explicit extension instructions,
but this also fixes some Mips inefficiencies. PowerPC sees churn in the
tests as all the zero extensions are only for promoting 32-bit to
64-bit, but these zero extensions are still not optimised away as they
should be, likely due to i32 being a legal type.

This also simplifies the WebAssembly code somewhat, which currently
works around the lack of target-independent combines with some ugly
patterns that break once they're optimised away.

Re-landed with correct handling in ComputeNumSignBits for Tmp == VTBits,
where zero-extending atomics were incorrectly returning 0 rather than
the (slightly confusing) required return value of 1.

Re-landed again after D102819 fixed PowerPC to correctly zero-extend all
of its atomics as it claimed to do, since the combination of that bug
and this optimisation caused buildbot regressions.

Reviewed By: RKSimon, atanasyan

Differential Revision: https://reviews.llvm.org/D101342
2021-05-20 20:34:23 +01:00
Fraser Cormack c74ab891fc [RISCV] Ensure small mask BUILD_VECTORs aren't expanded
The default expansion for BUILD_VECTORs -- save for going through
shuffles -- is to go through the stack. This method only works when the
type is at least byte-sized, so for v2i1 and v4i1 we would crash.

This patch ensures that small mask-type BUILD_VECTORs are always handled
without crashing. We lower to a SETCC of the equivalent i8 type.

This also exposes some pre-existing issues where the lowering when
optimizing for size results in larger code than without. Those will be
tackled in future patches.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102767
2021-05-20 19:12:29 +01:00
Fraser Cormack 26bd2250c1 [RISCV] Ensure shuffle splat operands are type-legal
The use of `SelectionDAG::getSplatValue` isn't guaranteed to return a
type-legal splat value as it may implicitly extract a vector element
from another shuffle. It is not permitted to introduce an illegal type
when lowering shuffles.

This patch addresses the crash by adding a boolean flag to
`getSplatValue`, defaulting to false, which when set will ensure a
type-legal return value. If it is unable to do that it will fail to
return a splat value.

I've been through the existing uses of `getSplatValue` in other targets
and was unable to find a need or test cases showing a need to update
their uses. In some cases, the call is made during `LegalizeVectorOps`
which may still produce illegal scalar types. In other situations, the
illegally-typed splat value may be quickly patched up to a legal type
(such as any-extending the returned `extract_vector_elt` up to a legal
type) before `LegalizeDAG` notices.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102687
2021-05-20 18:00:03 +01:00
Fraser Cormack ca2c245ba4 [RISCV] Support INSERT_VECTOR_ELT into i1 vectors
Like the element extraction of these vectors, we choose to promote up to
an i8 vector type and perform the insertion there.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102697
2021-05-19 09:41:50 +01:00
Fraser Cormack 175bdf127d [RISCV] Fix operand order in fixed-length VM(OR|AND)NOT patterns
Where the RVV specification writes `vs2, vs1`, our TableGen patterns use
`rs1, rs2`. These differences can easily cause confusion. The VMANDNOT
instruction performs `LHS && !RHS`, and similarly for VMORNOT.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102606
2021-05-18 09:21:25 +01:00
Ben Shi 3cf7983cbe [RISCV][test] Add new tests of or/xor in the zbs extension
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102625
2021-05-18 07:10:17 +08:00
Fraser Cormack 85e31eddf2 [DAGCombiner] Relax an assertion to an early return
The select-of-constants transform was asserting that its constant vector
inputs did not implicitly truncate their input without that as an
explicit precondition to the function. This patch relaxes that assertion
into an early return to skip the optimization.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D102393
2021-05-17 09:15:55 +01:00
Ben Shi 7746e818a5 [RISCV] Optimize or/xor with immediate in the zbs extension
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102398
2021-05-17 10:59:52 +08:00
Ben Shi 1dfd7d5041 [RISCV][test] Add new tests of or/xor in the zbs extension
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D102396
2021-05-17 09:47:23 +08:00
Hsiangkai Wang b41e1306b8 [RISCV] Add the DebugLoc parameter to getVLENFactoredAmount().
The MachineBasicBlock::iterator is continuously changing during
generating the frame handling instructions. We should use the DebugLoc
from the caller, instead of getting it from the changing iterator.

If the prologue instructions located in a basic block without any other
instructions after these prologue instructions, the iterator will be
updated to the boundary of the basic block and it is invalid to use the
iterator to access DebugLoc. This patch also fixes the crash when
accessing DebugLoc using the iterator.

Differential Revision: https://reviews.llvm.org/D102386
2021-05-14 21:31:06 +08:00
Fraser Cormack 797e580db9 [RISCV][NFC] Simplify test run lines
Several tests had -verify-machineinstrs twice, and several tests were
explicitly specifying the default FileCheck prefix of CHECK.
2021-05-13 12:41:00 +01:00
Fraser Cormack c5ec00e62b [TargetLowering] Improve legalization of scalable vector types
This patch extends the vector type-conversion and legalization capabilities of
scalable vector types.

Firstly, `vscale x 1` types now behave more like the corresponding `vscale x
2+` types. This enables the integer promotion legalization of extended scalable
types, such as the promotion of `<vscale x 1 x i5>` to `<vscale x 1 x i8>`.

These `vscale x 1` types are also now better handled by
`getVectorTypeBreakdown`, where what looks like older handling for 1-element
fixed-length vector types was spuriously updated to include scalable types.

Widening of scalable types is now better supported, by using `INSERT_SUBVECTOR`
to insert the smaller scalable vector "value" type into the wider scalable
vector "part" type. This allows AArch64 to pass and return `vscale x 1` types
by value by widening.

There are still cases where we are unable to legalize `vscale x 1` types, such
as where expansion would require splitting the vector in two.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D102073
2021-05-12 16:33:07 +01:00
Stefan Pintilie 8d37411e48 Revert "[SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics"
This reverts commit 6c80361b84.
Breaks PowerPC Big Endian buildbots.
2021-05-12 09:46:18 -05:00
Craig Topper d092dd56ae [RISCV] Regenerate stepvector.ll. NFC
It looks like the RV32 and RV64 prefixes were removed from the
RUN lines while another patch was in review that added check
lines that used them.
2021-05-11 13:04:57 -07:00
Fangrui Song ec27c5f170 [RISCV] Prefer to lower MC_GlobalAddress operands to .Lfoo$local
Similar to X86 D73230 and AArch64 D101872

With this change, we can set dso_local in clang's -fpic -fno-semantic-interposition mode,
for default visibility external linkage non-ifunc-non-COMDAT definitions.

For such dso_local definitions, variable access/taking the address of a
function/calling a function will go through a local alias to avoid GOT/PLT.

Reviewed By: jrtc27, luismarques

Differential Revision: https://reviews.llvm.org/D101875
2021-05-11 11:29:45 -07:00
Craig Topper ce6e4f27dd [RISCV] Use fractional LMULs for fixed length types smaller than riscv-v-vector-bits-min.
My thought process is that if v2i64 is an LMUL=1 type then v2i32
should be an LMUL=1/2 type. We limit the fractional LMUL so that
SEW=64 clips to LMUL=1, SEW=32 clips to LMUL=1/2, etc. This
ensures there's always a fractional LMUL available to truncate a type.
This does reduce the number of vsetvlis in some cases.

Some tests increase vsetvlis because the best container type for a
mask type is dependent on the LMUL+SEW that the mask was produced
from, but you can't tell that from the type. I think this is
something we need to solve this in the machine IR when optimizing
vsetvlis.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D101215
2021-05-11 09:42:48 -07:00
Craig Topper dc00cbb505 [RISCV] Match trunc_vector_vl+sra_vl/srl_vl with splat shift amount to vnsra/vnsrl.
Limited to splats because we would need to truncate the shift
amount vector otherwise.

I tried to do this with new ISD nodes and a DAG combine to
avoid such a large pattern, but we don't form the splat until
LegalizeDAG and need DAG combine to remove a scalable->fixed->scalable
cast before it becomes visible to the shift node. By the time that
happens we've already visited the truncate node and won't revisit it.

I think I have an idea how to improve i64 on RV32 I'll save for a
follow up.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D102019
2021-05-11 09:29:31 -07:00
Hsiangkai Wang d8ec2b183e [RISCV] Fix the calculation of the offset of Zvlsseg spilling.
For Zvlsseg spilling, we need to convert the pseudo instructions
into multiple vector load/store instructions with appropriate offsets.
For example, for PseudoVSPILL3_M2, we need to convert it to

VS2R %v2, %base
ADDI %base, %base, (vlenb x 2)
VS2R %v4, %base
ADDI %base, %base, (vlenb x 2)
VS2R %v6, %base

We need to keep the size of the offset in the pseudo spilling instructions.
In this case, it is (vlenb x 2).

In the original implementation, we use the size of frame objects divide the
number of vectors in zvlsseg types. The size of frame objects is not
necessary exactly the same as the spilling data. It may be larger than
it. So, we change it to (VLENB x LMUL) in this patch. The calculation is
more direct and easy to understand.

Differential Revision: https://reviews.llvm.org/D101869
2021-05-11 10:13:18 +08:00
Craig Topper 80b9510806 [RISCV] Correct VL for fixed length masked scatter.
We were incorrectly calling getVectorNumElements on a scalable
vector type. This shouldn't be allowed. This gives a warning on
EVT, but not MVT.
2021-05-10 09:50:08 -07:00
Fraser Cormack 6db0cedd23 [LegalizeVectorOps][RISCV] Add scalable-vector SELECT expansion
This patch extends VectorLegalizer::ExpandSELECT to permit expansion
also for scalable vector types. The only real change is conditionally
checking for BUILD_VECTOR or SPLAT_VECTOR legality depending on the
vector type.

We can use this to fix "cannot select" errors for scalable vector
selects on the RISCV target. Note that in future patches RISCV will
possibly custom-lower vector SELECTs to VSELECTs for branchless codegen.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102063
2021-05-10 08:22:35 +01:00
Jessica Clarke 6c80361b84 [SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics
Unlike normal loads these don't have an extension field, but we know
from TargetLowering whether these are sign-extending or zero-extending,
and so can optimise away unnecessary extensions.

This was noticed on RISC-V, where sign extensions in the calling
convention would result in unnecessary explicit extension instructions,
but this also fixes some Mips inefficiencies. PowerPC sees churn in the
tests as all the zero extensions are only for promoting 32-bit to
64-bit, but these zero extensions are still not optimised away as they
should be, likely due to i32 being a legal type.

This also simplifies the WebAssembly code somewhat, which currently
works around the lack of target-independent combines with some ugly
patterns that break once they're optimised away.

Re-landed with correct handling in ComputeNumSignBits for Tmp == VTBits,
where zero-extending atomics were incorrectly returning 0 rather than
the (slightly confusing) required return value of 1.

Reviewed By: RKSimon, atanasyan

Differential Revision: https://reviews.llvm.org/D101342
2021-05-06 04:01:20 +01:00
Jessica Clarke 897d7bceb9 Revert "[SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics"
This seems to have broken sanitizers, giving lots of

  Assertion `NumBits <= MAX_INT_BITS && "bitwidth too large"' failed.

failures across multiple targets (currently X86 and PowerPC). Reverting
until I have a chance to reproduce and debug.

This reverts commit 6e876f9ded.
2021-05-05 17:02:05 +01:00
Jessica Clarke 6e876f9ded [SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics
Unlike normal loads these don't have an extension field, but we know
from TargetLowering whether these are sign-extending or zero-extending,
and so can optimise away unnecessary extensions.

This was noticed on RISC-V, where sign extensions in the calling
convention would result in unnecessary explicit extension instructions,
but this also fixes some Mips inefficiencies. PowerPC sees churn in the
tests as all the zero extensions are only for promoting 32-bit to
64-bit, but these zero extensions are still not optimised away as they
should be, likely due to i32 being a legal type.

This also simplifies the WebAssembly code somewhat, which currently
works around the lack of target-independent combines with some ugly
patterns that break once they're optimised away.

Reviewed By: RKSimon, atanasyan

Differential Revision: https://reviews.llvm.org/D101342
2021-05-05 16:34:45 +01:00
Fraser Cormack 61a46375a2 [RISCV][VP][NFC] Add tests for VP_SREM and VP_UREM
As agreed in D101826, these are follow-up tests for the RISC-V VP
support.
2021-05-05 13:13:34 +01:00
Fraser Cormack 437468f319 [RISCV][VP][NFC] Add tests for VP_MUL and VP_[US]DIV
As agreed in D101826, these are follow-up tests for the RISC-V VP
support.
2021-05-05 13:08:57 +01:00
Fraser Cormack 491a3d1359 [RISCV][VP][NFC] Add tests for VP_SHL and VP_LSHR
As agreed in D101826, these are follow-up tests for the RISC-V VP
support. Tests for VP_ASHR were landed as part of D101826.
2021-05-05 13:01:04 +01:00
Fraser Cormack 3fbcf07a99 [RISCV][VP][NFC] Add tests for VP_AND, VP_XOR, VP_OR
As agreed in D101826, these are follow-up tests for the RISC-V VP
support.
2021-05-05 12:58:08 +01:00
Fraser Cormack 6f17613bfb [RISCV][VP] Lower VP ISD nodes to RVV instructions
This patch supports all of the current set of VP integer binary
intrinsics by lowering them to to RVV instructions. It does so by using
the existing RISCVISD *_VL custom nodes as an intermediate layer. Both
scalable and fixed-length vectors are supported by using this method.

One notable change to the existing vector codegen strategy is that
scalable all-ones and all-zeros mask SPLAT_VECTORs are now lowered to
RISCVISD VMSET_VL and VMCLR_VL nodes to match their fixed-length
BUILD_VECTOR counterparts. This allows them to reuse the existing
"all-ones" VL patterns.

To reduce the size of the phabricator diff, some tests are intentionally
left out and will be added later if the patch is accepted.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101826
2021-05-05 12:32:24 +01:00
Fraser Cormack cd6a52fede [RISCV] Cap legal fixed-length vectors to 256-element types
Previously, RISC-V would make legal all fixed-length vectors types whose
size are less than or equal to some function of the minimum value of
VLEN and the maximum-permissible LMUL grouping.

Due to vector legalization issues, this patch instead caps the legal
fixed-length vector types to those with 256 elements. This value was
chosen because it is the longest vector length which has corresponding
MVTs across all supported element types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101839
2021-05-05 09:51:08 +01:00
Fraser Cormack 6523ff6d47 [ValueTypes] Add MVTs for v256i16 and v256f16
This patch adds the two MVTs to fix a legalizer crash when using vector
shuffles of <256 x i16> and <128 x i16> on RISC-V. The legalizer can't
promote the operand of `v256i32 = any_extend_vector_inreg v128i16`.

Reviewed By: craig.topper, RKSimon

Differential Revision: https://reviews.llvm.org/D101769
2021-05-04 18:06:13 +01:00
Jessica Clarke fb92cf9208 [RISCV] Pre-commit tests for D101342
These tests show inefficient sign extension for AMOs on RISC-V. The
normal CodeGen tests use anyext return values, but if marked signext
then we end up generating unnecessary sign extension instructions. This
can be seen when compiling C that returns an i32 (signed or unsigned),
where the calling convention results in a signext return value.
2021-05-04 11:12:43 +01:00
Fraser Cormack 46fa214a6f [RISCV] Lower splats of non-constant i1s as SETCCs
This patch adds support for splatting i1 types to fixed-length or
scalable vector types. It does so by lowering the operation to a SETCC
of the equivalent i8 type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101465
2021-05-04 09:14:05 +01:00
Fraser Cormack d23e4f6872 [RISCV] Add support for fmin/fmax vector reductions
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101518
2021-05-03 10:33:51 +01:00
Craig Topper ba63cdb8f2 [RISCV] Store SEW in RISCV vector pseudo instructions in log2 form.
This shrinks the immediate that isel table needs to emit for these
instructions. Hoping this allows me to change OPC_EmitInteger to
use a better variable length encoding for representing negative
numbers. Similar to what was done a few months ago for OPC_CheckInteger.

The alternative encoding uses less bytes for negative numbers, but
increases the number of bytes need to encode 64 which was a very
common number in the RISCV table due to SEW=64. By using Log2 this
becomes 6 and is no longer a problem.
2021-05-02 12:09:20 -07:00
Fraser Cormack 1d85b24762 [RISCV][NFC] Merge RV32/RV64 test checks with a common prefix 2021-04-30 09:43:48 +01:00
Fraser Cormack 791766e6d2 [RISCV] Support STEP_VECTOR with a step greater than one
DAGCombiner was recently taught how to combine STEP_VECTOR nodes,
meaning the step value is no longer guaranteed to be one by the time it
reaches the backend for lowering.

This patch supports such cases on RISC-V by lowering to other step
values to a multiply following the vid.v instruction. It includes a
small optimization for common cases where the multiply can be expressed
as a shift left.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D100856
2021-04-30 09:36:18 +01:00
luxufan 5603ed60ad [RISCV] Fix StackOffset calculation when using sp to access the fixed stack object in the case of rvv vector objects existed
When rvv vector objects existed, using sp to access the fixed stack object will pass the rvv vector objects field. So the StackOffset needs add a scalable offset of the size of rvv vector objects field

Differential Revision: https://reviews.llvm.org/D100286
2021-04-30 11:02:38 +08:00
luxufan 325b454ed8 [RISCV] Precommit a test case that test accessing a fixed object when has rvv vector object existed
Differential Revision: https://reviews.llvm.org/D100284
2021-04-30 10:35:03 +08:00
Craig Topper dcdda2bdf2 [RISCV] Teach DAG combine to fold (and (select_cc lhs, rhs, cc, -1, c), x) -> (select_cc lhs, rhs, cc, x, (and, x, c))
Similar for or/xor with 0 in place of -1.

This is the canonical form produced by InstCombine for something like `c ? x & y : x;` Since we have to use control flow to expand select we'll usually end up with a mv in basic block. By folding this we may be able to pull the and/or/xor into the block instead and avoid a mv instruction.

The code here is based on code from ARM that uses this to create predicated instructions. I'm doing it on SELECT_CC so it happens late, but we could do it on select earlier which is what ARM does. I'm not sure if we lose any combine opportunities if we do it earlier.

I left out add and sub because this can separate sext.w from the add/sub. It also made a conditional i64 addition/subtraction on RV32 worse. I guess both of those would be fixed by doing this earlier on select.

The select-binop-identity.ll test has not been commited yet, but I made the diff show the changes to it.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D101485
2021-04-29 09:43:51 -07:00
Craig Topper 60216adef1 [RISCV] Add test cases for D101485. NFC 2021-04-29 09:43:51 -07:00
Craig Topper 0c330afdfa [RISCV] Enable SPLAT_VECTOR for fixed vXi64 types on RV32.
This replaces D98479.

This allows type legalization to form SPLAT_VECTOR_PARTS so we don't
lose the splattedness when the scalar type is split.

I'm handling SPLAT_VECTOR_PARTS for fixed vectors separately so
we can continue using non-VL nodes for scalable vectors.

I limited to RV32+vXi64 because DAGCombiner::visitBUILD_VECTOR likes
to form SPLAT_VECTOR before seeing if it can replace the BUILD_VECTOR
with other operations. Especially interesting is a splat BUILD_VECTOR of
the extract_vector_elt which can become a splat shuffle, but won't if
we form SPLAT_VECTOR first. We either need to reorder visitBUILD_VECTOR
or add visitSPLAT_VECTOR.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100803
2021-04-29 08:20:09 -07:00
Craig Topper 25391cec3a [RISCV] Teach computeKnownBits that vsetvli returns number less than 2^31.
This seems like a reasonable upper bound on VL. WG discussions for
the V spec would probably allow us to use 2^16 as an upper bound
on VLEN, but this is good enough for now.

This allows us to remove sext and zext if user happens to assign
the size_t result into an int and then uses it as a VL intrinsic
argument which is size_t.

Reviewed By: frasercrmck, rogfer01, arcbbb

Differential Revision: https://reviews.llvm.org/D101472
2021-04-29 08:07:59 -07:00
Fraser Cormack f6c54a61da [RISCV][NFC] Combine identical RV32 and RV64 test checks 2021-04-29 11:38:10 +01:00
Fraser Cormack 43ad058a01 [RISCV] Fix stack slot for argument types (Bug 49500)
This is an complementary/alternative fix for D99068. It takes a slightly
different approach by explicitly summing up all of the required split
part type sizes and ensuring we allocate enough space for them. It also
takes the maximum alignment of each part.

Compared with D99068 there are fewer changes to the stack objects in
existing tests. However, @luismarques has shown in that patch that there
are opportunities to reduce our stack usage in the future.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D99087
2021-04-29 09:10:48 +01:00
Craig Topper ce09dd54e6 [RISCV] Select 5 bit immediate for VSETIVLI during isel rather than peepholing in the custom inserter.
This adds a special operand type that is allowed to be either
an immediate or register. By giving it a unique operand type the
machine verifier will ignore it.

This perturbs a lot of tests but mostly it is just slightly different
instruction orders. Something bad did happen to some min/max reduction
tests. We're spilling vector registers when we weren't before.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D101246
2021-04-27 14:38:16 -07:00
Craig Topper 262a72f50f [RISCV] Use stack slot to handle SPLAT_VECTOR_PARTS on RV32.
Reduces the amount of vector ALU operations and reduces vector
register pressure.
2021-04-26 15:43:02 -07:00
Craig Topper e2cd92cb9b [RISCV] Match splatted load to scalar load + splat. Form strided load during isel.
This modifies my previous patch to push the strided load formation
to isel. This gives us opportunity to fold the splat into a .vx
operation first. Using a scalar register and a .vx operation reduces
vector register pressure which can be important for larger LMULs.

If we can't fold the splat into a .vx operation, then it can make
sense to use a strided load to free up the vector arithmetic
ALU to do actual arithmetic rather than tying it up with vmv.v.x.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D101138
2021-04-26 13:32:03 -07:00
Ben Shi 60ed86d350 [RISCV] Optimize addition with immediate
Reviewed by: craig.topper

Differential Revision: https://reviews.llvm.org/D101244
2021-04-26 13:26:17 +08:00
Craig Topper 8f5cd49405 [RISCV] Teach DAG combine what bits Zbp instructions demanded from their inputs.
This teaches DAG combine that shift amount operands for grev, gorc
shfl, unshfl only read a few bits.

This also teaches DAG combine that grevw, gorcw, shflw, unshflw,
bcompressw, bdecompressw only consume the lower 32 bits of their
inputs.

In the future we can teach SimplifyDemandedBits to also propagate
demanded bits of the output to the inputs in some cases.
2021-04-25 21:54:06 -07:00
Levy Hsu 8cf54c7ff5 [RISCV] [1/2] Add IR intrinsic for Zbe extension
RV32/64:
bcompress
bdecompress

RV64 ONLY:
bcompressw
bdecompressw

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101143
2021-04-25 19:14:34 -07:00
Craig Topper 3064a63b2b [RISCV] Remove GetVRegNoV0 from the output register class of masked compare pseudo instructions.
Theses instructions are allowed to write v0 when they are masked.
We'll still never use v0 because of the earlyclobber constraint so
this doesn't really help anything. It just makes the definitions
correct.

While I was there remove an unused multiclass I noticed.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D101118
2021-04-23 09:33:29 -07:00
Fraser Cormack 83b8f8da82 [RISCV] Custom lower vector F(MIN|MAX)NUM to vf(min|max)
This patch adds support for both scalable- and fixed-length vector code
lowering of the llvm.minnum and llvm.maxnum intrinsics to the equivalent
RVV instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101035
2021-04-23 12:22:15 +01:00
Levy Hsu b49337bbb9 [RISCV] [1/2] Add IR intrinsic for Zbp extension
RV32/64:
    grev
    grevi
    gorc
    gorci
    shfl
    shfli
    unshfl
    unshfli

RV64 ONLY:
    grevw
    greviw
    gorcw
    gorciw
    shflw
    shfli     (For non-existing shfliw)
    unshfli   (For non-existing unshfliw)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100830
2021-04-22 16:34:51 -07:00
Craig Topper 5185b52988 [RISCV] Fix crash with fptosi.sat/fptoui.sat intrinsics on RV64. Add test cases.
Add PromoteIntOp_FP_TO_XINT_SAT to type legalize the bit width
operand from i32 to i64 for RV64.

Add test cases for the saturating intrinsics for half/float/double
and i32/i64. CodeGen is definitely not optimal. We can probably
make use of the native behavior of fcvt instructions in many cases.

Fixes PR50083
2021-04-22 15:18:15 -07:00
Craig Topper e01c419ecd [RISCV] Add IR intrinsics for vmsge(u).vv/vx/vi.
These instructions don't really exist, but we have ways we can
emulate them.

.vv will swap operands and use vmsle().vv. .vi will adjust the
immediate and use .vmsgt(u).vi when possible. For .vx we need to
use some of the multiple instruction sequences from the V extension
spec.

For unmasked vmsge(u).vx we use:
  vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd

For cases where mask and maskedoff are the same value then we have
vmsge{u}.vx v0, va, x, v0.t which is the vd==v0 case that
requires a temporary so we use:
  vmslt{u}.vx vt, va, x; vmandnot.mm vd, vd, vt

For other masked cases we use this sequence:
  vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0
We trust that register allocation will prevent vd in vmslt{u}.vx
from being v0 since v0 is still needed by the vmxor.

Differential Revision: https://reviews.llvm.org/D100925
2021-04-22 10:44:38 -07:00
Craig Topper d77d56acfd [RISCV] Add missing tests for vector type for second operand of vmsgt and vmsgtu IR intrinsics.
Refactor to use new multiclass instead of individual patterns.

We already supported this due to SEW=64 on RV32, but we didn't have
test cases for all the types we supported.

Part of D100925
2021-04-22 10:44:38 -07:00
Craig Topper 9524a0553d [RISCV] Support vector type for second operand of vmfge and vmfgt IR intrinsics.
We don't have instructions for these, but can swap the operands
to use vmle/vmflt. This makes the IR interface more consistent and
simplifies the frontend implementation.

Part of D100925
2021-04-22 10:44:38 -07:00
Craig Topper 70254ccb69 [RISCV] Turn splat shuffles of vector loads into strided load with stride of x0.
Implementations are allowed to optimize an x0 stride to perform
less memory accesses. This is the case in SiFive cores.

No idea if this is the case in other implementations. We might
need a tuning flag for this.

Reviewed By: frasercrmck, arcbbb

Differential Revision: https://reviews.llvm.org/D100815
2021-04-22 10:02:57 -07:00
Craig Topper 77f14c96e5 [RISCV] Use stack temporary to splat two GPRs into SEW=64 vector on RV32.
Rather than doing splatting each separately and doing bit manipulation
to merge them in the vector domain, copy the data to the stack
and splat it using a strided load with x0 stride. At least on
some implementations this vector load is optimized to not do
a load for each element.

This is equivalent to how we move i64 to f64 on RV32.

I've only implemented this for the intrinsic fallbacks in this
patch. I think we do similar splatting/shifting/oring in other
places. If this is approved, I'll refactor the others to share
the code.

Differential Revision: https://reviews.llvm.org/D101002
2021-04-22 09:50:07 -07:00
Jun Ma 978eb3f168 [DAGCombiner] Allow operand of step_vector to be negative.
It is proper to relax non-negative limitation of step_vector.
Also this patch adds more combines for step_vector:
(sub X, step_vector(C)) -> (add X,  step_vector(-C))

Differential Revision: https://reviews.llvm.org/D100812
2021-04-22 20:58:03 +08:00
Serge Pavlov 740962e5d0 [RISCV] Custom lowering of SET_ROUNDING
Differential Revision: https://reviews.llvm.org/D91242
2021-04-22 15:04:55 +07:00
Serge Pavlov 6e63dfdae2 [RISCV] Custom lowering of FLT_ROUNDS_
Differential Revision: https://reviews.llvm.org/D90854
2021-04-22 11:39:15 +07:00
Craig Topper f6d8cf7798 [RISCV] Teach lowerSPLAT_VECTOR_PARTS to detect cases where Hi is sign extended from Lo.
This recognizes the case when Hi is (sra Lo, 31). We can use
SPLAT_VECTOR_I64 rather than splatting the high bits and
combining them in the vector register.
2021-04-21 20:24:23 -07:00
Fraser Cormack c141bd3cf9 [DAGCombiner] Support all-ones/all-zeros SPLAT_VECTOR in more combines
This patch adds incrementally-better support for SPLAT_VECTOR in a
handful of vector combines by changing a few more
isBuildVectorAllOnes/isBuildVectorAllZeros to the equivalent
isConstantSplatVectorAllOnes/Zeros calls.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D100851
2021-04-21 11:05:37 +01:00
Fraser Cormack 3f02d26943 [RISCV] Further fixes for RVV stack offset computation
This patch fixes a case missed out by D100574, in which RVV scalable
stack offset computations may require three live registers in the case
where the offset's fixed component is 12 bits or larger and has a
scalable component.

Instead of adding an additional emergency spill slot, this patch further
optimizes the scalable stack offset computation sequences to reduce
register usage.

By emitting the sequence to compute the scalable component before the
fixed component, we can free up one scratch register to be reallocated
by the sequence for the fixed component. Doing this saves one register
and thus one additional emergency spill slot.

Compare:

    $x5 = LUI 1
    $x1 = ADDIW killed $x5, -1896
    $x1 = ADD $x2, killed $x1
    $x5 = PseudoReadVLENB
    $x6 = ADDI $x0, 50
    $x5 = MUL killed $x5, killed $x6
    $x1 = ADD killed $x1, killed $x5

versus:

    $x5 = PseudoReadVLENB
    $x1 = ADDI $x0, 50
    $x5 = MUL killed $x5, killed $x1
    $x1 = LUI 1
    $x1 = ADDIW killed $x1, -1896
    $x1 = ADD $x2, killed $x1
    $x1 = ADD killed $x1, killed $x5

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D100847
2021-04-21 10:51:07 +01:00
Craig Topper 78abad569c [RISCV] Add missing SEW=64 tests to vmslt-rv32.ll. NFC 2021-04-20 18:31:36 -07:00
Fraser Cormack 60622b82a7 [RISCV][NFC] Add tests for scalable-vector DAGCombiner improvements
These will all be improved by future patches.
2021-04-20 14:26:26 +01:00
Fraser Cormack b4a358a7ba [RISCV] Fix missing emergency slots for scalable stack offsets
This patch adds an additional emergency spill slot to RVV code. This is
required as RVV stack offsets may require an additional register to compute.

This patch includes an optimization by @HsiangKai <kai.wang@sifive.com>
to reduce the number of registers required for the computation of stack
offsets from 3 to 2. Otherwise we'd need two additional emergency spill
slots.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D100574
2021-04-20 09:59:41 +01:00
Jun Ma 1ef5699d1a [DAGCombiner] Support fold zero scalar vector.
This patch changes ISD::isBuildVectorAllZeros to
ISD::isConstantSplatVectorAllZeros which handles zero sclar vector.

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D100813
2021-04-20 16:28:43 +08:00
Fraser Cormack 457da7f298 [SelectionDAG] Relax constraints on STEP_VECTOR step operand
This patch relaxes the requirement that the STEP_VECTOR step constant
must be of a type at least as large as the vector element type. This
does not permit its use on targets which have legal vector element types
larger than the largest legal scalar type, such as i64 vectors on RV32.

As such, the requirement has been loosened so that the step operand must
be any scalar type so long as the constant immediate is non-negative and
the value fits inside the vector element type.

This limits combining optimizations in certain circumstances but in
practice it's unlikely to be a hindrance.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D100660
2021-04-20 08:41:42 +01:00
Ben Shi b7249bf3b5 [RISCV][test] Add a new test of addition
Reviewed by: craig.topper

Differential Revision: https://reviews.llvm.org/D100767
2021-04-20 12:11:56 +08:00
Craig Topper 7ed01a420a [RISCV] Pad v4i1/v2i1/v1i1 stores with 0s to make a full byte.
As noted in the FIXME there's a sort of agreement that the any
extra bits stored will be 0.

The generated code is pretty terrible. I was really hoping we
could use a tail undisturbed trick, but tail undisturbed no
longer applies to masked destinations in the current draft
spec.

Fingers crossed that it isn't common to do this. I doubt IR
from clang or the vectorizer would ever create this kind of store.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100618
2021-04-19 11:05:18 -07:00
Fraser Cormack c9a93c3e01 [RISCV] Lower vector shuffles to vrgather operations
This patch extends the lowering of RVV fixed-length vector shuffles to
avoid the default stack expansion and instead lower to vrgather
instructions.

For "permute"-style shuffles where one vector is swizzled, we can lower
to one vrgather. For shuffles involving two vector operands, we lower to
one unmasked vrgather (or splat, where appropriate) followed by a masked
vrgather which blends in the second half.

On occasion, when it's not possible to create a legal BUILD_VECTOR for
the indices, we use vrgatherei16 instructions with 16-bit index types.

For 8-bit element vectors where we may have indices over 255, we have a
fairly blunt fallback to the stack expansion to avoid custom-splitting
of the vector types.

To enable the selection of masked vrgather instructions, this patch
extends the various RISCVISD::VRGATHER nodes to take a passthru operand.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100549
2021-04-19 11:13:13 +01:00
David Sherwood 83f5fa519e [CodeGen] Improve code generation for clamping of constant indices with scalable vectors
When trying to clamp a constant index into a scalable vector we can
test if the index is less than the minimum number of elements in the
vector. If so, we can simply return the index because we know it is
guaranteed to fit inside the vector.

Differential Revision: https://reviews.llvm.org/D100639
2021-04-19 08:34:17 +01:00
Fraser Cormack ec0f7c6923 [RISCV] Rerun stack test through update_llc_test_checks.py
Adjusts formatting of comments only. Just to reduce diffs in future
patches.
2021-04-16 11:08:58 +01:00
Jim Lin 2893570e86 [RISCV] Don't emit save-restore call if function is a interrupt handler
It has to save all caller-saved registers before a call in the handler.
So don't emit a call that save/restore registers.

Reviewed By: simoncook, luismarques, asb

Differential Revision: https://reviews.llvm.org/D100532
2021-04-16 12:54:47 +08:00
Fraser Cormack eae0ac3a1f [RISCV] Pre-commit vector shuffle test cases
This codegen will be improved by future patches.
2021-04-15 10:31:13 +01:00
ShihPo Hung d5e962f1f2 [RISCV] Implement COPY for Zvlsseg registers
When copying Zvlsseg register tuples, we split the COPY to NF whole register moves
as below:

  $v10m2_v12m2 = COPY $v4m2_v6m2 # NF = 2
=>
  $v10m2 = PseudoVMV2R_V $v4m2
  $v12m2 = PseudoVMV2R_V $v6m2

This patch copies forwardCopyWillClobberTuple from AArch64 to check
register overlapping.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100280
2021-04-13 18:55:51 -07:00
Fraser Cormack d737c47137 [RISCV] Support vector SET[U]LT and SET[U]GE with splatted immediates
This patch adds more optimized codegen for the above SETCC forms,
by matching the '.vi' vector forms when the immediate is a 5-bit signed
immediate plus 1. The immediate can be decremented and the corresponding
SET[U]LE or SET[U]GT forms can be matched.

This work was left as a TODO from D94168.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100096
2021-04-12 18:36:45 +01:00
Craig Topper ff902080a9 [RISCV] Use SLLI/SRLI instead of SLLIW/SRLIW for (srl (and X, 0xffff), C) custom isel on RV64.
We don't need the sign extending behavior here and SLLI/SRLI
are able to compress to C.SLLI/C.SRLI.
2021-04-11 13:59:51 -07:00
Craig Topper 3ae71226ef [RISCV] Drop earlyclobber constraint from vwadd(u).wx, vwsub(u).wx, vfwadd.wf and vfwsub.wf.
The first source has the same EEW as the destination and the other
source is a scalar so the overlap constraints don't apply to
the unmasked version.

For the masked version we have a constraint that the destination
can't be V0 so that covers the only overlap issue there.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D100217
2021-04-11 10:19:45 -07:00
Craig Topper bc0e052730 [RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffff) when zext.h is supported.
Similar to what we do for zext.w.

Disable the (srl (and X, 0xffff), C) custom isel when zext.h is
available.
2021-04-11 10:03:35 -07:00
Craig Topper 48d69edade [RISCV] Add i8 and i16 srli and srai tests to Zbb/Zbp test files. NFC
These require the input to be zero or sign extended. If we have
sext.b, sext.h or zext.h instructions we can use them. Otherwise
we need to use a pair of shifts to accomplish the zero/sign extend
and the final shift.

We don't currently use zext.h when it is available.
2021-04-11 10:00:38 -07:00
Fraser Cormack a5693445ca [RISCV] Support OR/XOR/AND reductions on vector masks
This patch adds RVV codegen support for OR/XOR/AND reductions for both
scalable- and fixed-length vector types. There are a few possible
codegen strategies for each -- vmfirst.m, vmsbf.m, and vmsif.m could be
used to some extent -- but the vpopc.m instruction was chosen since it
produces the scalar result in one instruction, after which scalar
instructions can finish off the computation.

The reductions are lowered identically for both scalable- and
fixed-length vectors, although some alternate strategies may be more
optimal on fixed-length vectors since it's cheaper to get the length of
those types.

Other reduction types were not deemed to be relevant for mask vectors.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100030
2021-04-08 09:46:38 +01:00
Hsiangkai Wang ba72bdef32 [RISCV] Add scalable offset under very large stack size.
If the stack size is larger than 12 bits, we have to use a scratch
register to store the stack size. Before we introduce the scalable stack
offset, we could simplify

%0 = ADDI %stack.0, 0

=>

%scratch = ... # sequence of instructions to move the offset into
%%scratch
%0 = ADD %fp, %scratch

However, if the offset contains scalable part, we need to consider it.

%0 = ADDI %stack.0, 0

=>

%scratch = ... # sequence of instructions to move the offset into
%%scratch
%scratch = ADD %fp, %scratch
%scalable_offset = ... # sequence of instructions for vscaled-offset.
%0 = ADD/SUB %scratch, %scalable_offset

Differential Revision: https://reviews.llvm.org/D100035
2021-04-08 14:46:05 +08:00
Hsiangkai Wang b8cd668115 [NFC][RISCV] Add test for scalable offset under large stack size.
This test case shows that we access wrong stack slots when the frame
object has scalable offset under large stack size.

Differential Revision: https://reviews.llvm.org/D100084
2021-04-08 14:46:05 +08:00