Commit Graph

806 Commits

Author SHA1 Message Date
Craig Topper 550fab53e1 [RISCV] Fold (sub C, (xor (setcc), 1)) -> (add (setcc), C-1).
Extracted from D131729 where we handled C==0. It's now generalized
to more constants.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132000
2022-08-17 09:50:08 -07:00
Craig Topper ab4cd154c6 [RISCV] Refactor performSUBCombine to prepare for D132000.
This refactors the code into a separate function with early returns.
D132000 adds an additional operation to the if/else that selects
NewLHS, but can otherwise share the rest of the code.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132002
2022-08-17 09:50:08 -07:00
Craig Topper d27c147aaa [RISCV] Allow lowerSELECT to fold integer setcc with FP select.
We'd pick it up in DAG combine later even if we didn't handle it here.
No test changes because we get it in DAG combine anyway.
2022-08-16 21:28:54 -07:00
Craig Topper ba1fb54821 [RISCV] Reuse existing VT variable instead of calling getValueType() repeatedly. NFC 2022-08-16 19:56:55 -07:00
Craig Topper 53ce22e429 Recommit "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine."
This time using N1 instead of N0 since N1 points to the original
setcc. This now affects scheduling as I expected.

Original commit message:
We change seteq<->setne but it doesn't change the semantics
of the setcc. We should keep original debug location. This is
consistent with visitXor in the generic DAGCombiner.
2022-08-16 15:51:07 -07:00
Craig Topper 2dfa4b6475 Revert "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine."
This reverts commit 1380b21ceb.

I mixed up N0 and N1 and didn't do what I intended.
2022-08-16 15:47:01 -07:00
Craig Topper 1380b21ceb [RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine.
We change seteq<->setne but it doesn't change the semantics
of the setcc. We should keep original debug location. This is
consistent with visitXor in the generic DAGCombiner.
2022-08-16 15:40:09 -07:00
Craig Topper b5a18de651 [RISCV] Remove C!=0 restriction from (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)).
While (sub 0, X) can use x0 for the 0, I believe (add X, -1) is
still preferrable. (addi X, -1) can be compressed, sub with x0 on
the LHS is never compressible.
2022-08-16 14:49:52 -07:00
Craig Topper de6fd16971 [RISCV] Don't fold (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) if C-1 isn't simm12.
We still need to materialize the constant in a register and we
may not be removing all uses of the original constant so it may
increase code size.
2022-08-16 14:11:31 -07:00
Craig Topper 4184edc691 [RISCV] (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) fold for FP setcc.
This introduce an xori in some cases. I don't believe it was the
intention of the original patch. This was an accident because
nonan FP equality compares also use SETEQ/SETNE.

Also pass the correct type to getSetCCInverse.
2022-08-16 13:00:36 -07:00
Craig Topper c7e58836e8 [RISCV] Minor cleanups to performSUBCombine. NFC
-Rename variable NnzC -> N0C.
-Use SelectionDAG::getSetCC to reduce code.
-Use SDValue::getOperand instead of operator-> and SDNode::getOperand.

Initial steps to add another similar combine to this code.
2022-08-16 12:59:16 -07:00
Craig Topper 7a73ab5818 [RISCV] Enable isTruncateFree in SDAG for i64->i32 on rv64.
We have a good selection of W instructions, so promoting a truncated
value back to i64 is often free.

This appears to be a net code size reduction on SPECINT2006.

This has been split from D130397 as one of the patches needed to
complete that.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D131819
2022-08-15 08:32:51 -07:00
LiaoChunyu 99ef0ddea3 [RISCV] Fold (sub constant, (setcc x, y, eq/neq)) -> (add constant - 1, (setcc x, y, neq/eq))
(setcc x, y, eq/neq) are seqz, snez that set rd = 0/1.

addi is used to process immediate, which can save instructions for load immediate.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D131471
2022-08-13 20:37:57 +08:00
jacquesguan 0fe5f03eeb [RISCV][NFC] Use nested namespace definations.
Since we use C++17 now, we could use nested namespace definations to simplify code.

Differential Revision: https://reviews.llvm.org/D131751
2022-08-13 09:56:59 +08:00
Alex Bradbury 47b1f8362a [RISCV] Implement isUsedByReturnOnly TargetLowering hook in order to tailcall more libcalls
Prior to this patch, libcalls inserted by the SelectionDAG legalizer
could never be tailcalled. The eligibility of libcalls for tail calling
is is partly determined by checking TargetLowering::isInTailCallPosition
and comparing the return type of the libcall and the calleer.
isInTailCallPosition in turn calls TargetLowering::isUsedByReturnOnly
(which always returns false if not implemented by the target).

This patch provides a minimal implementation of
TargetLowering::isUsedByReturnOnly - enough to support tail calling
libcalls on hard float ABIs. Soft-float ABIs are left for a follow on
patch. libcall-tail-calls.ll also shows missed opportunities to tail
call integer libcalls, but this is due to issues outside of
the isUsedByReturnOnly hook.

Differential Revision: https://reviews.llvm.org/D131087
2022-08-10 10:50:29 +01:00
Nikita Popov f5ed0cb217 [RISCV] Add target feature to force-enable atomics
This adds a +forced-atomics target feature with the same semantics
as +atomics-32 on ARM (D130480). For RISCV targets without the +a
extension, this forces LLVM to assume that lock-free atomics
(up to 32/64 bits for riscv32/64 respectively) are available.

This means that atomic load/store are lowered to a simple load/store
(and fence as necessary), as these are guaranteed to be atomic
(as long as they're aligned). Atomic RMW/CAS are lowered to __sync
(rather than __atomic) libcalls. Responsibility for providing the
__sync libcalls lies with the user (for privileged single-core code
they can be implemented by disabling interrupts). Code using
+forced-atomics and -forced-atomics are not ABI compatible if atomic
variables cross the ABI boundary.

For context, the difference between __sync and __atomic is that the
former are required to be lock-free, while the latter requires a
shared global lock provided by a shared object library. See
https://llvm.org/docs/Atomics.html#libcalls-atomic for a detailed
discussion on the topic.

This target feature will be used by Rust's riscv32i target family
to support the use of atomic load/store without atomic RMW/CAS.

Differential Revision: https://reviews.llvm.org/D130621
2022-08-09 16:04:46 +02:00
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Kazu Hirata a2d4501718 [llvm] Fix comment typos (NFC) 2022-08-07 00:16:14 -07:00
Craig Topper 12a1ca9c42 [RISCV] Relax another one use restriction in performSRACombine.
When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), i32), C)
it's possible that the add is used by multiple sras. We should
allow the combine if all the SRAs will eventually be updated.

After transforming all of the sras, the shls will share a single
(sext_inreg (add X, C1), i32).

This pattern occurs if an sra with 32 is used as index in multiple
GEPs with different scales. The shl from the GEPs will be combined
with the sra before we get a chance to match the sra pattern.
2022-08-04 14:32:31 -07:00
Craig Topper a2de12c987 [RISCV] Relax a one use restriction performSRACombine
When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), C)
ignore the use count on the (shl X, 32).

The sext_inreg after the transform is free. So we're only making
2 new instructions, the add and the shl. So we only need to be
concerned with replacing the original sra+add. The original shl
can have other uses. This helps if there are multiple different
constants being added to the same shl.
2022-08-04 11:25:08 -07:00
Craig Topper 53d560b22f [RISCV] Prevent infinite loop after D129980.
D129980 converts (seteq (i64 (and X, 0xffffffff)), C1) into
(seteq (i64 (sext_inreg X, i32)), C1). If bit 31 of X is 0, it
will be turned back into an 'and' by SimplifyDemandedBits which
can cause an infinite loop.

To prevent this, check if bit 31 is 0 with computeKnownBits before
doing the transformation.

Fixes PR56905.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D131113
2022-08-03 15:19:07 -07:00
David Truby 9a976f3661 [llvm] Always use TargetConstant for FP_ROUND ISD Nodes
This patch ensures consistency in the construction of FP_ROUND nodes
such that they always use ISD::TargetConstant instead of ISD::Constant.

This additionally fixes a bug in the AArch64 SVE backend where patterns
were matching against TargetConstant nodes and sometimes failing when
passed a Constant node.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130370
2022-08-03 14:02:11 +01:00
Alex Bradbury 28f12a09ae [RISCV] Teach ComputeNumSignBitsForTargetNode about masked atomic intrinsics
An unnecessary sext.w is generated when masking the result of the
riscv_masked_cmpxchg_i64 intrinsic. Implementing handling of the
intrinsic in ComputeNumSignBitsForTargetNode allows it to be removed.

Although this isn't a particularly important optimisation, removing the
sext.w simplifies implementation of an additional cmpxchg-related
optimisation in D130192.

Although I can't produce a test with different codegen for the other
atomics intrinsics, these are added as well for completeness.

Differential Revision: https://reviews.llvm.org/D130191
2022-08-03 13:41:58 +01:00
Fraser Cormack 646e2f4803 [VP] Rename VP int<->float conversion ISD opcodes
These should be named like the non-VP versions for consistency.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130967
2022-08-03 10:04:38 +01:00
wanglian e208bab55f [RISCV][NFC] Use defined variable instead some code.
Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D130687
2022-08-02 16:26:33 +08:00
Lorenzo Albano 71b7c03fd6 [RISCV][VP] Custom lower VP_STRIDED_LOAD and VP_STRIDED_STORE
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D121113
2022-08-01 09:23:45 -07:00
Craig Topper d21b315360 [RISCV] Remove vmerges from vector ceil, floor, trunc lowering.
Use masked operations to suppress spurious exception bits being
set in fflags. Unfortunately, doing this adds extra copies.
2022-07-30 10:58:41 -07:00
Craig Topper a23f07fb1d [RISCV] Add merge operands to more RISCVISD::*_VL opcodes.
This adds a merge operand to all of the binary _VL nodes. Including
integer and widening. They all share multiclasses in tablegen
so doing them all at once was easiest.

I plan to use FADD_VL in an upcoming patch. The rest are just for
consistency to keep tablegen working.

This does reduce the isel table size by about 25k so that's nice.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130816
2022-07-30 10:26:38 -07:00
Craig Topper 9bf305fe2b [RISCV] Swap the merge and mask operand order for VRGATHER*_VL and FCOPYSIGN_VL nodes.
Based on review feedback from D130816.
2022-07-30 09:57:05 -07:00
Craig Topper 2750873dfe [RISCV] Update lowerFROUND to use masked instructions.
This avoids a vmerge at the end and avoids spurious fflags updates.
This isn't used for constrained intrinsic so we technically don't have
to worry about fflags, but it doesn't cost much to support it.

To support I've extend our FCOPYSIGN_VL node to support a passthru
operand. Similar to what was done for VRGATHER*_VL nodes.

I plan to do a similar update for trunc, floor, and ceil.

Reviewed By: reames, frasercrmck

Differential Revision: https://reviews.llvm.org/D130659
2022-07-28 10:05:19 -07:00
Craig Topper 89173dee71 [RISCV] Remove duplicate code. NFC
The same operations are part of `FloatingPointVecReduceOps` a little
bit earlier.
2022-07-28 10:05:19 -07:00
Craig Topper 1d1d8d6025 [RISCV] Reorder code in lowerFROUND to make the diff in D130659 cleaner. NFC 2022-07-27 17:13:04 -07:00
Craig Topper 98647330bf [RISCV] Add merge operand to RISCVISD::FCOPYSIGN_VL.
Similar to what was done for VRGATHER*_VL recently.

This will be used in D130659.
2022-07-27 15:25:34 -07:00
LiaoChunyu bf4f9a468a [RISCV]Enable isIntDivCheap when attribute is minsize
Don't expand divisions by constants when attribute is minsize.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130543
2022-07-27 18:22:51 +08:00
Craig Topper 45944e7cf4 [RISCV] Refactor translateSetCCForBranch to prepare for D130508. NFC.
D130508 handles more constants than just 1 or -1. We need to extract
the constant instead of relying isOneConstant or isAllOnesConstant.
2022-07-25 15:54:54 -07:00
jacquesguan d8800ead62 [RISCV] Scalarize binop followed by extractelement.
This patch adds shouldScalarizeBinop to RISCV target in order to convert an extract element of a vector binary operation into an extract element followed by a scalar binary operation.

Differential Revision: https://reviews.llvm.org/D129545
2022-07-25 17:23:31 +08:00
Craig Topper 9adc00a9d0 [RISCV] Add a continue to reduce nesting. NFC 2022-07-23 17:36:12 -07:00
Kazu Hirata 1cc7f5bede Use static_assert instead of assert (NFC)
Identified with misc-static-assert.
2022-07-23 09:22:27 -07:00
Craig Topper add17fc8e4 [RISCV] Combine (select_cc (srl (and X, 1<<C), C), 0, eq/ne, true, fale)
(srl (and X, 1<<C), C) is the form we receive for testing bit C.
An earlier combine removed the setcc so it wasn't there to match
when we created the SELECT_CC. This doesn't happen for BR_CC because
generic DAG combine rebuilds the setcc if it is used by BRCOND.

We can shift X left by XLen-1-C to put the bit to be tested in the
MSB, and use a signed compare with 0 to test the MSB.
2022-07-20 22:32:11 -07:00
Craig Topper 7dda6c71b1 [RISCV] Refactor the common combines for SELECT_CC and BR_CC into a helper function.
The only difference between the combines were the calls to getNode
that include the true/false values for SELECT_CC or the chain
and branch target for BR_CC.

Wrap the rest of the code into a helper that reads LHS, RHS, and
CC and outputs new values and a bool if a new node needs to be
created.
2022-07-20 21:18:07 -07:00
Craig Topper 8983db15a3 [RISCV] Optimize (brcond (seteq (and X, 1 << C), 0))
If C > 10, this will require a constant to be materialized for the
And. To avoid this, we can shift X left by XLen-1-C bits to put the
tested bit in the MSB, then we can do a signed compare with 0 to
determine if the MSB is 0 or 1. Thanks to @reames for the suggestion.

I've implemented this inside of translateSetCCForBranch which is
called when setcc+brcond or setcc+select is converted to br_cc or
select_cc during lowering. It doesn't make sense to do this for
general setcc since we lack a sgez instruction.

I've tested bit 10, 11, 31, 32, 63 and a couple bits betwen 11 and 31
and between 32 and 63 for both i32 and i64 where applicable. Select
has some deficiencies where we receive (and (srl X, C), 1) instead.
This doesn't happen for br_cc due to the call to rebuildSetCC in the
generic DAGCombiner for brcond. I'll explore improving select in a
future patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130203
2022-07-20 18:40:49 -07:00
ksyx 3198364e6e [RISCV][Clang] Add support for Zmmul extension
This patch implements recently ratified extension Zmmul, a subextension
of M (Integer Multiplication and Division) consisting only
multiplication part of it.

Differential Revision: https://reviews.llvm.org/D103313
Reviewed By: craig.topper, jrtc27, asb
2022-07-18 20:26:08 -04:00
Craig Topper 0b02752899 [RISCV] Optimize (seteq (i64 (and X, 0xffffffff)), C1)
(and X, 0xffffffff) requires 2 shifts in the base ISA. Since we
know the result is being used by a compare, we can use a sext_inreg
instead of an AND if we also modify C1 to have 33 sign bits instead
of 32 leading zeros. This can also improve the generated code for
materializing C1.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D129980
2022-07-18 10:54:45 -07:00
Simon Pilgrim 259c36e7c1 [DAG] Add asserts to isDesirableToCommuteWithShift overrides to ensure its being called from a shift. NFC. 2022-07-18 13:11:24 +01:00
jacquesguan 2b11174079 [RISCV][NFC] Use more Arrayref in TargetLowering functions.
This patch replaces some foreach with Arrayref, and abstract some same literal array with a variable.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125656
2022-07-18 03:33:45 +00:00
Fangrui Song d955497112 [RISCV] Simplify lowerGlobalAddress. NFC 2022-07-17 15:42:45 -07:00
Craig Topper decf385c27 [RISCV] Teach targetShrinkDemandedConstant to handle OR and XOR.
We were only handling AND before, but SimplifyDemandedBits can
also call it for OR and XOR.
2022-07-17 12:36:33 -07:00
Craig Topper 257755530a [RISCV] Fold (sra (sext_inreg (shl X, C1), i32), C2) -> (sra (shl X, C1+32), C2+32).
The former pattern will select as slliw+sraiw while the latter
will select as slli+srai. This can enable the slli+srai to be
compressed.

Differential Revision: https://reviews.llvm.org/D129688
2022-07-13 14:34:17 -07:00
Philip Reames dde2a7fb6d [RISCV] Exploit fact that vscale is always power of two to replace urem sequence
When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale.

vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i8> both fill one 64 bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.)

We know from the RISCV V extension specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, the must be a power of two numbers of blocks. (For everything other than VLEN<=32, but that's already broken.)

It is worth noting that AArch64 SVE specification explicitly allows non-power-of-two sizes for the vector registers and thus can't claim that vscale is a power of two by this logic.

Differential Revision: https://reviews.llvm.org/D129609
2022-07-13 10:54:47 -07:00
Craig Topper c5be6a8308 [RISCV] Use X0 in place of VLMaxSentinel in lowering.
I thought I had already fixed all of these, but I guess I missed one.
2022-07-11 23:29:04 -07:00
Craig Topper c3c17b1695 [RISCV] Use MVT for the argument to getMaskTypeFor. NFC
Only one caller didn't already have an MVT and that was easy to
fix. Since the return type is MVT and it uses MVT::getVectorVT,
taking an MVT as input makes the most sense.
2022-07-11 15:14:44 -07:00
Craig Topper 1a2bd44b77 [RISCV] Make shouldConvertConstantLoadToIntImm return true unless enableUnalignedScalarMem is true.
This restores the old behavior before D129402 when
enableUnalignedScalarMem is false. This fixes a regression spotted
by @asb.

To fix this correctly, we need to consider alignment of the load
we'd be replacing, but that's not possible in the current interface.
2022-07-11 09:40:08 -07:00
LiaoChunyu 3f68f0f816 [RISCV] Optimize 2x SELECT for floating-point types
Including the following opcode:
 Select_FPR16_Using_CC_GPR
 Select_FPR32_Using_CC_GPR
 Select_FPR64_Using_CC_GPR

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D127871
2022-07-11 14:10:27 +08:00
Craig Topper 35ec8a423d [RISCV] Teach shouldConvertConstantLoadToIntImm that constant materialization can use constant pools.
I think it only makes sense to return true here if we aren't going
to turn around and create a constant pool for the immmediate.

I left out the check for useConstantPoolForLargeInts() thinking
that even if you don't want the commpiler to create a constant pool
you might still want to avoid materializing an integer that is
already available in a global variable.

Test file was copied from AArch64/ARM and has not been commited yet.
Will post separate review for that.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D129402
2022-07-10 14:10:17 -07:00
Lian Wang 9cfb28d672 [RISCV] Change VECTOR_SPLICE mask operation from expand to promote
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128717
2022-07-08 06:20:22 +00:00
Diego Caballero bf1758c3dc Revert "[RISCV] Optimize 2x SELECT for floating-point types"
This reverts commit 1178992c72.
2022-07-07 22:54:00 +00:00
Craig Topper 51d672946e [RISCV] Fold (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), C)
Similar for a subtract with a constant left hand side.

(sra (add (shl X, 32), C1<<32), 32) is the canonical IR from InstCombine
for (sext (add (trunc X to i32), 32) to i32).

For RISCV, we should lower this as addiw which means turning it into
(sext_inreg (add X, C1)).

There is an existing DAG combine to convert back to (sext (add (trunc X
to i32), 32) to i32), but it requires isTruncateFree to return true
and for i32 to be a legal type as it used sign_extend and truncate
nodes. So that doesn't work for RISCV.

If the outer sra happens be used by a shl by constant, it will be
folded and the shift amount of the sra will be changed before we
can do our own DAG combine. This requires us to match the more
general pattern and restore the shl.

I had wanted to do this as a separate (add (shl X, 32), C1<<32) ->
(shl (add X, C1), 32) combine, but that hit an infinite loop for some
values of C1.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128869
2022-06-30 09:01:24 -07:00
Craig Topper 9ace5af049 [RISCV] DAG combine (sra (shl X, 32), 32 - C) -> (shl (sext_inreg X, i32), C).
The sext_inreg can often be folded into an earlier instruction by
using a W instruction. The sext_inreg also works better with our ABI.

This is one of the steps to improving the generated code for this https://godbolt.org/z/hssn6sPco

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128843
2022-06-30 09:01:24 -07:00
Philip Reames 860c62f53c [RISCV] Refine known bits for READ_VLENB
This implements known bits for READ_VALUE using any information known about minimum and maximum VLEN. There's an additional assumption that VLEN is a power of two.

The motivation here is mostly to remove the last use of getMinVLen, but while I was here, I decided to also fix the bug for VLEN < 128 and handle max from command line generically too.

Differential Revision: https://reviews.llvm.org/D128758
2022-06-28 15:42:14 -07:00
Lian Wang 96ab083622 [RISCV] Support VECTOR_REVERSE mask operation.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128627
2022-06-28 07:48:51 +00:00
LiaoChunyu 1178992c72 [RISCV] Optimize 2x SELECT for floating-point types
Including the following opcode:
 Select_FPR16_Using_CC_GPR
 Select_FPR32_Using_CC_GPR
 Select_FPR64_Using_CC_GPR

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D127871
2022-06-28 12:02:05 +08:00
Craig Topper ea1b861278 [RISCV] Fix misleading formatting and remove a dead getNode call. NFC 2022-06-27 18:49:57 -07:00
Philip Reames 0533b6e2f6 [RISCV] Remove a use of getMinVLen in favor of getRealMinVLen
The later is possibly greater than the former, and thus the assert was overly strong when a wider VLEN was set at the command line.
2022-06-27 12:52:24 -07:00
Philip Reames a0443dd47c [RISCV] Simplify 16 bit index handling in lowerVECTOR_REVERSE [nfc]
getRealMaxVLen returns an upper bound on the value of VLEN.  We can use this upper bound (which unless explicitly set at command line is going to result in a e8 MaxVLMax of much greater than 256) instead of explicitly handling the unknown case separately from the bounded by number greater than 256 case.

Note as well that this code already implicitly depends on a capped value for VLEN.  If infinite VLEN were possible, than 16 bit indices wouldn't be enough.
2022-06-24 13:08:39 -07:00
Philip Reames f1e1c3ce77 [RISCV] Replace two calls to getMinRVVVectorSizeInBits in fixed length lowering [nfc]
Both of these are only reached if useRVVForFixedLengthVectors is true.  Given that, we know that getRealMinVLen() == getMinRVVVectorSizeInBits().
2022-06-24 13:00:57 -07:00
Craig Topper c579ab53bd [RISCV] Move vfma_vl+fneg_vl matching to DAG combine.
This patch adds 3 new _VL RISCVISD opcodes to represent VFMA_VL with
different portions negated. It also adds a DAG combine to peek
through FNEG_VL to create these new opcodes.

This is modeled after similar code from X86.

This makes the isel patterns more regular and reduces the size of
the isel table by ~37K.

The test changes look like regressions, but they point to a bug that
was already there. We aren't able to commute a masked FMA instruction
to improve register allocation because we always use a mask undisturbed
policy. Prior to this patch we matched two multiply operands in a
different order and hid this issue for these test cases, but a different
test still could have encountered it.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D128310
2022-06-24 00:00:37 -07:00
Craig Topper 8b10ffabae [RISCV] Disable <vscale x 1 x *> types with Zve32x or Zve32f.
According to the vector spec, mf8 is not supported for i8 if ELEN
is 32. Similarily mf4 is not suported for i16/f16 or mf2 for i32/f32.

Since RVVBitsPerBlock is 64 and LMUL is calculated as
((MinNumElements * ElementSize) / RVVBitsPerBlock) this means we
need to disable any type with MinNumElements==1.

For generic IR, these types will now be widened in type legalization.
For RVV intrinsics, we'll probably hit a fatal error somewhere. I plan
to work on disabling the intrinsics in the riscv_vector.h header.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D128286
2022-06-23 08:49:18 -07:00
Craig Topper f912d21e67 [RISCV] Add RISCVISD opcodes for the rest of get*Addr.
This adds RISCVISD opccodes for LA, LA_TLS_IE, and LA_TLS_GD to
remove creation of MachineSDNodes form get*Addr. This makes the
code consistent with the previous patches that added RISCVISD::HI,
ADD_LO, LLA, and TPREL_ADD.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128325
2022-06-22 09:21:07 -07:00
Craig Topper 0efbf5bfbb [RISCV] Move the passthru operand for RISCVISD::VRGATHER*_VL nodes. NFC
Put it before the VL instead of as the first operand. I want to add
passthru to more operands, but the commutable ones like VADD_VL
require the commutable operands to be operand 0 and 1. So we can't
have the passthru as operand 0 for those.
2022-06-21 14:01:02 -07:00
Craig Topper e01353f816 [RISCV] Add RISCVISD opcode for PseudoAddTPRel.
Use it along with RISCVISD::HI and ADD_LO to avoid emitting
MachineSDNodes during lowering.
2022-06-20 20:56:52 -07:00
Kazu Hirata 0916d96d12 Don't use Optional::hasValue (NFC) 2022-06-20 20:17:57 -07:00
Craig Topper 16d3a82de5 [RISCV] Add merge operand to RISCVISD::VRGATHER*_VL nodes.
Use it in place of VSELECT_VL+VRGATHER*_VL.

This simplifies the isel patterns.

Overall, I think trying to match select+op to create masked instructions
in isel doesn't scale. We either need to do it in DAG combine, pre-isel
peepole, or post-isel peephole. I don't yet know which is the right
answer, but for this case it seemed best to be able to request the
masked form directly from lowering.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D128023
2022-06-20 18:58:24 -07:00
Kazu Hirata e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
Craig Topper 545a71c0d6 [RISCV] Pre-promote v1i1/v2i1/v4i1->i1/i2/i4 bitcasts before type legalization
Type legalization will convert the bitcast into a vector store and
scalar load.

Instead this patch widens the vector to v8i1 with undef, and bitcasts
it to i8. v8i1->i8 has custom handling for type legalization already to
bitcast to a v1i8 vector and use an extract_element.

The code here was lifted from X86's avx512 support.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D128099
2022-06-18 11:06:45 -07:00
Craig Topper cbf6737cc4 [RISCV] Use RVVBitsPerBlock instead of hardcoding multiples of 64. NFC 2022-06-17 14:10:39 -07:00
Craig Topper 9d7b01dc95 [RISCV] Implement RISCVTargetLowering::getTargetConstantFromLoad.
This allows computeKnownBits to see the constant being loaded.

This recovers the rv64zbp test case changes from D127520.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127679
2022-06-16 15:11:18 -07:00
Craig Topper 5afdceb82b [RISCV] Add RISCVISD opcode for PseudoLLA.
Rather than emitting a MachineSDNode from lowering. Let isel match it.

This is consistent with the RISCVISD::HI and ADD_LO nodes that were
also added. Having them both the same will make D127679 consistent.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127714
2022-06-16 15:11:03 -07:00
Craig Topper 4191de262f [RISCV] Don't emit LUI/ADDI MachineSDNodes from getAddr
Instead add RISCVISD opcodes that will be selected to LUI/ADDI
during isel.

I'm looking into maybe moving doPeepholeLoadStoreADDI into isel.
Having the ADDI as a RISCVISD node will make it visible to isel.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127713
2022-06-16 14:56:07 -07:00
Craig Topper e4062522d3 [RISCV] Disable matchSplatAsGather for i1 vectors to prevent creating illegal nodes.
We were incorrectly creating a VRGATHER node with i1 vector type. We
could support this by promoting the mask to i8 and truncating it, but
for now I want to prevent the crash.

Fixes PR56007.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127681
2022-06-13 13:41:39 -07:00
Craig Topper cef03e3dcd [RISCV] Move creation of constant pools from isel to lowering.
This simplifies the isel code by removing the manual load creation.
It also improves our ability to use 0 strided loads for vector splats.

There is an assumption here that Mask and ShiftedMask constants are
cheap enough that they don't become constant pool loads so that our
isel optimizations involving And still work. I believe those constants
are 3 instructions in the worst case.

The rv64zbp-intrinsic.ll changes is a regression caused by intrinsics
being expanded to RISCVISD also occuring during lowering. So the optimizations
were only happening during the last DAGCombine, which can't see through the
load. I believe we can fix this test by implementing
TargetLowering::getTargetConstantFromLoad for RISC-V or by adding the intrinsic
to computeKnownBitsForTargetNode to enable earlier DAG combine. Since Zbp is not
a ratified extension, I don't view these as blocking this patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127520
2022-06-13 09:07:57 -07:00
Craig Topper e91051184c [RISCV] Mark FSIN and other math functions as Expand for scalable vectors.
This prevents them from being assumed legal by the cost model.

This matches what is done for AArch64 SVE.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D123799
2022-06-10 08:40:07 -07:00
Shao-Ce SUN 862f30a428 [RISCV] Add ISD::EH_DWARF_CFA
Based on D24038.
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible __builtin_dwarf_cfa() builtin.

Reviewed By: StephenFan

Differential Revision: https://reviews.llvm.org/D126181
2022-06-08 22:03:30 +08:00
Craig Topper aeb27f133a [RISCV] Fix i64<->f64 and i32<->f32 bitcasts with VLS vectors enabled.
We enable a custom handler to optimize conversions between scalars
and fixed vectors. Unfortunately, the custom handler picks up scalar
to scalar conversions as well. If the scalar types are both legal,
we wouldn't match any of the fixed vector cases and would return SDValue()
causing the LegalizeDAG to expand the bitcast through memory.

This patch fixes this by checking if it's a scalar to scalar conversion
and returns `Op` if both types are legal.

Differential Revision: https://reviews.llvm.org/D126739
2022-06-01 08:13:49 -07:00
Craig Topper b09e54541a [RISCV] Use template version of SignExtend64 for constant extends. NFC
We were inconsistent about which one we used.
2022-05-27 13:11:15 -07:00
Craig Topper d0f65eaa85 [RISCV] Remove unused variables. NFC 2022-05-27 12:13:45 -07:00
Craig Topper aaad507546 [RISCV] Return false from isOffsetFoldingLegal instead of reversing the fold in lowering.
When lowering GlobalAddressNodes, we were removing a non-zero offset and
creating a separate ADD.

It already comes out of SelectionDAGBuilder with a separate ADD. The
ADD was being removed by DAGCombiner.

This patch disables the DAG combine so we don't have to reverse it.
Test changes all look to be instruction order changes. Probably due
to different DAG node ordering.

Differential Revision: https://reviews.llvm.org/D126558
2022-05-27 11:05:18 -07:00
Philip Reames 8a3b6ba756 [RISCV] Add a subtarget feature to enable unaligned scalar loads and stores
A RISCV implementation can choose to implement unaligned load/store support. We currently don't have a way for such a processor to indicate a preference for unaligned load/stores, so add a subtarget feature.

There doesn't appear to be a formal extension for unaligned support. The RISCV Profiles (https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#rva20u64-profile) docs use the name Zicclsm, but a) that doesn't appear to actually been standardized, and b) isn't quite what we want here anyway due to the perf comment.

Instead, we can follow precedent from other backends and have a feature flag for the existence of misaligned load/stores with sufficient performance that user code should actually use them.

Differential Revision: https://reviews.llvm.org/D126085
2022-05-26 15:25:47 -07:00
Craig Topper e9ac99b609 [RISCV] Simplfy creation of IndexVT in lowerMaskedGather/lowerMaskedScatter. NFC
The scalar element width is not a factor in how ContainerVT is
determined. We don't need to check the relative size of VT and
IndexVT.
2022-05-26 13:13:32 -07:00
jacquesguan b271488e8b [RISCV] Replace ISD::FP_EXTEND and ISD::FP_ROUND with RVV VL op.
This patch tries to solve the incoordination between the direct and intermediate  cast caused by D123975.
This patch replaces ISD::FP_EXTEND and ISD::FP_ROUND with RVV VL op in the lowering of FP scalable vector direct cast to unify with the intermediate cast.
And it also changes the FP widenning pattern with the VL op.

Differential Revision: https://reviews.llvm.org/D125364
2022-05-26 02:17:31 +00:00
Craig Topper 172149e98c [RISCV] Preserve fast math flags in lowerVPOp.
Update test to check MIR after finalize-isel instead of debug output.

This is of course not the only place we should preserve FMF, but
it's the most obvious one.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D126306
2022-05-25 09:16:07 -07:00
Paul Walker 258dac43d6 [SVE] Enable use of 32bit gather/scatter indices for fixed length vectors
Differential Revision: https://reviews.llvm.org/D125193
2022-05-22 12:32:30 +01:00
Jay Foad 6bec3e9303 [APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.

The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.

Differential Revision: https://reviews.llvm.org/D125557
2022-05-19 11:23:13 +01:00
Paul Walker 7dd05ba9ed [SelectionDAG] Remove duplicate "is scaled" information from gather/scatter SDNodes.
During early gather/scatter enablement two different approaches
were taken to represent scaled indices:

* A Scale operand whereby byte_offsets = Index * Scale
* An IndexType whereby byte_offsets = Index * sizeof(MemVT.ElementType)

Having multiple representations is bad as shown by this patch which
fixes instances where the two are out of sync. The dedicated scale
operand is more flexible and pervasive so this patch removes the
UNSCALED values from IndexType. This means all indices are scaled
but the scale can be one, hence unscaled. SDNodes now use the scale
operand to answer the "isScaledIndex" question.

I toyed with the idea of keeping the UNSCALED enums and helper
functions but because they will have no uses and force SDNodes to
validate the set of supported values I figured it's best to remove
them. We can re-add them if there's a real need. For similar
reasons I've kept the IndexType enum when a bool could be used as I
think being explicitly looks better.

Depends On D123347

Differential Revision: https://reviews.llvm.org/D123381
2022-05-16 20:47:52 +01:00
jacquesguan a8426ada49 [RISCV][NFC] Replace for-each with array argument call.
This patch replaces some for-each set with the new arrayref argument API, since it already used an array in defination, I think this change won't cause any ambiguity.

Differential Revision: https://reviews.llvm.org/D125455
2022-05-16 02:12:48 +00:00
Craig Topper 5a19fbad83 [RISCV] Remove unneeded check for ISD::VSCALE operand being a constant. NFC
ISD::VSCALE only allows constant operands.
2022-05-14 13:45:03 -07:00
Roger Ferrer Ibanez 189ca6958e [RISCV] Use the new chain when converting a fixed RVV load
When building the final merged node, we were using the original chain
rather than the output chain of the new operation. After some collapsing
of the chain this could cause the loads be incorrectly scheduled respect
to later stores.

This was uncovered by SingleSource/Regression/C/gcc-c-torture/execute/pr36038.c
of the llvm testsuite.

https://reviews.llvm.org/D125560
2022-05-13 22:21:08 +00:00
Zakk Chen 7dfc56c107 [RISCV] Add the passthru operand for RVV unmasked segment load IR intrinsics.
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125323
2022-05-13 02:16:40 -07:00
Craig Topper 0ebb02b90a [RISCV] Override TargetLowering::shouldProduceAndByConstByHoistingConstFromShiftsLHSOfAnd.
This hook determines if SimplifySetcc transforms (X & (C l>>/<< Y))
==/!= 0 into ((X <</l>> Y) & C) ==/!= 0. Where C is a constant and
X might be a constant.

The default implementation favors doing the transform if X is not
a constant. Otherwise the code is left alone. There is a provision
that if the target supports a bit test instruction then the transform
will favor ((1 << Y) & X) ==/!= 0. RISCV does not say it has a variable
bit test operation.

RISCV with Zbs does have a BEXT instruction that performs (X >> Y) & 1.
Without Zbs, (X >> Y) & 1 still looks preferable to ((1 << Y) & X) since
we can fold use ANDI instead of putting a 1 in a register for SLL.

This patch overrides this hook to favor bit extract patterns and
otherwise falls back to the "do the transform if X is not a constant"
heuristic.

I've added tests where both C and X are constants with both the shl form
and lshr form. I've also added a test for a switch statement that lowers
to a bit test. That was my original motivation for looking at this.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D124639
2022-05-11 11:13:17 -07:00
Craig Topper 0781742785 [RISCV] Add a DAG combine to pre-promote (i32 (and (srl X, Y), 1)) with Zbs on RV64.
Type legalization will want to turn (srl X, Y) into RISCVISD::SRLW,
which will prevent us from using a BEXT instruction.

I don't think there is any precedent for type promotion checking
users to decide how to promote. Instead, I've added this DAG combine to
do it before type legalization.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D124109
2022-05-11 10:49:16 -07:00
Fraser Cormack c1d48b35d8 [SelectionDAG][VP] Rename VP sext/zext/trunc ISD opcodes
Rather than VP_SEXT/VP_ZEXT/VP_TRUNC, having
VP_SIGN_EXTEND/VP_ZERO_EXTEND/VP_TRUNCATE better matches their non-VP
counterparts.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125298
2022-05-11 10:25:51 +01:00