Commit Graph

864 Commits

Author SHA1 Message Date
Craig Topper b41fe90dc3 [RISCV] Correct the setcc in vp.floor/ceil/round/roundeven lowering.
We want to emit a masked setcc that preserves zeros in all of the bits
where the original mask is zero. To do this we need to pass the original
mask as the passthru operand as well. Otherwise, we'll use the mask agnostic
policy and replace the zeros with 1s on some CPUs.

Differential Revision: https://reviews.llvm.org/D135122
2022-10-03 20:58:05 -07:00
Philip Reames e884324145 [RISCV] Generalize select (and (x , 0x1) == 0), y, (z ^ y) ) and select (and (x , 0x1) == 0), y, (z | y) ) transforms by removing and-clause
These transforms were recently added (by me) in D134881. Looking at the code again, I realized we don't need the (and x, 0x1) portion of the pattern, we just need to know that the result of that sub-tree is either 0 or 1. Checking for this directly allows us to match slightly more broadly. The test changes are zext i1 arguments, but this could also kick in for e.g. shifts of high bits, or any other source of known bits.

Differential Revision: https://reviews.llvm.org/D135081
2022-10-03 13:57:38 -07:00
Philip Reames a200b0fc25 [DAG] Introduce getSplat utility for common dispatch pattern [nfc]
We have a very common pattern of dispatching between BUILD_VECTOR and SPLAT_VECTOR creation repeated in many cases in code.  Common the pattern into a utility function.
2022-10-03 12:49:39 -07:00
Craig Topper f3e87a63e5 [RISCV] Add missing VL arguments to the creation of RISCVISD::VMV_V_X_VL nodes.
VMV_V_X_VL nodes should always have a passthru, a splat, and a VL.
We were sometimes missing the VL.

This went unnoticed because these cases were all selected into the
following node to form a .vx or .vi instruction. The ComplexPattern
that does this, doesn't check the VL operand. I've added an assert
to the ComplexPattern to catch if the operand is missing.

@qcolombet spotted some of these in D134703.
2022-10-03 12:21:05 -07:00
Craig Topper 5b06ccb611 Revert "foo"
This reverts commit 2138ef354a.

Forgot to squash
2022-10-03 12:15:41 -07:00
Craig Topper a55cdcae3e Revert "[RISCV] Add missing VL arguments to the creation of RISCVISD::VMV_V_X_VL nodes."
This reverts commit 4c03c9f375.

Forgot to squash
2022-10-03 12:15:28 -07:00
Craig Topper 4c03c9f375 [RISCV] Add missing VL arguments to the creation of RISCVISD::VMV_V_X_VL nodes.
VMV_V_X_VL nodes should always have a passthru, a splat, and a VL.
We were sometimes missing the VL.

This went unnoticed because these cases were all selected into the
following node to form a .vx or .vi instruction. The ComplexPattern
that does this, doesn't check the VL operand. I've added an assert
to the ComplexPattern to catch if the operand is missing.

@qcolombet spotted some of these in D134703.
2022-10-03 12:13:21 -07:00
Craig Topper 2138ef354a foo 2022-10-03 12:13:20 -07:00
Yeting Kuo cefb7aab61 [VP][RISCV] Add vp.copysign and RISC-V support.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134935
2022-10-01 10:19:10 +08:00
Philip Reames 1e3c179519 [RISCV] Address post commit review comments from D134881 2022-09-30 08:31:40 -07:00
Philip Reames 2b5960028e [RISCV] Branchless lowering for select (and (x , 0x1) == 0), y, (z ^ y) ) and select (and (x , 0x1) == 0), y, (z | y) )
This code is directly ported from the X86 backend which applies the same rewrite (along with several others). Planning on looking more closely at the other branchless variants from x86 to see if any are worth porting in future changes.

Motivation here is the coremark crc8 routine from https://github.com/eembc/coremark/blob/main/core_util.c#L165. This patch significantly reduces the number of unpredictable branches in the workload.

Differential Revision: https://reviews.llvm.org/D134881
2022-09-30 08:24:32 -07:00
Ray Wang 4c786c9747 [RISCV] Remove some unused var decl. NFC
Differential Revision: https://reviews.llvm.org/D134707
2022-09-30 08:08:15 -07:00
Yeting Kuo 1cc02b05b7 [SelectionDAG] Add helper function to check whether a SDValue is neutral element. NFC.
Using this helper makes work about neutral elements more easier. Although I only
find one case now, I think it will have more chance to be used since so many
combine works are related to neutral elements.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D133866
2022-09-30 11:29:11 +08:00
eopXD 02a982829c [RISCV] Add lowering for llvm.roundeven
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134785
2022-09-29 06:08:14 -07:00
eopXD 9677d70eb2 [VP][RISCV] Add vp.floor, vp.round, vp.roundeven and their RISC-V support
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134759
2022-09-27 19:45:58 -07:00
Han-Kuan Chen c595c874cb [RISCV] Lower BUILD_VECTOR to RISCVISD::VID_VL if it is floating-point type.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D133688
2022-09-27 17:25:34 -07:00
eopXD 163cb33854 [VP][RISCV] Add vp.ceil and RISC-V support
Previous commit 8b00b24f85 missed to add `int_ceil` anchor for the
llvm.ceil.* section under LangRef.rst

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134586
2022-09-27 12:04:09 -07:00
eopXD 384b8b3da7 Revert "[VP][RISCV] Add vp.ceil and RISC-V support"
This reverts commit 8b00b24f85.
2022-09-27 11:12:57 -07:00
eopXD 8b00b24f85 [VP][RISCV] Add vp.ceil and RISC-V support
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134586
2022-09-27 11:08:27 -07:00
Yeting Kuo 04e1301f3d [VP][RISCV] Add vp.maxnum and vp.minnum intrinsics and RISC-V support.
Add vp.maxnum and vp.minnum which are vector predicted intrinsics of llvm.maxnum
and llvm.minnum.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134639
2022-09-27 13:36:45 +08:00
Yeting Kuo 43c5fbdd3a [VP][RISCV] Add vp.sqrt intrinsic and RISC-V support.
The patch modeled vp.fabs patch D132793.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D133690
2022-09-26 10:47:40 +08:00
Philip Reames 6e7c54ecaf [RISCV] Add lowering for scalable @llvm.riscv.masked.strided.load/store
The code previously assumed fixed length vectors; make the relevant code conditional.

Having the lowering in place is neccessary for an upcoming change to generalize scatter/gather matching to scalable vectors.

Differential Revision: https://reviews.llvm.org/D134489
2022-09-24 17:41:57 -07:00
Craig Topper 19850cc2d8 Revert "[RISCV] Lower BUILD_VECTOR to RISCVISD::VID_VL if it is floating-point type."
This reverts commit dd53a0bb30.

We have seen crashes from this internally. Probably due to the use
of RoundingMode::Dynamic.
2022-09-23 18:41:41 -07:00
Craig Topper 90a5d8499a [RISCV] Promote f16 STRICT_FCEIL/FLOOR/TRUNC/NEARBYINT/RINT/ROUND,ROUNDEVEN to f32. 2022-09-23 14:01:51 -07:00
Philip Reames 60c91fd364 [RISCV] Disallow scale for scatter/gather
RISCV doesn't actually support a scaled form of indexed load and store. We previously handled this by forming the scaled SDNode, and then doing custom legalization during lowering. This patch instead adds a callback via TLI to prevent formation entirely.

This has two effects:
* First, the GEP gets expanded (and used). Instead of the shift being created with an SDLoc of the memory operation, it has the SDLoc of the GEP instruction. This avoids the scheduler perturbing IR order when there's no reason to.
* Second, we fix what appears to be a bug in index calculation with RV32. The rules for GEPs require index calculation be done in particular bitwidth, and it appears the custom legalization code got this wrong for the case where index type exceeds pointer width. (Or at least, I trust the generic GEP lowering to be correct a lot more.)

The DAGCombiner change to handle VPScatter/VPGather is technically separate, but is required to prevent a regression on those intrinsics.

Differential Revision: https://reviews.llvm.org/D134382
2022-09-22 15:31:26 -07:00
Craig Topper 52708be182 [RISCV] Remove support for the unratified Zbe, Zbf, and Zbm extensions.
These extensions do not appear to be on their way to ratification.
2022-09-22 13:04:41 -07:00
Fraser Cormack 92d71c615d [RISCV] Use structured bindings in common RVV lowering code
This patch uses structured bindings to simplify a couple of specific
cases when lowering RVV operations where we commonly declare two
SDValues and immediately 'tie' them to the mask and vector length.

There's also a couple places where we split vectors that structured
bindings make sense to use.

This patch tries to keep these sorts of changes minimal and to cases
where the returned types are commonly understood, rather than applying
this wholesale to the RISCV backend.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D134442
2022-09-22 16:38:40 +01:00
Craig Topper bf7c7696fe [RISCV] Improve support for vector fp_to_sint_sat/uint_sat.
The default fixed vector legalization is to unroll. The default
scalable vector legalization is to clamp in the FP domain. The
RVV vfcvt instructions have saturating behavior so we can use them
directly. The only difference is that RVV instruction turn nan into
the max value, but the _SAT intrinsics want 0.

I'm only supporting 1 step of narrowing for now. I think we can
support more steps by using VNCLIP to saturate and narrower.

The only case that needs 2 steps of widening is f16->i64 which we can
do as f16->f32->i64.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D134400
2022-09-22 08:13:48 -07:00
Craig Topper 8b8e18e11f [RISCV] Replace RISCVISD::GREV/GORC/SHFL/UNSHFL with BREV8/ORC_B/ZIP/UNZIP.
With Zbp removed, we no longer need the generalized forms.

The computeKnownBitsForTargetNode code brev8/orc.b is still based
on the general form with the shift amount forced to 7.
2022-09-21 21:57:59 -07:00
Craig Topper 182aa0cbe0 [RISCV] Remove support for the unratified Zbp extension.
This extension does not appear to be on its way to ratification.

Still need some follow up to simplify the RISCVISD nodes.
2022-09-21 21:22:42 -07:00
Craig Topper 1d8a7adca6 [RISCV] Rename RISCVISD::SINT_TO_FP_VL/UINT_TO_FP_VL. NFC
Name them after the instructions VFCVT_RTZ_X(U)_F_VL to make it
clear that the ISD nodes don't have the poison semantics of
ISD::SINT_TO_FP/UINT_TO_FP.

I play to reuse this node for a FP_TO_SINT_SAT/FP_TO_UINT_SAT
patch and need the instruction semantics.
2022-09-21 15:33:04 -07:00
Craig Topper 70a64fe7b1 [RISCV] Remove support for the unratified Zbt extension.
This extension does not appear to be on its way to ratification.

Out of the unratified bitmanip extensions, this one had the
largest impact on the compiler.

Posting this patch to start a discussion about whether we should
remove these extensions. We'll talk more at the RISC-V sync meeting this
Thursday.

Reviewed By: asb, reames

Differential Revision: https://reviews.llvm.org/D133834
2022-09-20 20:26:48 -07:00
LiaoChunyu 2e74157ad4 [RISCV]Preserve (and X, 0xffff) in targetShrinkDemandedConstant
shrinkdemandedconstant does some optimizations, but is not very friendly to riscv, targetShrinkDemandedConstant to limit the damage.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134155
2022-09-19 14:19:38 +08:00
LiaoChunyu 8fee91c435 [RISCV][NFC]Remove outdated comment from targetShrinkDemandedConstant
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134154
2022-09-19 10:23:06 +08:00
Craig Topper 61595c45af [RISCV] Simplify some code in vector fp<->int handling. NFC
We changed the way container types are selected since this code
was written. We no longer need to use the largest type.
2022-09-16 12:56:42 -07:00
Sergei Barannikov c6acb4eb0f [SDAG] Add `getCALLSEQ_END` overload taking `uint64_t`s
All in-tree targets pass pointer-sized ConstantSDNodes to the
method. This overload reduced amount of boilerplate code a bit.  This
also makes getCALLSEQ_END consistent with getCALLSEQ_START, which
already takes uint64_ts.
2022-09-15 14:02:12 -04:00
Han-Kuan Chen dd53a0bb30 [RISCV] Lower BUILD_VECTOR to RISCVISD::VID_VL if it is floating-point type.
Differential Revision: https://reviews.llvm.org/D133688
2022-09-13 18:50:20 -07:00
Craig Topper 8d7e73effe [RISCV] Teach lowerVECTOR_SHUFFLE to recognize some shuffles as vnsrl.
Unary shuffles such as <0,2,4,6,8,10,12,14> or <1,3,5,7,9,11,13,15>
where half the elements are returned, can be lowered using vnsrl.

SelectionDAGBuilder lowers such shuffles as a build_vector of
extract_elements since the mask has less elements than the source.
To fix this, I've enable the extractSubvectorIsCheapHook to allow
DAGCombine to rebuild the shuffle using 2 extract_subvectors preceding
the shufffle.

I've gone very conservative on extractSubvectorIsCheapHook to minimize
test impact and match what we have test coverage for. This can be
improved in the future.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133736
2022-09-13 11:07:11 -07:00
Alex Bradbury c44c1e9d3e [RISCV] Implement isMaskAndCmp0FoldingBeneficial hook
This hook is currently only used by CodeGenPrepare, which will sink *and
duplicate* an 'and' into a block that has an 'icmp 0' user of it if the
hook returns true.

This hook is less useful for RISC-V than for targets like AArch64 that
have a TBZ (test bit and branch if zero instruction), but may still be
profitable if Zbs is available and a BEXTI can be selected.

Conservatively, we return false even if Zbs is enabled for any masks
that fit in the ANDI immediate because it's possible the only use is a
branch on the result, and ANDI+BNEZ => BEXTI+BNEZ isn't a profitable
transformation.

Differential Revision: https://reviews.llvm.org/D131492
2022-09-13 18:54:00 +01:00
Alex Bradbury 547160848c [RISCV] Return true in hasBitTest when Zbs is enabled and update BEXTI pattern for resulting canonicalisation
As the Zbs extension includes bext[i] for bit extract, we can
unconditionally return true from this hook. This hook causes the DAG
combiner to perform the following canonicalisation:

  and (not (srl X, C)), 1 --> (and X, 1<<C) == 0
  and (srl (not X), C)), 1 --> (and X, 1<<C) == 0

As simply changing the hook causes a codegen regression, this patch also
modifies a BEXTI pattern to match this canonicalised form.

As BSETINVMask is now used for BEXT as well as BSET and BINV, it has
been renamed to the more generic SingleBitSetMask.

There is one codegen change in bittest.ll for bittest_31_i64 (NOT+BEXTI
rather than NOT+SRLIW). This is neutral in terms of code quality.

Differential Revision: https://reviews.llvm.org/D131482
2022-09-13 16:51:47 +01:00
Craig Topper 5224bae613 [RISCV] Fix a bug in i32 FP_TO_UINT_SAT lowering on RV64.
We use the saturating behavior of fcvt.wu.h/s/d but forgot to
take into account that fcvt.wu will sign extend the saturated
result. According to computeKnownBits a promoted FP_TO_UINT_SAT
is expected to zero extend the saturated value.

In many case the upper bits aren't be demanded so this wouldn't
be an issue. But if we computeKnownBits caused an AND to be removed
it would be a bug.

This patch inserts an AND during to zero the upper bits.

Unfortunately, this pessimizes code if we aren't able to tell if
the upper bits are demanded. To fix that we could custom type
promote the FP_TO_UINT_SAT with SEXT_INREG after it, but I'll
leave that for future work.

I haven't found a failure from this, I was revisiting the code to
add vector support and spotted it.

Differential Revision: https://reviews.llvm.org/D133746
2022-09-13 08:41:32 -07:00
Craig Topper 4186a49d79 [RISCV] Custom type legalize i32 loads by sign extending.
The default is to use extload which can become a zextload or
sextload if it is followed by an 'and' or sext_inreg.

Sometimes type legalization will introduce an 'and' from promoting
something like 'srl X, C' and a sext_inreg from from a setcc. The
'and' could be freely folded with the promoted 'srl' by using srliw,
but the sext_inreg can't be folded into a compare. DAG combiner
will see both of these choices and may decide to fold the 'and'
instead of the 'sext_inreg'. This forces the sext_inreg to become
a sext.w.

By picking sextload in the type legalizer we take this choice away.
Looking at spec2006 compiled with Zba and Zbb this appeared to be
net reduction in lines of code in the objdump disassembly output.

This is similar to what we do with i32 add/sub/mul/shl in
type legalization where we always emit a sext_inreg.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D130397
2022-09-12 09:13:07 -07:00
Joe Loser 5e96cea1db [llvm] Use std::size instead of llvm::array_lengthof
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.

Change call sites to use `std::size` instead.

Differential Revision: https://reviews.llvm.org/D133429
2022-09-08 09:01:53 -06:00
Craig Topper 5d30565d80 [RISCV] Improve vector fround lowering by changing FRM.
This is a follow up to D133238 which did this for ceil/floor.

Reviewed By: arcbbb, frasercrmck

Differential Revision: https://reviews.llvm.org/D133335
2022-09-06 09:33:13 -07:00
Craig Topper f0332d12ae [RISCV] Improve vector fceil/ffloor lowering by changing FRM.
This adds new VFCVT pseudoinstructions that take a rounding mode operand. A custom inserter is used to insert additional instructions to change FRM around the
VFCVT.

Some of this is borrowed from D122860, but takes a somewhat different direction. We may migrate to that patch, but for now I was trying to keep this as independent from
RVV intrinsics as I could.

A followup patch will use this approach for FROUND too.

Still need to fix the cost model.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D133238
2022-09-05 19:03:44 -07:00
liqinweng c45810f810 [RISCV] When ISD::SETUGT && Imm == -1, has processed before lowering
When ISD::SETUGT && Imm == -1, has processed before lowering. Use assert replace it

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D132373
2022-09-01 15:38:16 +08:00
Craig Topper 6e0ae7e940 [RISCV] Slightly simplify coode in combineVWADD_W_VL_VWSUB_W_VL and combineMUL_VLToVWMUL_VL. NFC
Use computeMaxSignificantBits instead of ComputeNumSignBits. Create
APInt as part of call to MaskedValueIsZero instead of creating
a named temporary.
2022-08-31 15:02:03 -07:00
Craig Topper 1c334b306e [RISCV] Add more invertible setccs to tryDemorganOfBooleanCondition.
This builds on D132771 to invert (setlt 0, X) to (setlt X, 1) and
vice versa.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132798
2022-08-29 12:23:03 -07:00
Craig Topper 9d12bb77f9 [RISCV] Apply DeMorgan to (beqz (and/or (seteq), (xor Z, 1))) to remove the xor.
We can rewrite to (bnez (or/and (setne), Z) is Z is 0/1.

Alternatively, we could canonicalize to (xor (or/and (setne), Z), 1)
even if there is no branch. The xor would not always get removed,
but it might enable other DeMorgan combines. I decided to be
conservative for this first patch and require the xor to be removed.

I have a couple other invertible setccs I will add in a follow up
patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132771
2022-08-29 12:16:34 -07:00
Craig Topper 2f811a6c7f [VP][RISCV] Add vp.fabs intrinsic and RISC-V support.
Mostly just modeled after vp.fneg except there is a
"functional instruction" for fneg while fabs is always an
intrinsic.

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D132793
2022-08-29 09:32:06 -07:00
Craig Topper 41a3b5739b [RISCV] Teach combineDeMorganOfBoolean to handle (and (xor X, 1), (not Y)).
SimplifyDemandedBits tries to agressively turn xor immediates into -1
to match a 'not' instruction. In this case, because X is a boolean, the
upper bits of (xor X, 1) are known to be 0. Because this is an AND
instruction, that means those bits aren't demanded from the other
operand, and thus SimplifyDemandedBits can turn (xor Y, 1) to (not Y).

We need to detect that this has happened to enable the DeMorgan
optimization. To do this we allow one of the xors to use -1 when
the outer operation is And.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132671
2022-08-25 10:55:45 -07:00
Craig Topper ec91d761ac [RISCV] Apply DeMorgan's law to (and/or (xor X, 1), (xor Y, 1)) if X and Y are 0/1.
This optimizes xors that appear due to legalizing setge/setle which
require an xor with 1. This reduces the number of xors and may
allow the xor to fold with a beqz or bnez.

Differential Revision: https://reviews.llvm.org/D132614
2022-08-25 08:49:30 -07:00
Simon Pilgrim f9de13232f [X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.

For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling.

Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU.

Differential Revision: https://reviews.llvm.org/D132520
2022-08-24 17:28:18 +01:00
Craig Topper 1a042dd6ed [RISCV] Optimize x <s -1 ? x : -1. Improve x >u 1 ? x : 1.
Similar to D132211, we can optimize x <s -1 ? x : -1 -> x <s 0 ? x : -1

Also improve the unsigned case from D132211 to use x != 0 which
will give a bnez instruction which might be compressible.

Differential Revision: https://reviews.llvm.org/D132252
2022-08-21 11:48:28 -07:00
LiaoChunyu 1fb87ace4d [RISCV] Optimize x > 1 ? x : 1 -> x > 0 ? x : 1
if x == 1,
  x > 1 ? x : 1  return x, which is also 1.
  x > 0 ? x : 1  return 1.

Reduce the number of load 1 instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D132211
2022-08-21 20:26:39 +08:00
Craig Topper 6227b7ae31 [RISCV] Move xori creation for scalar setccs to lowering.
This patch enables expansion or custom lowering for some integer
condition codes so that any xori that is needed is created before
the last DAG combine to enable optimization.

I've seen cases where we end up with
(or (xori (setcc), 1), (xori (setcc), 1)) which we would ideally
convert to (xori (and (setcc), (setcc)), 1). This patch doesn't
accomplish that yet, but it should allow us to add DAG
combines as follow ups. Example https://godbolt.org/z/Y4qnvsq1b

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D131729
2022-08-19 13:51:53 -07:00
Craig Topper 961838cc13 [RISCV] Add passthru operand to RISCVISD::SETCC_VL.
Use it to the fix a bug in the fceil/ffloor lowerings. We were
setting the passthru to IMPLICIT_DEF before and using a mask
agnostic policy. This means where the incoming bits in
the mask were 0 they could be anything in the outgoing mask. We
want those bits in the outgoing mask to be 0. This means we need to
pass the input mask as the passthru.

This generates worse code because we are unable to allocate the
v0 register to the output due to an earlyclobber constraint. We
probably need a special TIED pseudoinstruction and probably custom
isel since you can't use V0 twice in the input pattern.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132058
2022-08-19 08:53:44 -07:00
Craig Topper ba1f4cab44 [RISCV] Copy SDNodeFlags in lowerToScalableOp.
Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D132177
2022-08-18 20:42:59 -07:00
Craig Topper 550fab53e1 [RISCV] Fold (sub C, (xor (setcc), 1)) -> (add (setcc), C-1).
Extracted from D131729 where we handled C==0. It's now generalized
to more constants.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132000
2022-08-17 09:50:08 -07:00
Craig Topper ab4cd154c6 [RISCV] Refactor performSUBCombine to prepare for D132000.
This refactors the code into a separate function with early returns.
D132000 adds an additional operation to the if/else that selects
NewLHS, but can otherwise share the rest of the code.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132002
2022-08-17 09:50:08 -07:00
Craig Topper d27c147aaa [RISCV] Allow lowerSELECT to fold integer setcc with FP select.
We'd pick it up in DAG combine later even if we didn't handle it here.
No test changes because we get it in DAG combine anyway.
2022-08-16 21:28:54 -07:00
Craig Topper ba1fb54821 [RISCV] Reuse existing VT variable instead of calling getValueType() repeatedly. NFC 2022-08-16 19:56:55 -07:00
Craig Topper 53ce22e429 Recommit "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine."
This time using N1 instead of N0 since N1 points to the original
setcc. This now affects scheduling as I expected.

Original commit message:
We change seteq<->setne but it doesn't change the semantics
of the setcc. We should keep original debug location. This is
consistent with visitXor in the generic DAGCombiner.
2022-08-16 15:51:07 -07:00
Craig Topper 2dfa4b6475 Revert "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine."
This reverts commit 1380b21ceb.

I mixed up N0 and N1 and didn't do what I intended.
2022-08-16 15:47:01 -07:00
Craig Topper 1380b21ceb [RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine.
We change seteq<->setne but it doesn't change the semantics
of the setcc. We should keep original debug location. This is
consistent with visitXor in the generic DAGCombiner.
2022-08-16 15:40:09 -07:00
Craig Topper b5a18de651 [RISCV] Remove C!=0 restriction from (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)).
While (sub 0, X) can use x0 for the 0, I believe (add X, -1) is
still preferrable. (addi X, -1) can be compressed, sub with x0 on
the LHS is never compressible.
2022-08-16 14:49:52 -07:00
Craig Topper de6fd16971 [RISCV] Don't fold (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) if C-1 isn't simm12.
We still need to materialize the constant in a register and we
may not be removing all uses of the original constant so it may
increase code size.
2022-08-16 14:11:31 -07:00
Craig Topper 4184edc691 [RISCV] (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) fold for FP setcc.
This introduce an xori in some cases. I don't believe it was the
intention of the original patch. This was an accident because
nonan FP equality compares also use SETEQ/SETNE.

Also pass the correct type to getSetCCInverse.
2022-08-16 13:00:36 -07:00
Craig Topper c7e58836e8 [RISCV] Minor cleanups to performSUBCombine. NFC
-Rename variable NnzC -> N0C.
-Use SelectionDAG::getSetCC to reduce code.
-Use SDValue::getOperand instead of operator-> and SDNode::getOperand.

Initial steps to add another similar combine to this code.
2022-08-16 12:59:16 -07:00
Craig Topper 7a73ab5818 [RISCV] Enable isTruncateFree in SDAG for i64->i32 on rv64.
We have a good selection of W instructions, so promoting a truncated
value back to i64 is often free.

This appears to be a net code size reduction on SPECINT2006.

This has been split from D130397 as one of the patches needed to
complete that.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D131819
2022-08-15 08:32:51 -07:00
LiaoChunyu 99ef0ddea3 [RISCV] Fold (sub constant, (setcc x, y, eq/neq)) -> (add constant - 1, (setcc x, y, neq/eq))
(setcc x, y, eq/neq) are seqz, snez that set rd = 0/1.

addi is used to process immediate, which can save instructions for load immediate.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D131471
2022-08-13 20:37:57 +08:00
jacquesguan 0fe5f03eeb [RISCV][NFC] Use nested namespace definations.
Since we use C++17 now, we could use nested namespace definations to simplify code.

Differential Revision: https://reviews.llvm.org/D131751
2022-08-13 09:56:59 +08:00
Alex Bradbury 47b1f8362a [RISCV] Implement isUsedByReturnOnly TargetLowering hook in order to tailcall more libcalls
Prior to this patch, libcalls inserted by the SelectionDAG legalizer
could never be tailcalled. The eligibility of libcalls for tail calling
is is partly determined by checking TargetLowering::isInTailCallPosition
and comparing the return type of the libcall and the calleer.
isInTailCallPosition in turn calls TargetLowering::isUsedByReturnOnly
(which always returns false if not implemented by the target).

This patch provides a minimal implementation of
TargetLowering::isUsedByReturnOnly - enough to support tail calling
libcalls on hard float ABIs. Soft-float ABIs are left for a follow on
patch. libcall-tail-calls.ll also shows missed opportunities to tail
call integer libcalls, but this is due to issues outside of
the isUsedByReturnOnly hook.

Differential Revision: https://reviews.llvm.org/D131087
2022-08-10 10:50:29 +01:00
Nikita Popov f5ed0cb217 [RISCV] Add target feature to force-enable atomics
This adds a +forced-atomics target feature with the same semantics
as +atomics-32 on ARM (D130480). For RISCV targets without the +a
extension, this forces LLVM to assume that lock-free atomics
(up to 32/64 bits for riscv32/64 respectively) are available.

This means that atomic load/store are lowered to a simple load/store
(and fence as necessary), as these are guaranteed to be atomic
(as long as they're aligned). Atomic RMW/CAS are lowered to __sync
(rather than __atomic) libcalls. Responsibility for providing the
__sync libcalls lies with the user (for privileged single-core code
they can be implemented by disabling interrupts). Code using
+forced-atomics and -forced-atomics are not ABI compatible if atomic
variables cross the ABI boundary.

For context, the difference between __sync and __atomic is that the
former are required to be lock-free, while the latter requires a
shared global lock provided by a shared object library. See
https://llvm.org/docs/Atomics.html#libcalls-atomic for a detailed
discussion on the topic.

This target feature will be used by Rust's riscv32i target family
to support the use of atomic load/store without atomic RMW/CAS.

Differential Revision: https://reviews.llvm.org/D130621
2022-08-09 16:04:46 +02:00
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Kazu Hirata a2d4501718 [llvm] Fix comment typos (NFC) 2022-08-07 00:16:14 -07:00
Craig Topper 12a1ca9c42 [RISCV] Relax another one use restriction in performSRACombine.
When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), i32), C)
it's possible that the add is used by multiple sras. We should
allow the combine if all the SRAs will eventually be updated.

After transforming all of the sras, the shls will share a single
(sext_inreg (add X, C1), i32).

This pattern occurs if an sra with 32 is used as index in multiple
GEPs with different scales. The shl from the GEPs will be combined
with the sra before we get a chance to match the sra pattern.
2022-08-04 14:32:31 -07:00
Craig Topper a2de12c987 [RISCV] Relax a one use restriction performSRACombine
When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), C)
ignore the use count on the (shl X, 32).

The sext_inreg after the transform is free. So we're only making
2 new instructions, the add and the shl. So we only need to be
concerned with replacing the original sra+add. The original shl
can have other uses. This helps if there are multiple different
constants being added to the same shl.
2022-08-04 11:25:08 -07:00
Craig Topper 53d560b22f [RISCV] Prevent infinite loop after D129980.
D129980 converts (seteq (i64 (and X, 0xffffffff)), C1) into
(seteq (i64 (sext_inreg X, i32)), C1). If bit 31 of X is 0, it
will be turned back into an 'and' by SimplifyDemandedBits which
can cause an infinite loop.

To prevent this, check if bit 31 is 0 with computeKnownBits before
doing the transformation.

Fixes PR56905.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D131113
2022-08-03 15:19:07 -07:00
David Truby 9a976f3661 [llvm] Always use TargetConstant for FP_ROUND ISD Nodes
This patch ensures consistency in the construction of FP_ROUND nodes
such that they always use ISD::TargetConstant instead of ISD::Constant.

This additionally fixes a bug in the AArch64 SVE backend where patterns
were matching against TargetConstant nodes and sometimes failing when
passed a Constant node.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130370
2022-08-03 14:02:11 +01:00
Alex Bradbury 28f12a09ae [RISCV] Teach ComputeNumSignBitsForTargetNode about masked atomic intrinsics
An unnecessary sext.w is generated when masking the result of the
riscv_masked_cmpxchg_i64 intrinsic. Implementing handling of the
intrinsic in ComputeNumSignBitsForTargetNode allows it to be removed.

Although this isn't a particularly important optimisation, removing the
sext.w simplifies implementation of an additional cmpxchg-related
optimisation in D130192.

Although I can't produce a test with different codegen for the other
atomics intrinsics, these are added as well for completeness.

Differential Revision: https://reviews.llvm.org/D130191
2022-08-03 13:41:58 +01:00
Fraser Cormack 646e2f4803 [VP] Rename VP int<->float conversion ISD opcodes
These should be named like the non-VP versions for consistency.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130967
2022-08-03 10:04:38 +01:00
wanglian e208bab55f [RISCV][NFC] Use defined variable instead some code.
Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D130687
2022-08-02 16:26:33 +08:00
Lorenzo Albano 71b7c03fd6 [RISCV][VP] Custom lower VP_STRIDED_LOAD and VP_STRIDED_STORE
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D121113
2022-08-01 09:23:45 -07:00
Craig Topper d21b315360 [RISCV] Remove vmerges from vector ceil, floor, trunc lowering.
Use masked operations to suppress spurious exception bits being
set in fflags. Unfortunately, doing this adds extra copies.
2022-07-30 10:58:41 -07:00
Craig Topper a23f07fb1d [RISCV] Add merge operands to more RISCVISD::*_VL opcodes.
This adds a merge operand to all of the binary _VL nodes. Including
integer and widening. They all share multiclasses in tablegen
so doing them all at once was easiest.

I plan to use FADD_VL in an upcoming patch. The rest are just for
consistency to keep tablegen working.

This does reduce the isel table size by about 25k so that's nice.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130816
2022-07-30 10:26:38 -07:00
Craig Topper 9bf305fe2b [RISCV] Swap the merge and mask operand order for VRGATHER*_VL and FCOPYSIGN_VL nodes.
Based on review feedback from D130816.
2022-07-30 09:57:05 -07:00
Craig Topper 2750873dfe [RISCV] Update lowerFROUND to use masked instructions.
This avoids a vmerge at the end and avoids spurious fflags updates.
This isn't used for constrained intrinsic so we technically don't have
to worry about fflags, but it doesn't cost much to support it.

To support I've extend our FCOPYSIGN_VL node to support a passthru
operand. Similar to what was done for VRGATHER*_VL nodes.

I plan to do a similar update for trunc, floor, and ceil.

Reviewed By: reames, frasercrmck

Differential Revision: https://reviews.llvm.org/D130659
2022-07-28 10:05:19 -07:00
Craig Topper 89173dee71 [RISCV] Remove duplicate code. NFC
The same operations are part of `FloatingPointVecReduceOps` a little
bit earlier.
2022-07-28 10:05:19 -07:00
Craig Topper 1d1d8d6025 [RISCV] Reorder code in lowerFROUND to make the diff in D130659 cleaner. NFC 2022-07-27 17:13:04 -07:00
Craig Topper 98647330bf [RISCV] Add merge operand to RISCVISD::FCOPYSIGN_VL.
Similar to what was done for VRGATHER*_VL recently.

This will be used in D130659.
2022-07-27 15:25:34 -07:00
LiaoChunyu bf4f9a468a [RISCV]Enable isIntDivCheap when attribute is minsize
Don't expand divisions by constants when attribute is minsize.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130543
2022-07-27 18:22:51 +08:00
Craig Topper 45944e7cf4 [RISCV] Refactor translateSetCCForBranch to prepare for D130508. NFC.
D130508 handles more constants than just 1 or -1. We need to extract
the constant instead of relying isOneConstant or isAllOnesConstant.
2022-07-25 15:54:54 -07:00
jacquesguan d8800ead62 [RISCV] Scalarize binop followed by extractelement.
This patch adds shouldScalarizeBinop to RISCV target in order to convert an extract element of a vector binary operation into an extract element followed by a scalar binary operation.

Differential Revision: https://reviews.llvm.org/D129545
2022-07-25 17:23:31 +08:00
Craig Topper 9adc00a9d0 [RISCV] Add a continue to reduce nesting. NFC 2022-07-23 17:36:12 -07:00
Kazu Hirata 1cc7f5bede Use static_assert instead of assert (NFC)
Identified with misc-static-assert.
2022-07-23 09:22:27 -07:00
Craig Topper add17fc8e4 [RISCV] Combine (select_cc (srl (and X, 1<<C), C), 0, eq/ne, true, fale)
(srl (and X, 1<<C), C) is the form we receive for testing bit C.
An earlier combine removed the setcc so it wasn't there to match
when we created the SELECT_CC. This doesn't happen for BR_CC because
generic DAG combine rebuilds the setcc if it is used by BRCOND.

We can shift X left by XLen-1-C to put the bit to be tested in the
MSB, and use a signed compare with 0 to test the MSB.
2022-07-20 22:32:11 -07:00
Craig Topper 7dda6c71b1 [RISCV] Refactor the common combines for SELECT_CC and BR_CC into a helper function.
The only difference between the combines were the calls to getNode
that include the true/false values for SELECT_CC or the chain
and branch target for BR_CC.

Wrap the rest of the code into a helper that reads LHS, RHS, and
CC and outputs new values and a bool if a new node needs to be
created.
2022-07-20 21:18:07 -07:00
Craig Topper 8983db15a3 [RISCV] Optimize (brcond (seteq (and X, 1 << C), 0))
If C > 10, this will require a constant to be materialized for the
And. To avoid this, we can shift X left by XLen-1-C bits to put the
tested bit in the MSB, then we can do a signed compare with 0 to
determine if the MSB is 0 or 1. Thanks to @reames for the suggestion.

I've implemented this inside of translateSetCCForBranch which is
called when setcc+brcond or setcc+select is converted to br_cc or
select_cc during lowering. It doesn't make sense to do this for
general setcc since we lack a sgez instruction.

I've tested bit 10, 11, 31, 32, 63 and a couple bits betwen 11 and 31
and between 32 and 63 for both i32 and i64 where applicable. Select
has some deficiencies where we receive (and (srl X, C), 1) instead.
This doesn't happen for br_cc due to the call to rebuildSetCC in the
generic DAGCombiner for brcond. I'll explore improving select in a
future patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130203
2022-07-20 18:40:49 -07:00
ksyx 3198364e6e [RISCV][Clang] Add support for Zmmul extension
This patch implements recently ratified extension Zmmul, a subextension
of M (Integer Multiplication and Division) consisting only
multiplication part of it.

Differential Revision: https://reviews.llvm.org/D103313
Reviewed By: craig.topper, jrtc27, asb
2022-07-18 20:26:08 -04:00