Commit Graph

1997 Commits

Author SHA1 Message Date
Yeting Kuo 1b56b2b267 [RISCV] Transform VMERGE_VVM_<LMUL>_TU with all ones mask to VADD_VI_<LMUL>_TU.
The transformation is benefit because vmerge.vvm always needs mask operand but
vadd.vi may not.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D133255
2022-09-14 10:01:37 +08:00
Han-Kuan Chen dd53a0bb30 [RISCV] Lower BUILD_VECTOR to RISCVISD::VID_VL if it is floating-point type.
Differential Revision: https://reviews.llvm.org/D133688
2022-09-13 18:50:20 -07:00
Philip Reames 09d73fe8cd [RISCV] Add MIR comments for VecPolicy operands
Analogous to what we already do for SEW operands, aimed at making the resulting MIR readable by a human.
2022-09-13 15:36:33 -07:00
Philip Reames c97952ee1e [RISCV] Split vmerge peephole tests so they can be auto-gened
Per original review, only one test needed the MIR output checks.  Copy that to it's own file, and use auto-gen for all three files.
2022-09-13 11:44:50 -07:00
Craig Topper 8d7e73effe [RISCV] Teach lowerVECTOR_SHUFFLE to recognize some shuffles as vnsrl.
Unary shuffles such as <0,2,4,6,8,10,12,14> or <1,3,5,7,9,11,13,15>
where half the elements are returned, can be lowered using vnsrl.

SelectionDAGBuilder lowers such shuffles as a build_vector of
extract_elements since the mask has less elements than the source.
To fix this, I've enable the extractSubvectorIsCheapHook to allow
DAGCombine to rebuild the shuffle using 2 extract_subvectors preceding
the shufffle.

I've gone very conservative on extractSubvectorIsCheapHook to minimize
test impact and match what we have test coverage for. This can be
improved in the future.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133736
2022-09-13 11:07:11 -07:00
Alex Bradbury 547160848c [RISCV] Return true in hasBitTest when Zbs is enabled and update BEXTI pattern for resulting canonicalisation
As the Zbs extension includes bext[i] for bit extract, we can
unconditionally return true from this hook. This hook causes the DAG
combiner to perform the following canonicalisation:

  and (not (srl X, C)), 1 --> (and X, 1<<C) == 0
  and (srl (not X), C)), 1 --> (and X, 1<<C) == 0

As simply changing the hook causes a codegen regression, this patch also
modifies a BEXTI pattern to match this canonicalised form.

As BSETINVMask is now used for BEXT as well as BSET and BINV, it has
been renamed to the more generic SingleBitSetMask.

There is one codegen change in bittest.ll for bittest_31_i64 (NOT+BEXTI
rather than NOT+SRLIW). This is neutral in terms of code quality.

Differential Revision: https://reviews.llvm.org/D131482
2022-09-13 16:51:47 +01:00
Craig Topper 5224bae613 [RISCV] Fix a bug in i32 FP_TO_UINT_SAT lowering on RV64.
We use the saturating behavior of fcvt.wu.h/s/d but forgot to
take into account that fcvt.wu will sign extend the saturated
result. According to computeKnownBits a promoted FP_TO_UINT_SAT
is expected to zero extend the saturated value.

In many case the upper bits aren't be demanded so this wouldn't
be an issue. But if we computeKnownBits caused an AND to be removed
it would be a bug.

This patch inserts an AND during to zero the upper bits.

Unfortunately, this pessimizes code if we aren't able to tell if
the upper bits are demanded. To fix that we could custom type
promote the FP_TO_UINT_SAT with SEXT_INREG after it, but I'll
leave that for future work.

I haven't found a failure from this, I was revisiting the code to
add vector support and spotted it.

Differential Revision: https://reviews.llvm.org/D133746
2022-09-13 08:41:32 -07:00
Craig Topper b14b0b5213 [RISCV] Add test cases with result of fp_to_s/uint_sat sign/zero-extended from i32 to i64. NFC
I believe the result for fp_to_uint_sat is incorrect for this case.
2022-09-12 20:27:25 -07:00
Craig Topper 00d36f61ad [RISCV] Move some vector test into the rvv test directory. NFC 2022-09-12 16:36:14 -07:00
Craig Topper 38ffa2bb96 [LegalizeTypes] Improve splitting for urem/udiv by constant for some constants.
For remainder:
If (1 << (Bitwidth / 2)) % Divisor == 1, we can add the high and low halves
together and use a (Bitwidth / 2) urem. If (BitWidth /2) is a legal integer
type, this urem will be expand by DAGCombiner using multiply by magic
constant. We do have to take into account that adding high and low
together can produce a carry, making it a (BitWidth / 2)+1 bit number.
So we need to also add back in the carry from the first addition.

For division:
We can use the above trick to compute the remainder, subtract that
remainder from the dividend, then multiply by the multiplicative
inverse of the Divisor modulo (1 << BitWidth).

This is based on the section "Remainder by Summing Digits" in
Hacker's delight.

The remainder trick is similar to a trick you may have learned for
determining if a decimal number is divisible by 3. You can add all the
digits together and see if the sum is divisible by 3. If you're not sure
if the sum is divisible by 3, you can add its digits together. This
can be repeated until you have a single decimal digit. If that digit
is 3, 6, or 9, then the original number is divisible by 3. This works
because 10 % 3 == 1.

gcc already does this same trick. There are additional tricks gcc
does urem as well as srem, udiv, and sdiv that I plan to add in
future patches.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130862
2022-09-12 10:34:52 -07:00
Craig Topper 4186a49d79 [RISCV] Custom type legalize i32 loads by sign extending.
The default is to use extload which can become a zextload or
sextload if it is followed by an 'and' or sext_inreg.

Sometimes type legalization will introduce an 'and' from promoting
something like 'srl X, C' and a sext_inreg from from a setcc. The
'and' could be freely folded with the promoted 'srl' by using srliw,
but the sext_inreg can't be folded into a compare. DAG combiner
will see both of these choices and may decide to fold the 'and'
instead of the 'sext_inreg'. This forces the sext_inreg to become
a sext.w.

By picking sextload in the type legalizer we take this choice away.
Looking at spec2006 compiled with Zba and Zbb this appeared to be
net reduction in lines of code in the objdump disassembly output.

This is similar to what we do with i32 add/sub/mul/shl in
type legalization where we always emit a sext_inreg.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D130397
2022-09-12 09:13:07 -07:00
Alex Bradbury 51ae462447 [RISCV] Add the GlobalMerge pass (disabled by default)
Split out from D129178, this just adds the GlobalMerge tests (other than global-merge-minsize.ll which is testing a specific configuration of the pass when it's enabled) and exposes `-riscv-enable-global-merge` and //doesn't enable it by default//.

Note that the comment "// FIXME: Unify control over GlobalMerge." is copied from the Arm and AArch64 backends, which expose the same flag. Presumably the author is imagining some later refactoring that provides a target-independent flag.

Reviewed By: craig.topper, reames, hiraditya

Differential Revision: https://reviews.llvm.org/D130481
2022-09-08 18:40:38 -07:00
Eric Wang d8a2d3f7d4 [NFC][Regalloc] Introduce the RegAllocPriorityAdvisorAnalysis
This patch introduces the priority analysis and the priority advisor,
the default implementation, and the scaffolding for introducing the
other implementations of the advisor.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D132835
2022-09-08 07:50:03 -07:00
Philip Reames a4a29438f4 [RISCV][MC] Add minimal support for Ztso extension
This is a minimalist implementation which simply adds the extension (in the experimental namespace since its not ratified), and wires up the setting of the required ELF header flag. Future changes will include codegen changes to exploit the stronger memory model.

This is intended to implement v0.1 of the proposed specification which can be found in Chapter 25 of https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220723-10eea63/riscv-spec.pdf.

Differential Revision: https://reviews.llvm.org/D133239
2022-09-07 09:30:57 -07:00
Craig Topper 5d30565d80 [RISCV] Improve vector fround lowering by changing FRM.
This is a follow up to D133238 which did this for ceil/floor.

Reviewed By: arcbbb, frasercrmck

Differential Revision: https://reviews.llvm.org/D133335
2022-09-06 09:33:13 -07:00
Matthias Gehre af3758d678 Fix remaining test failures for "[llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64" 2022-09-06 16:38:43 +01:00
Craig Topper f0332d12ae [RISCV] Improve vector fceil/ffloor lowering by changing FRM.
This adds new VFCVT pseudoinstructions that take a rounding mode operand. A custom inserter is used to insert additional instructions to change FRM around the
VFCVT.

Some of this is borrowed from D122860, but takes a somewhat different direction. We may migrate to that patch, but for now I was trying to keep this as independent from
RVV intrinsics as I could.

A followup patch will use this approach for FROUND too.

Still need to fix the cost model.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D133238
2022-09-05 19:03:44 -07:00
Craig Topper 893f5e95e2 [RISCV] Improve isel of AND with shiftedMask containing 32 leading zeros and some trailing zeros.
We can use srliw to shift out the trailing bits and slli to shift
back in zeros. The sign extend of srliw will 0 the upper 32 bits
since we will be shifting a 0 into bit 31.
2022-08-30 12:22:46 -07:00
wanglian e2bb9774b1 [LegalizeTypes] Support widen result for VECTOR_REVERSE.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D132359
2022-08-30 10:01:26 +08:00
Craig Topper e25eb61d03 [RISCV] Enable (srl (and X, C2), C) to form SRLIW in more cases.
Don't require the AND has one use and don't depend on
targetShrinkDemandedConstant turning C2 into 0xffffffff. Instead,
check that the constant is 0xffffffff after replacing any bits
that will be shifted out with 1s.

Another way to fix this might be to prevent SimplifyDemandedBits
from destroying the ANDI after type legalization using
targetShrinkDemandedBits. That would prevent the CSE that created
this mess. targetShrinkDemandedBits is currently only enable after
legalize ops. Quick experiment shows we can't just change when it
runs, we would need to try a different heuristic for post type
legalization.
2022-08-29 15:52:08 -07:00
Craig Topper 5bd92d21b0 [RISCV] Add test for failure to use ANDI and SRLIW due to SimplifyDemandedBits. 2022-08-29 15:47:55 -07:00
Craig Topper 0fbe71e91f [RISCV] Use hasAllWUsers to recover ANDI.
SimplifyDemandedBits can 0 the upper bits and targetShrinkDemandedConstant
isn't alway able to recover it.

At least part of that may be because targetShrinkDemandedConstant
only runs in the last DAGCombine. Might be worth seeing what happens
if we move it post type legalization.
2022-08-29 14:11:09 -07:00
Craig Topper 7c17b0afb1 [RISCV] Add test case for missed opportunity to use ANDI.
Immediate was messed up by SimplfyDemandedBits.
2022-08-29 14:02:07 -07:00
Craig Topper 1c334b306e [RISCV] Add more invertible setccs to tryDemorganOfBooleanCondition.
This builds on D132771 to invert (setlt 0, X) to (setlt X, 1) and
vice versa.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132798
2022-08-29 12:23:03 -07:00
Craig Topper 34e83525aa [RISCV] Pre-commit tests for D132798. NFC 2022-08-29 12:20:36 -07:00
Craig Topper 9d12bb77f9 [RISCV] Apply DeMorgan to (beqz (and/or (seteq), (xor Z, 1))) to remove the xor.
We can rewrite to (bnez (or/and (setne), Z) is Z is 0/1.

Alternatively, we could canonicalize to (xor (or/and (setne), Z), 1)
even if there is no branch. The xor would not always get removed,
but it might enable other DeMorgan combines. I decided to be
conservative for this first patch and require the xor to be removed.

I have a couple other invertible setccs I will add in a follow up
patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132771
2022-08-29 12:16:34 -07:00
Craig Topper 2f811a6c7f [VP][RISCV] Add vp.fabs intrinsic and RISC-V support.
Mostly just modeled after vp.fneg except there is a
"functional instruction" for fneg while fabs is always an
intrinsic.

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D132793
2022-08-29 09:32:06 -07:00
Craig Topper e0a9da2562 [RISCV] Add Uses=[FRM] and mayRaiseFPException to VF(N/W)CVT instructions.
Reviewed By: arcbbb, kito-cheng

Differential Revision: https://reviews.llvm.org/D132792
2022-08-29 09:26:33 -07:00
Yeting Kuo abf0416328 [RISCV] Merge vmerge.vvm and unmasked intrinsic with VLMAX vector length.
The motivation of this patch is to lower the IR pattern
(vp.merge mask, (add x, y), false, vl) to
(PseudoVADD_VV_<LMUL>_MASK false, x, y, mask, vl).

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D131841
2022-08-29 11:44:51 +08:00
jacquesguan 1a1c59f995 [RISCV][NFC] Refactor fadd test to match the code.
Change fadd test case in D122563 to match the fold base case.

Differential Revision: https://reviews.llvm.org/D132722
2022-08-29 10:45:29 +08:00
Alexey Baturo e3485345d3 [RISC-V][HWASAN] Add support for lowering HWASAN intrinsic for RISC-V
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D131343
2022-08-28 21:22:13 +03:00
Craig Topper faf373e526 [RISCV] Pre-commit tests for D132771. NFC 2022-08-26 16:41:01 -07:00
Philip Reames b45a262679 [RISCV] Enable fixed length vectors and loop vectorization with same
This change enables the use of RISCV's variable length vector registers for fixed length vectors in the IR, and implicitly enables various IR transforms which generate fixed length vectors if legal (e.g. LoopVectorize). Specifically, this enables fixed length vectors which are known to be inbounds of the underlying variable hardware size.

For context, remember that the +V extension provides a minimum VLEN of 128. The embedded variants provide lower minimums. The analogy here is essentially vectorizing for SSE on a machine which may or may not include AVX2/AVX512. We won't get full utilization by default, but we will get some benefit. And of course, with an explicit mcpu we can vectorize to the exact target hardware.

The LV impact is mostly related to vectorizer robustness. In cases we haven't yet fully implemented scalable vectorization support, we can fall back to fixed length vectorization.

SLP has been disabled for now, even when fixed vectors are enabled.  See a310637 and associated review.  There are a few addiitional code quality issues which need worked through before turning SLP on would be reasonable.

Differential Revision: https://reviews.llvm.org/D131508
2022-08-26 14:45:23 -07:00
Yunze Zhu 3846e3970f [RISCV] Generate correct ELF abi flag when empty .ll file has target-abi attribute
In patch D121183, target abi is get from .ll file's target-abi
attribute and set in RISCVAsmPrinter::emitFunctionEntryLabel
function. In https://github.com/llvm/llvm-project/issues/57242,
an api mismatch error may be caused by failing to call function
RISCVAsmPrinter::emitFunctionEntryLabel to set target-abi to
correct one when the .ll is empty or a module has no function.

This patch move setting target-abi part to function
RISCVAsmPrinter::emitStartOfAsmFile, make sure all .ll file and
module in LTO read target-abi from module flag and set, with or
without function.

Signed-off-by: xiaojing.zhang <xiaojing.zhang@xcalibyte.com>
Signed-off-by: jianxin.lai <jianxin.lai@xcalibyte.com>

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D132204
2022-08-26 14:39:39 +08:00
jacquesguan b28b54f8fc [RISCV][NFC] Use common prefix to simplify test.
Differential Revision: https://reviews.llvm.org/D132637
2022-08-26 10:39:41 +08:00
Craig Topper 41a3b5739b [RISCV] Teach combineDeMorganOfBoolean to handle (and (xor X, 1), (not Y)).
SimplifyDemandedBits tries to agressively turn xor immediates into -1
to match a 'not' instruction. In this case, because X is a boolean, the
upper bits of (xor X, 1) are known to be 0. Because this is an AND
instruction, that means those bits aren't demanded from the other
operand, and thus SimplifyDemandedBits can turn (xor Y, 1) to (not Y).

We need to detect that this has happened to enable the DeMorgan
optimization. To do this we allow one of the xors to use -1 when
the outer operation is And.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132671
2022-08-25 10:55:45 -07:00
Craig Topper ec91d761ac [RISCV] Apply DeMorgan's law to (and/or (xor X, 1), (xor Y, 1)) if X and Y are 0/1.
This optimizes xors that appear due to legalizing setge/setle which
require an xor with 1. This reduces the number of xors and may
allow the xor to fold with a beqz or bnez.

Differential Revision: https://reviews.llvm.org/D132614
2022-08-25 08:49:30 -07:00
ZHU Zijia 395bda933f [RISCV][test] Update branch-relaxation.ll with update_llc_test_checks.py [NFC]
Update `llvm/test/CodeGen/RISCV/branch-relaxation.ll` with
`update_llc_test_checks.py`, according to
https://reviews.llvm.org/D130560#3746417:

>>! In D130560#3746417, @luismarques wrote:
>>>! In D130560#3746379, @luismarques wrote:
>> The tests don't seem to have been properly updated with
>> `update_llc_test_checks.py`.
>> `llvm/test/CodeGen/RISCV/branch-relaxation.ll` contains RV64 RUN
>> lines but the corresponding CHECK lines are missing in
>> some functions.
>
> Looking more closely at this, I guess you tried to only include the
> `CHECK-RV64` and `CHECK-RV32` checks when relevant. That's a good
> instinct but I guess it goes a bit against how we normally use
> `update_llc_test_checks.py`. My understanding of the trade-off of
> using that tool is that the test updates are much easier, even if
> sometimes the CHECKs aren't as tight as something more tailormade.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D132625
2022-08-25 17:01:34 +08:00
ZHU Zijia d098e45eed Revert "[RISCV][test] Update branch-relaxation.ll with update_llc_test_checks.py [NFC]"
This reverts commit c374789fba.
2022-08-25 17:01:33 +08:00
ZHU Zijia c374789fba [RISCV][test] Update branch-relaxation.ll with update_llc_test_checks.py [NFC]
Update `llvm/test/CodeGen/RISCV/branch-relaxation.ll` with
`update_llc_test_checks.py`, according to
https://reviews.llvm.org/D130560#3746417:

>>! In D130560#3746417, @luismarques wrote:
>>>! In D130560#3746379, @luismarques wrote:
>> The tests don't seem to have been properly updated with
>> `update_llc_test_checks.py`.
>> `llvm/test/CodeGen/RISCV/branch-relaxation.ll` contains RV64 RUN
>> lines but the corresponding CHECK lines are missing in
>> some functions.
>
> Looking more closely at this, I guess you tried to only include the
> `CHECK-RV64` and `CHECK-RV32` checks when relevant. That's a good
> instinct but I guess it goes a bit against how we normally use
> `update_llc_test_checks.py`. My understanding of the trade-off of
> using that tool is that the test updates are much easier, even if
> sometimes the CHECKs aren't as tight as something more tailormade.
2022-08-25 16:54:48 +08:00
Craig Topper ecde303690 [RISCV] Pre-commit tests for D132614. NFC 2022-08-24 15:31:17 -07:00
Kito Cheng 96c85f80f0 [RISCV] Don't outline pcrel-lo operand.
This issue is found by build llvm-testsuite with `-Oz`, linker will complain
`dangerous relocation: %pcrel_lo missing matching %pcrel_hi` and that
turn out cause by we outlined pcrel-lo, but leave pcrel-hi there, that's
not problem in general, but the problem is they put into different section, they
pcrel-hi and pcrel-lo pair (e.g. AUIPC+ADDI) *MUST* put be present in same
section due to the implementation.

Outlined function will put into .text name, but the source functions
will put in .text.<function-name> if function-section is enabled or the
function has `comdat` attribute.

There are few solutions for this issue:
1. Always disallow instructions with pcrel-lo flags.
2. Only disallow instructions with pcrel-lo flags that when function-section is
   enabled or this function has `comdat` attribute.
3. Check the corresponding instruction with pcrel-high also included in the
   outlining candidate sequence or not, and allow that only when pcrel-high is
   included in the outlining candidate.

First one is most conservative, that might lose some optimization
opportunities, and second one could save those opportunities, and last
one is hard to implement, and don't have any benefits since pcrel-high
are using different label even accessing same symbol.

Use custom section name might also cause this problem, but that already
filtered by RISCVInstrInfo::isFunctionSafeToOutlineFrom.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D132528
2022-08-24 21:47:46 +08:00
Kito Cheng f054cbfe91 [RISCV] Precommit test for machine outliner issue for instruction with pcrel-lo.
Differential Revision: https://reviews.llvm.org/D132527
2022-08-24 21:16:23 +08:00
ZHU Zijia 9c85382ade [RISCV] Handle register spill in branch relaxation
In branch relaxation pass, `j`'s with offset over 1MiB will be relaxed
to `jump` pseudo-instructions.

This patch allocates a stack slot for functions with a size greater than
1MiB. If the register scavenger cannot find a scratch register for
`jump`, spill a register to the slot before the jump and restore it
after the jump.

.mbb:
        foo
        j       .dest_bb
        bar
        bar
        bar
.dest_bb:
        baz

The above code will be relaxed to the following code.

.mbb:
        foo
        sd      s11, 0(sp)
        jump    .restore_bb, s11
        bar
        bar
        bar
        j       .dest_bb
.restore_bb:
        ld      s11, 0(sp)
.dest_bb:
        baz

Depends on D129999.

Reviewed By: StephenFan

Differential Revision: https://reviews.llvm.org/D130560
2022-08-24 13:27:56 +08:00
Shao-Ce SUN 7167a4207e [RISCV] Add zihintntl instructions
Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D121670
2022-08-22 12:06:30 +08:00
Craig Topper 1a042dd6ed [RISCV] Optimize x <s -1 ? x : -1. Improve x >u 1 ? x : 1.
Similar to D132211, we can optimize x <s -1 ? x : -1 -> x <s 0 ? x : -1

Also improve the unsigned case from D132211 to use x != 0 which
will give a bnez instruction which might be compressible.

Differential Revision: https://reviews.llvm.org/D132252
2022-08-21 11:48:28 -07:00
LiaoChunyu 1fb87ace4d [RISCV] Optimize x > 1 ? x : 1 -> x > 0 ? x : 1
if x == 1,
  x > 1 ? x : 1  return x, which is also 1.
  x > 0 ? x : 1  return 1.

Reduce the number of load 1 instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D132211
2022-08-21 20:26:39 +08:00
Lorenzo Albano 98117fe208 [VP] Add splitting for VP_STRIDED_STORE and VP_STRIDED_LOAD
Following the comment's thread of D117235, I added checks for the widening + splitting case, which also causes a split with one of the resulting vectors to be empty. Due to the same issues described in that same thread, the `fixed-vectors-strided-store.ll` test is missing the widening + splitting case, while the same case in the `strided-vpload.ll` test requires to manually split the loaded vector.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D121784
2022-08-19 18:15:56 -07:00
Craig Topper 6227b7ae31 [RISCV] Move xori creation for scalar setccs to lowering.
This patch enables expansion or custom lowering for some integer
condition codes so that any xori that is needed is created before
the last DAG combine to enable optimization.

I've seen cases where we end up with
(or (xori (setcc), 1), (xori (setcc), 1)) which we would ideally
convert to (xori (and (setcc), (setcc)), 1). This patch doesn't
accomplish that yet, but it should allow us to add DAG
combines as follow ups. Example https://godbolt.org/z/Y4qnvsq1b

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D131729
2022-08-19 13:51:53 -07:00
Craig Topper 961838cc13 [RISCV] Add passthru operand to RISCVISD::SETCC_VL.
Use it to the fix a bug in the fceil/ffloor lowerings. We were
setting the passthru to IMPLICIT_DEF before and using a mask
agnostic policy. This means where the incoming bits in
the mask were 0 they could be anything in the outgoing mask. We
want those bits in the outgoing mask to be 0. This means we need to
pass the input mask as the passthru.

This generates worse code because we are unable to allocate the
v0 register to the output due to an earlyclobber constraint. We
probably need a special TIED pseudoinstruction and probably custom
isel since you can't use V0 twice in the input pattern.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132058
2022-08-19 08:53:44 -07:00
Craig Topper c9a41fe60a [RISCV] Prefer vnsrl.wi v8, v8, 0 over vnsrl.wx v8, v8, x0.
I have a couple data points that some microarchitectures prefer
the immediate 0 over x0. Does anyone know of microarchitectures
where the opposite is true?

Unfortunately, this is different than the vncvt.x.x.w alias
from the spec. Perhaps the alias was poorly chosen if x0 isn't
as optimal as immediate 0 on all microarchitectures.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132041
2022-08-19 08:40:17 -07:00
Archibald Elliott 3a729069e4 [IR] Update llvm.prefetch to match docs
The current llvm.prefetch intrinsic docs state "The rw, locality and
cache type arguments must be constant integers."

This change:
- Makes arg 3 (cache type) an ImmArg
- Improves the verifier error messages to reference the incorrect
  argument.
- Fixes two tests which contradict the docs.

This is needed as the lowering to GlobalISel is different for ImmArgs
compared to other constants. The non-ImmArgs create a G_CONSTANT MIR
instruction, the for ImmArgs the constant is put directly on the
intrinsic's MIR instruction as an immediate.

Differential Revision: https://reviews.llvm.org/D132042
2022-08-19 09:11:17 +01:00
Craig Topper ba1f4cab44 [RISCV] Copy SDNodeFlags in lowerToScalableOp.
Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D132177
2022-08-18 20:42:59 -07:00
Craig Topper 5349aa2354 [RISCV] Copy SDNodeFlags in doPeepholeMaskedRVV and doPeepholeMergeVVMFold
Especially the NoFPExcept flag for FP.

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D132173
2022-08-18 20:42:46 -07:00
Craig Topper 550fab53e1 [RISCV] Fold (sub C, (xor (setcc), 1)) -> (add (setcc), C-1).
Extracted from D131729 where we handled C==0. It's now generalized
to more constants.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132000
2022-08-17 09:50:08 -07:00
Alex Bradbury ce38128194 [RISCV] Avoid redundant branch-to-branch when expanding cmpxchg
If the success value of a cmpxchg is used in a branch, the expanded
cmpxchg sequence ends up with a redundant branch-to-branch (as the
backend atomics expansion happens as late as possible, passes to
optimise such cases have already run). This patch identifies this case
and avoid it when expanding the cmpxchg.

Note that a similar optimisation is possible for a BEQ on the cmpxchg
success value. As it's hard to imagine a case where real-world code may
do that, this patch doens't handle that case.

Differential Revision: https://reviews.llvm.org/D130192
2022-08-17 13:49:15 +01:00
Craig Topper 39707c1a9a [RISCV] Add test coverage for (select (icmp X, Y), float, float). NFC
We fold integer setcc into SELECT_CC during DAG combine even if
the SELECT_CC has FP result type, but we had no test coverage.
2022-08-16 21:28:26 -07:00
Craig Topper d8cdd78b6c [RISCV] Add test cases to show missed opportunity to fold (sub C, (xor (setcc), 1)). NFC
(sub C, (xori X, 1)) can be folded to (add X, C-1) if X is 0 or 1.

This would avoid the xori and in some cases remove an instruction
neede to materialize the constant.
2022-08-16 16:40:53 -07:00
Craig Topper 53ce22e429 Recommit "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine."
This time using N1 instead of N0 since N1 points to the original
setcc. This now affects scheduling as I expected.

Original commit message:
We change seteq<->setne but it doesn't change the semantics
of the setcc. We should keep original debug location. This is
consistent with visitXor in the generic DAGCombiner.
2022-08-16 15:51:07 -07:00
Craig Topper b5a18de651 [RISCV] Remove C!=0 restriction from (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)).
While (sub 0, X) can use x0 for the 0, I believe (add X, -1) is
still preferrable. (addi X, -1) can be compressed, sub with x0 on
the LHS is never compressible.
2022-08-16 14:49:52 -07:00
Craig Topper de6fd16971 [RISCV] Don't fold (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) if C-1 isn't simm12.
We still need to materialize the constant in a register and we
may not be removing all uses of the original constant so it may
increase code size.
2022-08-16 14:11:31 -07:00
Craig Topper 1180ed41ee [RISCV] Add more test cases for (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)). NFC
In these test cases we do the transform, but the immediate is too
large to form an ADDI so it didn't save any instructions.

If the constant is opaque or has additional users we shouldn't do
the transform if it doesn't form an ADDI.
2022-08-16 14:08:42 -07:00
Craig Topper 4854fa217f [RISCV] Move test from setcc-logic.ll to select-const.ll. NFC
Also add setne version of the test.

Add some common prefixes to reduce number of identical CHECK lines.
2022-08-16 14:08:42 -07:00
Craig Topper 4184edc691 [RISCV] (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) fold for FP setcc.
This introduce an xori in some cases. I don't believe it was the
intention of the original patch. This was an accident because
nonan FP equality compares also use SETEQ/SETNE.

Also pass the correct type to getSetCCInverse.
2022-08-16 13:00:36 -07:00
Craig Topper 87e7837293 [RISCV] Add test cases to show where we inverted a fp setcc and introduced an extra xori.
In these tests we had (sub C, (seteq X, Y)) which we converted to
the (add (setne X, Y), C-1). We don't have a FNE compare instruction
so this created an XORI to invert an FEQ instruction.

This might be a good idea since it can save a constant materialization,
but does not appear to be the intention of the original patch.
2022-08-16 12:59:16 -07:00
Craig Topper 7a73ab5818 [RISCV] Enable isTruncateFree in SDAG for i64->i32 on rv64.
We have a good selection of W instructions, so promoting a truncated
value back to i64 is often free.

This appears to be a net code size reduction on SPECINT2006.

This has been split from D130397 as one of the patches needed to
complete that.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D131819
2022-08-15 08:32:51 -07:00
Simon Pilgrim 3a73133217 [DAG] canCreateUndefOrPoison - add freeze(sign_extend_inreg(x,vt)) -> sign_extend_inreg(freeze(x),vt) support
Guaranteed not to create undef/poison
2022-08-15 12:18:59 +01:00
Simon Pilgrim 7e294e676e [DAG] canCreateUndefOrPoison - add freeze(assertsext/zext(x,bt)) -> assertsext/zext(freeze(x),vt) support
These are guaranteed not to create undef/poison (although they may pass through) - the associated ISD::VALUETYPE node is also guaranteed never to generate poison
2022-08-15 11:13:43 +01:00
Craig Topper b8c5420d74 [X86][RISCV] Pre-commit tests for D130862. NFC
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D131442
2022-08-14 16:31:15 -07:00
LiaoChunyu 99ef0ddea3 [RISCV] Fold (sub constant, (setcc x, y, eq/neq)) -> (add constant - 1, (setcc x, y, neq/eq))
(setcc x, y, eq/neq) are seqz, snez that set rd = 0/1.

addi is used to process immediate, which can save instructions for load immediate.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D131471
2022-08-13 20:37:57 +08:00
Craig Topper e493944f5f [RISCV] Use SLTIU X, -1 for (setne X, -1).
Since -1 is the maximum unsigned value, all values less than it
are not equal to it.
2022-08-11 15:36:04 -07:00
Craig Topper 2c79801a0e [RISCV] Add more ineg+setcc isel patterns to avoid creating neg+xori+slti(u).
Including patterns to select addiw if only the lower 32 bits are used.

I'm not excited about adding this many patterns. I'm looking at whether
we can create the xori during lowering and move the ineg patterns to
DAGCombiner.
2022-08-11 14:24:09 -07:00
Yeting Kuo 875694089d [RISCV] Peephole optimization to fold merge.vvm and unmasked intrinsics.
The patch uses peephole method to fold merge.vvm and unmasked intrinsics to
masked intrinsics. Using peephole intead of tablegen patterns is to avoid large
auto gnerated code.

Note: The patch ignores segment loads since I don't know how to test them.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130442
2022-08-11 17:58:11 +08:00
Yeting Kuo 7050f2102e [RISCV][test] Precommitted test for optimization for vmerge and unmasked intrinsics.
Precommitted test cases for D130442.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130753
2022-08-11 13:23:08 +08:00
Alex Bradbury 47b1f8362a [RISCV] Implement isUsedByReturnOnly TargetLowering hook in order to tailcall more libcalls
Prior to this patch, libcalls inserted by the SelectionDAG legalizer
could never be tailcalled. The eligibility of libcalls for tail calling
is is partly determined by checking TargetLowering::isInTailCallPosition
and comparing the return type of the libcall and the calleer.
isInTailCallPosition in turn calls TargetLowering::isUsedByReturnOnly
(which always returns false if not implemented by the target).

This patch provides a minimal implementation of
TargetLowering::isUsedByReturnOnly - enough to support tail calling
libcalls on hard float ABIs. Soft-float ABIs are left for a follow on
patch. libcall-tail-calls.ll also shows missed opportunities to tail
call integer libcalls, but this is due to issues outside of
the isUsedByReturnOnly hook.

Differential Revision: https://reviews.llvm.org/D131087
2022-08-10 10:50:29 +01:00
Philip Reames 46fe24e2fb [RISCV] Split check lines for fpclamptosat_vec test
This is currently exercising scalarization code path; with vectors enabled, we hit a different code path.  Explicitly exercise both so that both configurations have testing.
2022-08-09 15:03:29 -07:00
Philip Reames 8800b1103a [RISCV] Pin a test to scalar lowering to preserve test intent [nfc]
In an upcoming change to enable fixed length vector lowering via vector registers, the codepath exercised would change.  Pin this to the old lowering.
2022-08-09 10:12:18 -07:00
Nikita Popov f5ed0cb217 [RISCV] Add target feature to force-enable atomics
This adds a +forced-atomics target feature with the same semantics
as +atomics-32 on ARM (D130480). For RISCV targets without the +a
extension, this forces LLVM to assume that lock-free atomics
(up to 32/64 bits for riscv32/64 respectively) are available.

This means that atomic load/store are lowered to a simple load/store
(and fence as necessary), as these are guaranteed to be atomic
(as long as they're aligned). Atomic RMW/CAS are lowered to __sync
(rather than __atomic) libcalls. Responsibility for providing the
__sync libcalls lies with the user (for privileged single-core code
they can be implemented by disabling interrupts). Code using
+forced-atomics and -forced-atomics are not ABI compatible if atomic
variables cross the ABI boundary.

For context, the difference between __sync and __atomic is that the
former are required to be lock-free, while the latter requires a
shared global lock provided by a shared object library. See
https://llvm.org/docs/Atomics.html#libcalls-atomic for a detailed
discussion on the topic.

This target feature will be used by Rust's riscv32i target family
to support the use of atomic load/store without atomic RMW/CAS.

Differential Revision: https://reviews.llvm.org/D130621
2022-08-09 16:04:46 +02:00
Shubham Narlawar ab4fc87a9d [DAG] Emit table lookup from TargetLowering::expandCTTZ()
This patch emits table lookup in expandCTTZ.

Context -
https://reviews.llvm.org/D113291 transforms set of IR instructions to
cttz intrinsic but there are some targets which does not support CTTZ or
CTLZ. Hence, I generate a table lookup in TargetLowering::expandCTTZ().

Differential Revision: https://reviews.llvm.org/D128911
2022-08-08 12:08:05 +01:00
Simon Pilgrim e5e93b6130 [DAG] FoldConstantArithmetic - add initial support for undef elements in bitcasted binop constant folding
FoldConstantArithmetic can fold constant vectors hidden behind bitcasts (e.g. vXi64 -> v2Xi32 on 32-bit platforms), but currently bails if either vector contains undef elements. These undefs can often occur due to SimplifyDemandedBits/VectorElts calls recognising that the upper bits are often unnecessary (e.g. funnel-shift/rotate implicit-modulo and AND masks).

This patch adds a basic 'FoldValueWithUndef' handler that will attempt to constant fold if one or both of the ops are undef - so far this just handles the AND and MUL cases where we always fold to zero.

The RISCV codegen increase is interesting - it looks like the BUILD_VECTOR lowering was loading a constant pool entry but now (with all elements defined constant) it can materialize the constant instead?

Differential Revision: https://reviews.llvm.org/D130839
2022-08-08 11:53:56 +01:00
Craig Topper 75c64c7c4e [RISCV] Don't use li+sh3add for constants that can use lui+add.
If we're adding a constant that can't use addi we try a few tricks,
one of which is using li+sh3add. We should not do this if lui+add
would work. For example adding 8192. Using sh3add prevents folding
a sext.w to form addw, thus increasing instruction count.
2022-08-05 12:47:03 -07:00
Philip Reames 9a9848f4b9 [RISCVInsertVSETVLI] Remove an unsound optimization
This fixes a bug reported privately by @craig.topper. Here's an example which illustrates the problem:

vsetivli a1, a0, e32, m1, ta, mu # both DefInfo and PrevInfo
vsetivli a2, a1, e32, m4, ta, mu

With the unsound result being:

vsetivli a1, a0, e32, m1, ta, mu
vsetivli a2, a0, e32, m4, ta, mu

Consider the case where this is running on a machine with VLEN=512,. For this case, the VLMAXs are 16 and 64 respectively.

Consider for a0 = 33. The correct result is: a1 = 16, and a2 = 16

After the unsound optimization: a1 = 16 and a2 = 33

This particular example used VLMAXs which differed by more than a power of two. With a difference of only one power of two, there's another form of this bug which involves the AVL < 2 x VLMAX special case, but that ones more complicated to construct as many examples turn out accidentally sound.

This patch takes the approach of simply removing the unsound optimization, but there are multiple sound sub-cases of it. I plan to return to at least a couple of them, but figured it was cleaner to remove the unsound optimization (for ease of backporting), and then review the new optimizations on their own.

Differential Revision: https://reviews.llvm.org/D131264
2022-08-05 12:13:08 -07:00
Craig Topper 12a1ca9c42 [RISCV] Relax another one use restriction in performSRACombine.
When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), i32), C)
it's possible that the add is used by multiple sras. We should
allow the combine if all the SRAs will eventually be updated.

After transforming all of the sras, the shls will share a single
(sext_inreg (add X, C1), i32).

This pattern occurs if an sra with 32 is used as index in multiple
GEPs with different scales. The shl from the GEPs will be combined
with the sra before we get a chance to match the sra pattern.
2022-08-04 14:32:31 -07:00
Craig Topper a2de12c987 [RISCV] Relax a one use restriction performSRACombine
When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), C)
ignore the use count on the (shl X, 32).

The sext_inreg after the transform is free. So we're only making
2 new instructions, the add and the shl. So we only need to be
concerned with replacing the original sra+add. The original shl
can have other uses. This helps if there are multiple different
constants being added to the same shl.
2022-08-04 11:25:08 -07:00
Lorenzo Albano 74940d2668 [VP] Add widening for VP_STRIDED_LOAD and VP_STRIDED_STORE
Reviewed By: frasercrmck, craig.topper

Differential Revision: https://reviews.llvm.org/D121114
2022-08-04 16:12:01 +02:00
wanglian b6b0690355 [LegalizeTypes][VP] Add split operand support for VP float and integer casting
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D130685
2022-08-04 15:41:50 +08:00
Craig Topper 53d560b22f [RISCV] Prevent infinite loop after D129980.
D129980 converts (seteq (i64 (and X, 0xffffffff)), C1) into
(seteq (i64 (sext_inreg X, i32)), C1). If bit 31 of X is 0, it
will be turned back into an 'and' by SimplifyDemandedBits which
can cause an infinite loop.

To prevent this, check if bit 31 is 0 with computeKnownBits before
doing the transformation.

Fixes PR56905.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D131113
2022-08-03 15:19:07 -07:00
Alex Bradbury 9e966dd298 [RISCV][test] Add test for ability to tailcall libcalls
Although there's good coverage of the libcalls within llvm/test/CodeGen,
it's useful to have tests for all ABI and hard/soft-float combinations
in order to properly test the logic that enables libcall tail calls
(which will be implemented in a follow-up patch).
2022-08-03 19:27:59 +01:00
Alex Bradbury 28f12a09ae [RISCV] Teach ComputeNumSignBitsForTargetNode about masked atomic intrinsics
An unnecessary sext.w is generated when masking the result of the
riscv_masked_cmpxchg_i64 intrinsic. Implementing handling of the
intrinsic in ComputeNumSignBitsForTargetNode allows it to be removed.

Although this isn't a particularly important optimisation, removing the
sext.w simplifies implementation of an additional cmpxchg-related
optimisation in D130192.

Although I can't produce a test with different codegen for the other
atomics intrinsics, these are added as well for completeness.

Differential Revision: https://reviews.llvm.org/D130191
2022-08-03 13:41:58 +01:00
Craig Topper c2d0685286 [RISCV] Simplify test case from D130931. NFC 2022-08-01 16:50:56 -07:00
Craig Topper da5b1bf5bb [RISCV] Teach RISCVMergeBaseOffset to merge %lo/%pcrel_lo into load/store after folding arithmetic.
It's possible we have:
lui  a0, %hi(sym)
addi a0, %lo(sym)
addi a0, <offset1>
lw a0, <offset2>(a0)

We want to arrive at
lui a0, %hi(sym+offset1+offset2)
lw a0, %lo(sym+offset1+offset2)

We currently fail to do this because we only consider loads/stores
if we didn't find any arithmetic.

This patch splits arithmetic folding and load/store folding into
two separate phases. The load/store folding can no longer assume
the offset in hi/lo is 0 so we must combine the offsets. I've applied
the same simm32 limit that we applied in the arithmetic folding.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D130931
2022-08-01 15:33:21 -07:00
Craig Topper e07a8155f5 [RISCV] Move Pre-RA pseudo expansion from addMachineSSAOptimization to addPreRegAlloc.
addMachineSSAOptimization is skipped for -O0, but this pass is
required for -O0.
2022-08-01 13:44:43 -07:00
Craig Topper f9b05e6dad [RISCV] Pre-commit tests for D130931. NFC 2022-08-01 12:57:09 -07:00
Craig Topper 450edb0b37 [RISCV] Explicitly select second operand of branch condition to X0.
At least based on the lit tests, the coalescer sometimes fails to
propagate the copy from X0 into the branch instruction. This patch
does it manually during isel. The majority of the changes are from
the select patterns.

Some of the changes are just register allocation changes. Only
the Select change affects the whether a b*z instruction is generated
in the tests. I changed the branch pattern for consistency.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D130809
2022-08-01 11:16:48 -07:00
Lorenzo Albano 71b7c03fd6 [RISCV][VP] Custom lower VP_STRIDED_LOAD and VP_STRIDED_STORE
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D121113
2022-08-01 09:23:45 -07:00
Luís Marques 0bc177b6f5 [RISCV] Extend the Merge Base Offset pass to handle AUIPC+ADDI
Builds upon D123264, adding support for merging the low part of the LLA
address into the load/store instruction offsets.

Differential Revision: https://reviews.llvm.org/D123265
2022-08-01 11:30:02 +02:00
Luís Marques 260a641068 [RISCV] Pre-RA expand pseudos pass
Expand load address pseudo-instructions earlier (pre-ra) to allow follow-up
patches to fold the addi of PseudoLLA instructions into the immediate
operand of load/store instructions.

Differential Revision: https://reviews.llvm.org/D123264
2022-07-31 23:19:00 +02:00
Craig Topper d21b315360 [RISCV] Remove vmerges from vector ceil, floor, trunc lowering.
Use masked operations to suppress spurious exception bits being
set in fflags. Unfortunately, doing this adds extra copies.
2022-07-30 10:58:41 -07:00
Luís Marques 383bc7210e [RISCV] Precommit test for D123265
Differential Revision: https://reviews.llvm.org/D128562
2022-07-30 00:13:42 +02:00
Craig Topper e637feee80 [RISCV] Add isel pattern for (setne/eq GPR, -2048)
For constants in the range [-2047, 2048] we use addi. If the constant
is -2048 we can use xori. If we don't match this explicitly, we'll
emit an LI for the -2048 followed by an XOR.
2022-07-29 14:07:38 -07:00